What is Prompt Compression?

May 09, 2026

As LLM context windows grow, so do the costs. Prompt compression is a technique for removing redundant or low-information content from a prompt while preserving the information the model needs to answer correctly.

Using Selective Deletion

Algorithms like LLMLingua identify and remove the least informative tokens in a long prompt. By scoring each token's perplexity with a small language model, the system can condense, say, a 10,000-token document into 2,000 tokens with minimal impact on the larger model's response quality, cutting API costs substantially.
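The idea can be sketched without a real language model. The toy below scores each token by its surprisal within the prompt itself (rare tokens score high, repeated filler scores low) and keeps only the highest-scoring tokens in their original order; LLMLingua uses a small causal LM for the scoring step instead, so the function name and scoring proxy here are illustrative assumptions, not the library's API.

```python
import math
from collections import Counter

def compress_prompt(text: str, keep_ratio: float = 0.5) -> str:
    """Drop the most predictable tokens, keeping roughly `keep_ratio` of them.

    Stand-in for perplexity-based selection: a token's "surprisal" here is
    -log(frequency within the prompt), so repeated filler words are deleted
    first. A real compressor would score tokens with a small language model.
    """
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens)
    # Surprisal proxy: rarer tokens carry more information.
    scores = [-math.log(counts[t] / total) for t in tokens]
    budget = max(1, int(len(tokens) * keep_ratio))
    # Pick the indices of the highest-scoring tokens, then restore order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:budget])
    return " ".join(tokens[i] for i in keep)

# Repeated low-information tokens ("the") are dropped first:
print(compress_prompt("the the the cat sat on the mat", keep_ratio=0.5))
# → cat sat on mat
```

The key design point carries over to the real algorithm: selection is done per token against a budget, and surviving tokens keep their original order so the compressed prompt remains readable.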

Vector-Based Summarization

Another approach uses a small model to summarize the background context before it is sent to a larger model. This pre-processing step ensures the flagship LLM sees only the most high-density information, reducing noise and speeding up the final response.
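A minimal sketch of that two-stage pipeline, with the model calls injected as plain callables so any chat-completion wrapper can be plugged in. Both callables, and the prompt templates, are assumptions for illustration, not a specific vendor's API.

```python
from typing import Callable

def two_stage_answer(
    question: str,
    background: str,
    summarize: Callable[[str], str],  # small, cheap model (assumption)
    answer: Callable[[str], str],     # flagship model (assumption)
) -> str:
    """Compress background with a cheap model before the expensive call.

    Stage 1: the small model distills the raw background into a summary.
    Stage 2: the large model answers using only that summary as context.
    """
    summary = summarize(f"Summarize the key facts:\n\n{background}")
    prompt = f"Context:\n{summary}\n\nQuestion: {question}"
    return answer(prompt)
```

In practice the trade-off is that the flagship call now pays for summary tokens rather than raw-document tokens, at the cost of one extra (cheap) round trip and whatever detail the summarizer drops.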