Optimize LLM context windows for better performance and cost savings.
LLM Context Window Optimizer
Token budgeting, prompt compression, RAG strategies
Best for
- Optimizing RAG context assembly to fit within 128K token windows while maintaining retrieval quality
- Reducing GPT-4 API costs by 40-60% through intelligent token budgeting and prompt compression
- Implementing conversation memory hierarchies for multi-turn chat applications with 200K context limits
- Designing semantic caching strategies to avoid redundant context window filling
What you'll get
- Token budget breakdown with percentages allocated to system instructions (15%), user query (15%), retrieved context (45%), conversation history (10%), and output reservation (15%)
- Compressed prompt template reducing an original 15K tokens to 8K while maintaining semantic completeness through progressive summarization and relevance filtering
- Caching strategy implementation plan identifying up to 60% of prompt components as cacheable, with semantic similarity thresholds and cache invalidation rules
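A token budget like the one above can be sketched as a simple allocation function. The percentages and the 128K window below are illustrative, not prescriptive; real splits depend on the model and workload.

```python
# Sketch: splitting a model's context window into fixed budget shares.
MODEL_CONTEXT = 128_000  # e.g. a 128K-token window (assumed for illustration)

BUDGET_SHARES = {
    "system_instructions": 0.15,
    "user_query": 0.15,
    "retrieved_context": 0.45,
    "conversation_history": 0.10,
    "output_reservation": 0.15,
}

def allocate_budget(context_limit: int, shares: dict[str, float]) -> dict[str, int]:
    """Convert fractional shares into absolute token budgets."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 100%"
    return {name: int(context_limit * share) for name, share in shares.items()}

budget = allocate_budget(MODEL_CONTEXT, BUDGET_SHARES)
```

At assembly time, each component (system prompt, retrieved chunks, history) is truncated or summarized down to its budget before the final prompt is built.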
You provide: current prompt structure, target model context limits, token usage patterns, and specific cost or performance optimization goals.
You receive: a token budget allocation plan, compressed prompt templates, an information placement strategy, and caching implementation recommendations with measurable efficiency gains.
What's inside
“You are an LLM Context Window Optimizer. You maximize token efficiency and cost-performance across Claude, GPT-4, Gemini, Llama, and Mistral deployments while maintaining output quality. - Apply "Lost in the Middle" research (Liu et al., 2023) to place critical information at context start/end, achi...”
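The "Lost in the Middle" placement mentioned in the excerpt can be sketched as an ordering function: given chunks ranked by relevance, interleave them so the most relevant land at the start and end of the context, pushing the least relevant toward the middle. The function name `edge_order` is illustrative.

```python
# Sketch: place high-relevance chunks at the context edges, per the
# "Lost in the Middle" finding (Liu et al., 2023) that models attend
# best to the start and end of long contexts.
def edge_order(chunks_by_relevance: list[str]) -> list[str]:
    """Input is ranked most-relevant first; output alternates chunks
    between the front and the (reversed) back of the context."""
    front, back = [], []
    for i, chunk in enumerate(chunks_by_relevance):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

ranked = ["A", "B", "C", "D", "E"]  # most -> least relevant
# edge_order(ranked) puts "A" first, "B" last, "E" in the middle
```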
Not designed for
- Training custom LLMs or fine-tuning model architectures
- Building RAG retrieval systems or vector database implementations
- General prompt engineering for creative writing or marketing copy
- Model selection or performance benchmarking across different LLM providers
SupaScore
89.03
Evidence Policy
Standard: no explicit evidence policy.
Research Foundation: 8 sources (3 academic, 3 official docs, 1 community practice, 1 industry frameworks)
This skill was developed through independent research and synthesis. SupaSkills is not affiliated with or endorsed by any cited author or organisation.
Version History
v5.5 final distill
Pipeline v4: rebuilt with 3 helper skills
Initial release
Works well with
Need more depth?
Specialist skills that go deeper in areas this skill touches.
Common Workflows
RAG Cost Optimization Pipeline
Design the RAG system, optimize context window usage, then implement cost monitoring and budget controls.
© 2026 Kill The Dragon GmbH. This skill and its system prompt are protected by copyright. Unauthorised redistribution is prohibited. Terms of Service · Legal Notice