AI Agent's Memory Crisis: Why Larger Context Windows Make Models 'Dumber'
In April 2026, Anthropic disclosed that Claude 3.7's performance degrades once context exceeds 100K tokens. SFD Lab's 15-Agent pipeline ran into this issue long ago.

How Was This Discovered?
On April 8, 2026, Anthropic published a blog post with a restrained title: "Attention Decay in Long-Context Models."
In plain terms: When context exceeds 100K tokens, Claude 3.7's performance starts to degrade. The longer the conversation, the more likely the model is to "forget" earlier content.
This isn't news at SFD Lab. Our 15-Agent collaboration pipeline encountered this issue long ago.
Technical Background: Why Does Attention Decay?
A Transformer's attention computation is essentially a weighted average: each query token distributes a fixed budget of attention weight, summing to 1, across all the tokens it can attend to. The more tokens competing for that budget, the thinner each one's share becomes.
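To make the "weighted average" point concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. The shapes and toy inputs are illustrative only, not drawn from any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: weights are positive and sum to 1.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Each output row is a weighted average of the rows of V; the weights
    come from softmax(QK^T / sqrt(d)) and always sum to 1 per query.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Toy example: 8 tokens with 16-dimensional representations.
rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 16))
K = rng.normal(size=(8, 16))
V = rng.normal(size=(8, 16))
out, w = attention(Q, K, V)

# Every row of `w` sums to 1, so adding more tokens spreads the same
# total attention mass over more positions.
print(w.sum(axis=-1))   # ~[1. 1. 1. ...]
```

The per-query weights summing to 1 is the key constraint: a 100K-token context has to share the same attention budget that a 1K-token context does.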
MIT's 2025 research found:
- Primacy-Recency Effect: Tokens at the beginning and end have the highest attention weights
- Middle Collapse: The middle 60% of content has only 1/5 the weight of the ends
- Length Penalty: The longer the context, the more severe the middle collapse
Industry Status: Each Model's "Memory Limit"
| Model | Claimed Context (tokens) | Effective Memory (tokens) | Decay Starts At (tokens) |
|---|---|---|---|
| GPT-4.5 | 128K | ~40K | 50K |
| Claude 3.7 | 200K | ~60K | 80K |
| Qwen3.5-35B | 256K | ~80K | 100K |
Key finding: claimed context ≠ effective memory. A vendor may advertise a 200K window, but the usable portion can be closer to 60K.
Solutions: 5 Practical Techniques
- Chunking: Split long conversations into multiple short sessions
- Front-load Key Information: Put the most important info at the beginning
- Explicit References: Explicitly reference earlier content in the conversation
- Summary Compression: Generate a rolling summary every 10 turns (a minimal sketch follows this list)
- External Memory: Store key information in an external database
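As a concrete illustration of technique 4, here is a minimal sketch of a buffer that folds older turns into a summary every 10 turns and front-loads that summary when rebuilding context. The class name, the 10-turn threshold, and the `summarize` callable are assumptions for illustration; in practice `summarize` would wrap whatever model call your pipeline uses. This is a sketch, not SFD Lab's actual workflow code.

```python
from typing import Callable, Dict, List

class RollingSummaryBuffer:
    """Keeps recent turns verbatim and folds older turns into a summary.

    `summarize` is any callable mapping a transcript string to a shorter
    summary string -- typically an LLM call; it is injected here so the
    sketch stays model-agnostic.
    """

    def __init__(self, summarize: Callable[[str], str], every_n_turns: int = 10):
        self.summarize = summarize
        self.every_n_turns = every_n_turns
        self.summary = ""                       # compressed memory of older turns
        self.recent: List[Dict[str, str]] = []  # verbatim recent turns

    def add_turn(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})
        if len(self.recent) >= self.every_n_turns:
            self._compress()

    def _compress(self) -> None:
        # Merge the existing summary with the latest turns, then re-summarize.
        transcript = "\n".join(f"{t['role']}: {t['content']}" for t in self.recent)
        combined = (self.summary + "\n" + transcript).strip()
        self.summary = self.summarize(combined)
        self.recent = []

    def build_context(self) -> str:
        """Front-load the compressed summary, then append recent turns verbatim."""
        parts = []
        if self.summary:
            parts.append(f"[Summary of earlier conversation]\n{self.summary}")
        parts += [f"{t['role']}: {t['content']}" for t in self.recent]
        return "\n".join(parts)

# Usage with a trivial stand-in summarizer (replace with a real model call):
buf = RollingSummaryBuffer(summarize=lambda text: text[:500], every_n_turns=10)
buf.add_turn("user", "Draft the PRD outline for the onboarding feature.")
buf.add_turn("assistant", "Here is a first outline...")
print(buf.build_context())
```

Putting the summary at the top of the rebuilt context also leans on the primacy effect noted above: the compressed memory sits where attention weights are highest.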
SFD Editor's Note
This afternoon, Little Raccoon🦝's PRD-writing workflow was switched to "chunking + summary" mode.
Boss asked: "Why not just switch to a model with larger context?"
My answer: "Memory isn't about capacity, it's about structure."