I Was Burning Through Claude Code Limits in Hours
Last week, I sat down to build an efficiency guide for my Claude Code usage. The irony? The process of creating that guide became the perfect case study for why we needed it in the first place. Here's what happened, what I learned, and the full playbook that came out of it.
The Problem Nobody Talks About
Since March 2026, Anthropic has quietly tightened the screws. Peak-hour throttling went live, the generous 2x off-peak promotion ended, and developers started reporting quota exhaustion in as little as 19 minutes — on a window that's supposed to last five hours.
Anthropic eventually acknowledged it publicly: "People are hitting usage limits in Claude Code way faster than expected."
I wasn't immune. I was getting locked out before lunch, burning through my Max 5x allocation in a single hour of focused work.
So I set out to build documentation that people would actually follow. A single guide, no fluff, no theory, that would keep people productive without draining their quota or wallet.
What I didn't expect was how much the process of building it would teach me about the problem.
The Session That Became the Case Study
The task seemed straightforward: go through a reference article on vishnuai.in/stop-rate-limits, cross-reference it with community discussions on Twitter and Reddit, pull in Anthropic's official docs, and compile everything into one clean markdown file.
Simple enough. Except here's how it actually played out.
The blocked URL problem
The first thing I tried was fetching that vishnuai.in article directly. Blocked. Network proxy wouldn't let it through. So instead of stalling, I pivoted and launched parallel web searches across five different angles simultaneously: official docs, news coverage of the March 2026 crisis, community tips, context management techniques, and billing concerns.
That parallel research pulled in content from Anthropic's official cost management page, The Register's coverage of the quota crisis, SFEIR Institute's context optimization guide, SitePoint, DevClass, and several developer blogs.
A first draft of the guide came together from all that. It was solid: it covered context management, model selection, CLAUDE.md best practices, subagents, hooks, the works.
But there was a problem.
"You did a good reference to the website, right?"
That one question stopped everything. The original article, the one that started the whole task, wasn't actually in the document. Not directly. I'd pulled from the same sources it likely referenced, sure. But the article itself had a specific framework: 10 numbered tips, a token cost formula, real math showing why message #30 costs 31x more than message #1. None of that specificity was in my draft.
This is exactly the anti-pattern the guide itself warns about: vague inputs produce vague outputs. I'd done broad research when the ask was pointed. The article was the backbone and everything else was supposed to enrich it, not replace it.
The fix: use the right tool
So I opened the article through Chrome directly, extracted every tip, every number, every formula. Then rewrote the entire guide using the article's 10-rule structure as the skeleton, with all the earlier research layered in as Claude Code-specific expansions under each rule.
The result was dramatically better. Not because the information changed but because the structure came from a source that had already figured out the right way to teach this stuff.
And here's the meta lesson: I didn't throw away the first round of research. None of that work was wasted. The official Anthropic docs gave me the team rate-limit tables, the $6/day cost benchmark, the hook configuration examples, the subagent patterns. The news coverage gave me the "19 minutes instead of 5 hours" crisis framing. The community sources gave me the "98.5% of tokens go to re-reading history" stat.
All of it found a home — just under a better structure.
The Core Insight That Changes Everything
Here's the single most important thing from the vishnuai.in article, and the line that should be printed on every developer's monitor:
Claude doesn't count messages. It counts tokens.
Every time you send a message, Claude re-reads the entire conversation history. The cost isn't linear — it's quadratic. The article lays out the math cleanly:
Total tokens ≈ S × N(N+1) / 2
Where S is the average tokens per exchange and N is the message count. At roughly 500 tokens per exchange:
- 5 messages: 7,500 tokens
- 10 messages: 27,500 tokens
- 20 messages: 105,000 tokens
- 30 messages: 232,500 tokens
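Those totals follow directly from the formula. A quick sanity check in Python, using the article's figure of roughly 500 tokens per exchange:

```python
def total_tokens(s_per_exchange: int, n_messages: int) -> int:
    """Cumulative tokens when every message re-reads the full history:
    message k costs roughly k * S, so the total is S * N(N+1)/2."""
    return s_per_exchange * n_messages * (n_messages + 1) // 2

for n in (5, 10, 20, 30):
    print(f"{n} messages: {total_tokens(500, n):,} tokens")
```

Running it reproduces the table above exactly: 7,500 tokens at 5 messages up to 232,500 at 30.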
Message 30 costs 31 times more than message 1. That's not a rounding error. That's why a developer can feel like they're "barely using Claude" and still hit their limit — because the last few messages in a long conversation are astronomically expensive compared to the first few.
Once you internalize this, every other efficiency tip is just a consequence of it.
The 10 Rules (Condensed)
The full guide goes deep on each of these with Claude Code-specific commands and configurations. Here's the short version:
1. Edit your prompt — don't send a follow-up. When Claude gets it wrong, click Edit on the original message and regenerate. Don't stack corrections onto the history. In Claude Code, use /rewind or double-tap Esc.
2. Start fresh every 15-20 messages. One developer found 98.5% of their tokens were spent re-reading history. Use /compact to carry context forward efficiently, or /clear to nuke it entirely.
3. Batch questions into one message. Three separate prompts = three context loads. One prompt with three tasks = one context load. This always produces better answers too, because Claude sees the full picture.
4. Cache recurring files in Projects. Uploading the same document to multiple chats re-tokenizes it every time. In Claude Code, this translates to a well-maintained CLAUDE.md (under 200 lines), a .claudeignore that excludes build artifacts, and Skills for specialized workflows.
5. Set up persistent context. Stop wasting the first 3-5 messages of every session on setup. CLAUDE.md + /init solves this permanently.
6. Turn off features you're not using. MCP servers, extended thinking, web search — each adds tokens even when you don't need them. /mcp to audit, /effort to dial down thinking, MAX_THINKING_TOKENS=8000 for simple tasks.
7. Use Haiku for simple tasks. Grammar, formatting, renaming, brainstorming — Haiku handles all of it at a third of Sonnet's cost. Frees up 50-70% of your budget for work that actually needs power. Switch with /model.
8. Spread work across the day. The limit is a rolling 5-hour window, not a daily reset. Messages from 9 AM stop counting by 2 PM. One intense morning session wastes most of your capacity.
9. Avoid peak hours for heavy work. On weekdays from 5 to 11 AM Pacific, your limit burns faster. Do planning and exploration during peak. Save implementation and debugging for off-peak.
10. Enable Overage as a safety net. Pro and Max subscribers can turn on pay-as-you-go overflow in Settings → Usage. Set a monthly cap. This won't save tokens, but it will save your work when you're deep in a complex session and the limit hits.
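To make rule 4 concrete, here's what a starting .claudeignore might look like. This assumes gitignore-style patterns, and the specific entries are illustrative examples for a typical JavaScript project, not a universal list:

```
# Dependencies and build artifacts Claude never needs to read
node_modules/
dist/
build/
coverage/

# Large generated files that re-tokenize expensively
package-lock.json
*.min.js
```

Adjust the entries to whatever your project actually generates; the point is that anything excluded here never lands in your context in the first place.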
The Claude Code-Specific Multipliers
The 10 rules above apply to any Claude usage. These next three are Claude Code-specific and stack on top:
Plan Mode halves your token cost
Press Shift+Tab before complex tasks. Claude reads files and proposes an approach without making changes — using a lighter model. You review the plan, correct the direction if needed, then let it execute. This prevents the most expensive mistake in agentic coding: implementing the wrong thing and having to redo it.
Subagents protect your context window
When you ask Claude to "investigate how our auth system works," it reads dozens of files — all of which land in your context and stay there. Subagents run in their own context window. They explore, read, and report back a summary. Your main session stays clean.
This was one of the biggest takeaways from Anthropic's official best practices: context is your most precious resource, and subagents are the best way to protect it.
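For reference, a custom subagent is defined as a markdown file with YAML frontmatter under .claude/agents/. The sketch below is a hypothetical example (the name "code-explorer" and the exact field set are my assumptions; check Anthropic's current subagent docs for the authoritative format):

```
---
name: code-explorer
description: Investigates how a subsystem works and reports back a short
  summary. Use for read-only exploration so file contents stay out of the
  main session's context.
tools: Read, Grep, Glob
---
You are a read-only code explorer. Investigate the question you are given,
read whatever files you need, and reply with a summary of no more than
20 lines. Do not paste full file contents into your answer.
```

The key design choice is the read-only tool list and the hard cap on the reply length: the subagent can read fifty files, but only the short summary ever enters your main context.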
Hooks automate what discipline can't
A PreToolUse hook can intercept test commands and filter output to show only failures. Instead of 10,000 lines of test output eating your context (tens of thousands of tokens), you get a few dozen lines of what actually matters. Unlike CLAUDE.md instructions, hooks are deterministic — they run every time, no exceptions.
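The filtering half of that hook can be a small script the hook command pipes test output through. A minimal sketch, assuming pytest-style failure markers (the marker list is an assumption; adjust it for your test runner):

```python
import sys

# Lines worth keeping from verbose test output; everything else is noise
# that would otherwise sit in Claude's context for the rest of the session.
FAILURE_MARKERS = ("FAILED", "ERROR", "AssertionError")

def keep_failures(output: str) -> str:
    """Return only the lines that indicate a failing test."""
    kept = [line for line in output.splitlines()
            if any(marker in line for marker in FAILURE_MARKERS)]
    return "\n".join(kept)

if __name__ == "__main__":
    print(keep_failures(sys.stdin.read()))
```

Wired into the hook's command (for example, piping the test runner's output through this script), 10,000 lines of output collapse to the handful that actually matter.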
What the Community Is Saying
The frustration across Twitter, Discord, and developer forums is real. Developers report burning through Max allocations in under an hour. Anthropic has confirmed counter-desync bugs where usage meters don't match reality. The March 2026 changes hit harder than expected because three things compounded at once: peak-hour throttling, the end of the off-peak promotion, and the metering bugs.
But the developers who aren't complaining? They're the ones who learned token economics. They batch prompts, they clear between tasks, they use Haiku for the small stuff, and they plan before they code. The reports from developers applying these techniques consistently show 40-70% cost reduction — sometimes enough to drop down a subscription tier entirely.
The Meta Lesson
Building this guide took a research session that hit multiple sources, dealt with blocked URLs, pivoted tools mid-task, and required a full rewrite when the first structure wasn't right.
Every single one of those moments maps to a tip in the guide itself:
- The parallel research launch → Rule 3: batch your work into fewer, larger operations instead of sequential small ones
- The rewrite after the first draft → Rule 1: fix the prompt, don't pile corrections on top
- Using Chrome when the direct fetch failed → Rule 6: use the right tool for the job, don't burn tokens fighting with the wrong one
- Merging all research under the article's structure → Rule 4: don't re-process the same information from scratch — cache it, reference it, build on it
The guide teaches token efficiency. The process of building it taught the same thing through experience.
Where to Start
If you are feeling the squeeze from the March 2026 changes, start with three things today:
- Go through the token math. The quadratic cost formula is the "aha" moment. Once people see that message 30 costs 31x more than message 1, behavior changes fast.
- Set up .claudeignore in every project. Five minutes of configuration saves 30,000-100,000 tokens per session. Best ROI you'll find.
- Make /clear a reflex. Between tasks, between topics, between debugging and implementation. This single habit probably accounts for more token savings than everything else combined.
The full guide with command references, hook configurations, subagent templates, and team rate-limit tables is available internally. It's a living document — we update it as Anthropic adjusts their limits and as we learn what works.
The blog was built using content from Vishnu AI's Stop Rate Limits, Anthropic's official cost management docs, Claude Code best practices, and community reporting from The Register, DevClass, and SFEIR Institute.