How to Fix Claude Code Rate Limit Errors (6 Solutions)
TL;DR
- Claude Code rate limits run on a rolling 5-hour window — not a hard monthly cap.
- Switching to a lighter model, compacting context, or re-logging in after a plan upgrade fixes most cases.
- API key billing removes session caps entirely — you pay per token with no 5-hour window.
Overview
You’re mid-session in Claude Code, the terminal responds with “You’ve hit your limit” or “Rate limit exceeded,” and everything stops. This hits developers on every plan tier — Pro, Team, and Max included. The error has multiple root causes, from bloated context windows to stale credential caches. This guide covers 6 tested fixes, ordered from quickest to most permanent, plus the actual mechanics behind how Claude Code metering works.
What Causes This Error
Claude Code enforces a rolling 5-hour usage window per account. Each request consumes tokens (input + output), and when your cumulative usage exceeds your tier’s allocation, the CLI blocks further requests until the window rolls forward.
The exact error varies by context:
You've reached your usage limit for this period.
Rate limit exceeded. Please wait before making more requests.
5-hour limit reached - resets [time].
If you’re using an API key instead of a subscription, you’ll see:
API Error: Rate limit reached (HTTP 429)
Three distinct mechanisms trigger these errors:
- Session usage limits — the 5-hour rolling window consumed your tier’s token allocation.
- API rate limits — requests-per-minute or tokens-per-minute caps on API key accounts.
- Context window bloat — conversations that grow too large consume disproportionate tokens per request, draining your quota faster than expected.
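If you script around Claude Code, for example in a wrapper or a log parser, it helps to tell the two error families apart before choosing a fix. The sketch below matches only the message strings quoted above. The function name and the returned labels are illustrative, and real messages may be worded differently between versions.

```python
def classify_limit_error(message: str) -> str:
    """Map an error string to the limit family this guide associates with it.

    The substrings come from the example messages quoted in this guide;
    they are not an official or exhaustive list.
    """
    text = message.lower()

    # API key path: the guide shows an "API Error" carrying HTTP 429.
    if "api error" in text or "429" in text:
        return "api_rate_limit"

    # Subscription path: usage-period or 5-hour-window wording.
    session_markers = (
        "usage limit",
        "5-hour limit",
        "hit your limit",
        "rate limit exceeded",
    )
    if any(marker in text for marker in session_markers):
        return "session_usage_limit"

    return "other"


print(classify_limit_error("API Error: Rate limit reached (HTTP 429)"))
print(classify_limit_error("5-hour limit reached - resets 3pm."))
```

A "session_usage_limit" result points at Solutions 1 through 4. An "api_rate_limit" result means per-minute throttling, where retrying after a short pause is usually enough.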
Solution 1: Switch to a Lighter Model
The fastest fix. Opus consumes significantly more of your token budget per interaction than Sonnet due to longer outputs and higher per-token cost weighting.
/model sonnet
Run this directly in your Claude Code session. You can switch back to Opus once your limit resets. For routine tasks — file edits, grep-heavy exploration, test runs — Sonnet handles the work without meaningful quality loss.
Run /model without arguments to see all available models and your current selection.
You can also set a default model in ~/.claude/settings.json to avoid hitting Opus limits during exploratory work:
{
  "model": "sonnet"
}
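If you already have other settings in that file, editing it by hand risks overwriting them. Here is a minimal sketch that merges the "model" key into an existing JSON file and leaves every other key alone. The helper name set_default_model is hypothetical, and the sketch assumes the file is plain JSON with no comments.

```python
import json
from pathlib import Path


def set_default_model(settings_path: Path, model: str = "sonnet") -> dict:
    """Set the "model" key in a JSON settings file, preserving other keys."""
    settings = {}
    if settings_path.exists():
        raw = settings_path.read_text(encoding="utf-8").strip()
        if raw:
            # Assumes strict JSON; a file containing comments will fail here.
            settings = json.loads(raw)

    settings["model"] = model

    # Create the parent directory if this is a fresh install.
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings


# Example call (path taken from the snippet above; adjust if yours differs):
# set_default_model(Path.home() / ".claude" / "settings.json")
```

The example call is left commented out so that running the file does not touch your real settings by accident.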
Solution 2: Compact or Clear Your Context
Large conversation contexts are the hidden budget killer. Every message you send includes the full conversation history as input tokens. A 50-message session can easily push 100K+ input tokens per request — that’s your limit draining with each follow-up.
/compact
This summarizes your conversation history into a compressed form, dramatically reducing per-request token consumption. If your session is beyond saving:
/clear
This wipes the conversation entirely and starts fresh. Your files and working directory remain untouched.
When to use which:
- /compact: mid-task, you need continuity but the context has grown unwieldy.
- /clear: starting a new task anyway, or context has become incoherent after many edits.
Running /compact every 20-30 messages is cheap insurance against premature limit exhaustion.
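To see why periodic compaction matters, here is a rough back-of-the-envelope model of input-token growth when every request resends the full history. The figures are assumptions for illustration: 2,000 tokens per turn and a 5,000-token summary. The model ignores output tokens, prompt caching, tool results, and the cost of the compaction request itself.

```python
def cumulative_input_tokens(
    messages: int,
    tokens_per_turn: int,
    compact_every: int = 0,
    summary_tokens: int = 0,
) -> int:
    """Total input tokens sent over a session that resends history each turn.

    compact_every=0 means the history is never compacted.
    """
    total = 0
    history = 0
    for i in range(1, messages + 1):
        history += tokens_per_turn  # the new turn joins the history
        total += history            # the whole history is sent as input
        if compact_every and i % compact_every == 0:
            history = summary_tokens  # history replaced by a short summary
    return total


no_compaction = cumulative_input_tokens(50, 2_000)
with_compaction = cumulative_input_tokens(50, 2_000, compact_every=25, summary_tokens=5_000)
print(no_compaction)    # 2550000
print(with_compaction)  # 1425000
```

Under these assumptions, the 50th request alone carries 100,000 input tokens without compaction, and a single compaction at the midpoint cuts the session total by roughly 44%. The growth is quadratic in message count, which is why long sessions drain the window faster than expected.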
Solution 3: Re-Login After Plan Upgrades
If you upgraded from Pro to Max (or any tier change), Claude Code may still enforce your previous plan’s limits until you refresh credentials.
claude logout
rm -rf ~/.config/claude
claude login
This forces the CLI to fetch your current plan tier from Anthropic’s servers. Multiple GitHub issues confirm this resolves “phantom” rate limits that persist after upgrades.
Deleting ~/.config/claude removes all cached session data. Your project files and CLAUDE.md are unaffected.
Solution 4: Wait for the Window Reset
Sometimes the answer is patience. The 5-hour window is rolling — you don’t need to wait the full 5 hours if your heaviest usage was early in the window.
Check your current status:
/usage
This shows your remaining allocation and when the window resets. If you’re close to the reset, a 30-60 minute break may be enough.
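The "you may not need the full 5 hours" point is easier to see with a toy model. Anthropic does not publish the exact metering algorithm, so treat the sketch below as an illustration of a rolling window as described here, not as how the service works. The token counts and the 100,000-token limit are made up.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=5)


def usage_in_window(events, now):
    """Sum tokens from (timestamp, tokens) events newer than now - WINDOW."""
    cutoff = now - WINDOW
    return sum(tokens for ts, tokens in events if ts > cutoff)


def time_until_available(events, now, limit, needed):
    """Return how long until `needed` tokens fit under `limit`.

    Returns timedelta(0) if they fit now, or None if they can never fit.
    """
    cutoff = now - WINDOW
    live = sorted((ts, tokens) for ts, tokens in events if ts > cutoff)
    used = sum(tokens for _, tokens in live)
    if used + needed <= limit:
        return timedelta(0)
    # Expire events oldest-first until the request fits.
    for ts, tokens in live:
        used -= tokens
        if used + needed <= limit:
            return (ts + WINDOW) - now
    return None


now = datetime(2025, 1, 1, 12, 0)
events = [
    (datetime(2025, 1, 1, 7, 30), 60_000),   # heavy early usage
    (datetime(2025, 1, 1, 10, 0), 30_000),
    (datetime(2025, 1, 1, 11, 30), 10_000),
]
print(usage_in_window(events, now))                          # 100000
print(time_until_available(events, now, 100_000, 20_000))    # 0:30:00
```

In this example the window is full at 12:00, but the 60,000-token burst from 07:30 expires at 12:30, so a 30-minute break frees most of the budget.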
Plan tier allocations scale with price. If Pro’s 5-hour window isn’t enough for your workload, upgrading to Max 5x ($100/mo) or Max 20x ($200/mo) multiplies your ceiling accordingly.
Solution 5: Switch to API Key Billing
If you hit limits regularly, API key billing removes the 5-hour window entirely. You pay per token — no session caps, no rolling windows.
export ANTHROPIC_API_KEY="sk-ant-..."
claude
Generate a key at console.anthropic.com → API Keys → Create Key, set the environment variable in your shell profile, and launch Claude Code. The CLI auto-detects the API key and switches to pay-per-token billing.
With API billing, you’re subject to requests-per-minute and tokens-per-minute rate limits instead of usage caps. These ceilings depend on your API usage tier, and in most setups you’ll hit them only under heavy parallel usage.
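If your own scripts call the API with the same key, per-minute limits are usually handled with retries rather than waiting out a window. Below is a minimal sketch of exponential backoff with jitter. RateLimited is a hypothetical stand-in for whatever exception your HTTP client raises on a 429, and retry_after stands for a server-suggested wait if your client exposes one.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            if err.retry_after is not None:
                delay = err.retry_after  # trust the server's hint when present
            else:
                ceiling = min(max_delay, base_delay * 2 ** attempt)
                delay = random.uniform(0, ceiling)  # jitter spreads out retries
            sleep(delay)
```

Wrap any request function with it, for example call_with_backoff(lambda: send_request(payload)), where send_request is your own code. The sleep parameter is injectable so the retry logic can be tested without real waiting.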
Solution 6: Update Claude Code
Version-specific bugs can cause abnormal rate limit behavior. Prompt caching regressions are the most prominent example — users on affected versions have seen their entire 5-hour budget consumed in minutes.
npm update -g @anthropic-ai/claude-code
Or if you installed via Homebrew:
brew upgrade claude-code
After updating, verify:
claude --version
Check the Claude Code changelog for rate-limit-related fixes if you’re on an older version.
Understanding the Rate Limit Tiers
Each plan tier determines your token allocation within the 5-hour rolling window. API key billing is a separate path available to anyone with an Anthropic API account, independent of subscription plan:
| Plan | Price | 5-Hour Window | Limit Type |
|---|---|---|---|
| Pro | $20/mo | Base allocation | Session cap |
| Team | $30/mo per seat | Higher than Pro | Session cap |
| Max 5x | $100/mo | 5x Pro | Session cap |
| Max 20x | $200/mo | 20x Pro | Session cap |
| API Key (separate) | Pay-per-token | No window cap | Per-minute limits |
FAQ
How long do Claude Code rate limits last?
Claude Code uses a rolling 5-hour window. Your usage gradually expires as the window moves forward, so you don’t always need to wait the full 5 hours. If your heaviest usage was early in the window, you may regain capacity within 30-60 minutes. Run /usage to check your current status and reset timing.
Does switching models reset my rate limit?
No, switching models does not reset your rate limit. Your cumulative token usage within the 5-hour window remains the same. However, switching to a lighter model like Sonnet means each subsequent request consumes fewer tokens, so your remaining allocation lasts longer and you can continue working while the window rolls forward.
What’s the difference between session limits and API rate limits?
Session limits apply to subscription plans (Pro, Team, Max) and cap your total token usage within a rolling 5-hour window. Once you hit the cap, you’re blocked until the window rolls forward. API rate limits apply when using an API key and restrict requests-per-minute or tokens-per-minute — they’re much higher ceilings designed to prevent abuse rather than meter usage, and they reset every minute rather than every 5 hours.
Can I use Claude Code offline to avoid rate limits?
No. Claude Code requires a live connection to Anthropic’s servers for every request — all model inference happens server-side. There is no offline mode, local model support, or way to bypass the connection requirement. Rate limits are enforced server-side regardless of how you connect.
Do rate limits apply to Claude Code in VS Code and JetBrains extensions?
Yes. The VS Code and JetBrains extensions use the same underlying Claude Code engine and the same account credentials. Your rate limit is tied to your Anthropic account, not to which interface you use. Usage from the terminal CLI, VS Code extension, and JetBrains extension all count against the same 5-hour rolling window.
Does /compact actually reduce my rate limit usage?
Yes. Every message you send includes your full conversation history as input tokens, so longer contexts drain your limit faster. Running /compact summarizes the history into a compressed form, which means each subsequent request sends fewer input tokens. It doesn’t recover tokens already spent, but it slows down how quickly you consume your remaining allocation.
Still Not Working?
If none of the above resolves your issue:
- Check Anthropic’s status page — outages can manifest as rate limit errors.
- File a GitHub issue at anthropics/claude-code with your plan tier, Claude Code version, and the exact error message.
- Review existing issues — search for “rate limit” in the repo. Multiple issues track ongoing edge cases where limits trigger incorrectly.
- Contact Anthropic support through the Help Center if you believe your account is incorrectly throttled.