FIX May 12, 2026 9 min read

How to Fix Claude Code Rate Limit Errors (6 Solutions)

by Bugi 9 min

TL;DR

Claude Code rate limits run on a rolling 5-hour window — not a hard monthly cap.
Switching to a lighter model, compacting context, or re-logging in after a plan upgrade fixes most cases.
API key billing removes session caps entirely — you pay per token with no 5-hour window.

Overview

You’re mid-session in Claude Code, the terminal responds with “You’ve hit your limit” or “Rate limit exceeded,” and everything stops. This hits developers on every plan tier — Pro, Team, and Max included. The error has multiple root causes, from bloated context windows to stale credential caches. This guide covers 6 tested fixes, ordered from quickest to most permanent, plus the actual mechanics behind how Claude Code metering works.

What Causes This Error

Claude Code enforces a rolling 5-hour usage window per account. Each request consumes tokens (input + output), and when your cumulative usage exceeds your tier’s allocation, the CLI blocks further requests until the window rolls forward.

The exact error varies by context:

You've reached your usage limit for this period.

Rate limit exceeded. Please wait before making more requests.

5-hour limit reached - resets [time].

If you’re using an API key instead of a subscription, you’ll see:

API Error: Rate limit reached (HTTP 429)

Three distinct mechanisms trigger these errors:

Session usage limits — the 5-hour rolling window consumed your tier’s token allocation.
API rate limits — requests-per-minute or tokens-per-minute caps on API key accounts.
Context window bloat — conversations that grow too large consume disproportionate tokens per request, draining your quota faster than expected.
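As a rough triage aid, the messages quoted above can be mapped to the mechanism that most likely produced them. The sketch below is hypothetical: the function name classify_limit_error and its category labels are made up for illustration, and the matching relies only on the message strings quoted in this section, which may differ between Claude Code versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which limit produced an error message.
# Patterns come from the messages quoted in this article, not from an official list.
classify_limit_error() {
  case "$1" in
    *"HTTP 429"*)
      echo "api-rate-limit" ;;   # API key account: per-minute caps
    *"5-hour limit reached"*|*"usage limit"*|*"Rate limit exceeded"*)
      echo "session-limit" ;;    # subscription: 5-hour window allocation
    *)
      echo "unknown" ;;
  esac
}

classify_limit_error "API Error: Rate limit reached (HTTP 429)"   # prints api-rate-limit
classify_limit_error "5-hour limit reached - resets [time]."      # prints session-limit
```

Context bloat has no message of its own; it shows up as a session limit that arrives sooner than expected.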

Danger

Prompt cache regressions have occasionally caused limits to drain in minutes instead of hours. If you’re experiencing unusually fast limit exhaustion, update Claude Code to the latest version — these bugs are typically patched quickly.

Solution 1: Switch to a Lighter Model

The fastest fix. Opus consumes significantly more of your token budget per interaction than Sonnet due to longer outputs and higher per-token cost weighting.

/model sonnet

Run this directly in your Claude Code session. You can switch back to Opus once your limit resets. For routine tasks — file edits, grep-heavy exploration, test runs — Sonnet handles the work without meaningful quality loss.

Tip

Use /model without arguments to see all available models and your current selection.

You can also set a default model in ~/.claude/settings.json to avoid hitting Opus limits during exploratory work:

{
  "model": "sonnet"
}

Solution 2: Compact or Clear Your Context

Large conversation contexts are the hidden budget killer. Every message you send includes the full conversation history as input tokens. A 50-message session can easily push 100K+ input tokens per request — that’s your limit draining with each follow-up.

/compact

This summarizes your conversation history into a compressed form, dramatically reducing per-request token consumption. If your session is beyond saving:

/clear

This wipes the conversation entirely and starts fresh. Your files and working directory remain untouched.

When to use which:

/compact — mid-task, you need continuity but the context has grown unwieldy.
/clear — starting a new task anyway, or context has become incoherent after many edits.

Takeaway

Running /compact every 20-30 messages is cheap insurance against premature limit exhaustion.
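To make the effect concrete, here is a minimal back-of-the-envelope sketch. It assumes every message adds a flat 2,000 tokens to the history, that each request re-sends the whole history as input, and that compaction replaces the history with a 5,000-token summary. All three figures are illustrative assumptions, and the model ignores output tokens and prompt caching, so treat the results as orders of magnitude rather than real metering.

```shell
#!/usr/bin/env bash
# total_input_tokens MESSAGES TOKENS_PER_MESSAGE [COMPACT_EVERY] [SUMMARY_TOKENS]
# Simplified model: each request sends the full history as input tokens.
total_input_tokens() {
  local messages=$1 per_msg=$2 compact_every=${3:-0} summary=${4:-0}
  local history=0 total=0 i
  for ((i = 1; i <= messages; i++)); do
    history=$((history + per_msg))   # the new message joins the history
    total=$((total + history))       # the whole history is sent as input
    if ((compact_every > 0 && i % compact_every == 0)); then
      history=$summary               # history is replaced by a summary
    fi
  done
  echo "$total"
}

total_input_tokens 50 2000           # no compaction: prints 2550000
total_input_tokens 50 2000 25 5000   # compact every 25 messages: prints 1425000
```

Under these assumptions the 50th uncompacted request alone carries 100,000 input tokens, and compacting once mid-session cuts cumulative input by roughly 44%.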

Solution 3: Re-Login After Plan Upgrades

If you upgraded from Pro to Max (or made any other tier change), Claude Code may still enforce your previous plan’s limits until you refresh credentials.

claude logout
rm -rf ~/.config/claude
claude login

This forces the CLI to fetch your current plan tier from Anthropic’s servers. Multiple GitHub issues confirm this resolves “phantom” rate limits that persist after upgrades.

Warning

Deleting ~/.config/claude removes all cached session data. Your project files and CLAUDE.md are unaffected.

Solution 4: Wait for the Window Reset

Sometimes the answer is patience. The 5-hour window is rolling — you don’t need to wait the full 5 hours if your heaviest usage was early in the window.
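The sketch below illustrates why an early burst can free up capacity before five hours have passed. It is a hypothetical model only: it assumes usage older than the window length simply stops counting, as described above, while the real metering happens on Anthropic's servers and is not specified here. The function name window_usage and the usage log are invented for the example.

```shell
#!/usr/bin/env bash
# Hypothetical model of a sliding usage window; not Anthropic's actual metering.
# Reads "epoch_seconds tokens" pairs on stdin and sums those still inside the window.
window_usage() {
  local now=$1 window=${2:-18000}   # 18000 s = 5 hours
  awk -v now="$now" -v window="$window" \
    '$1 > now - window { sum += $2 } END { print sum + 0 }'
}

# Made-up usage log: a heavy burst at t=0, lighter use later.
usage_log='0 60000
3600 20000
14400 10000'

printf '%s\n' "$usage_log" | window_usage 17000   # prints 90000
printf '%s\n' "$usage_log" | window_usage 18600   # prints 30000
```

In this toy log, the 60,000-token burst drops out of the window once five hours have passed since it was sent, even though the later requests still count.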

Check your current status:

/usage

This shows your remaining allocation and when the window resets. If you’re close to the reset, a 30-60 minute break may be enough.

Plan tier allocations scale with price. If Pro’s 5-hour window isn’t enough for your workload, upgrading to Max 5x ($100/mo) or Max 20x ($200/mo) multiplies your ceiling accordingly.

Solution 5: Switch to API Key Billing

If you hit limits regularly, API key billing removes the 5-hour window entirely. You pay per token — no session caps, no rolling windows.

export ANTHROPIC_API_KEY="sk-ant-..."
claude

Generate a key at console.anthropic.com → API Keys → Create Key, set the environment variable in your shell profile, and launch Claude Code. The CLI auto-detects the API key and switches to pay-per-token billing.

With API billing, you’re subject to requests-per-minute and tokens-per-minute rate limits instead of usage caps. These ceilings depend on your API usage tier; on higher tiers you’ll typically hit them only under heavy parallel usage.

Tip

Set a spend limit in the Anthropic console to avoid unexpected bills. API billing has no monthly cap by default.
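Before switching, it can help to estimate what a typical session would cost per token. The following is a minimal sketch: estimate_cost is a hypothetical helper, and the rates passed to it (3 and 15 USD per million input and output tokens) are placeholders chosen for illustration. Look up the current prices for your model before relying on the result.

```shell
#!/usr/bin/env bash
# estimate_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_PRICE OUTPUT_PRICE
# Prices are in USD per million tokens and must be supplied by the caller.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.2f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}

# Placeholder rates (3 and 15 USD per million tokens), for illustration only.
estimate_cost 2550000 50000 3 15   # prints 8.40
```

A session that sends 2.55 million input tokens and receives 50,000 output tokens would cost about 8.40 USD at those placeholder rates, which gives a baseline for choosing a spend limit.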

Solution 6: Update Claude Code

Version-specific bugs can cause abnormal rate limit behavior. Prompt caching regressions are the most prominent example — users on affected versions have seen their entire 5-hour budget consumed in minutes.

npm update -g @anthropic-ai/claude-code

Or if you installed via Homebrew:

brew upgrade claude-code

After updating, verify:

claude --version
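If the changelog names a version that contains the fix you need, a small script can compare it against what is installed. This is a sketch under two assumptions: that `claude --version` prints the version number as its first word, and that versions are plain dotted numbers (a suffix such as "-beta" would break the comparison). The minimum shown is a placeholder, not a real fix version.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when dotted version A is >= B (numeric, up to 3 fields).
version_ge() {
  local IFS=.
  local -a a=($1) b=($2)
  local i x y
  for i in 0 1 2; do
    x=${a[i]:-0}; y=${b[i]:-0}
    if ((10#$x > 10#$y)); then return 0; fi
    if ((10#$x < 10#$y)); then return 1; fi
  done
  return 0
}

minimum="1.0.0"   # hypothetical threshold; take the real one from the changelog

# Assumes `claude --version` prints the version number as its first word.
if command -v claude >/dev/null 2>&1; then
  installed=$(claude --version | awk '{print $1}')
  if version_ge "$installed" "$minimum"; then
    echo "Claude Code $installed is at or above $minimum"
  else
    echo "Claude Code $installed is older than $minimum; update first"
  fi
fi
```

Comparing field by field avoids the string-comparison trap where "1.0.9" sorts after "1.0.44".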

Check the Claude Code changelog for rate-limit-related fixes if you’re on an older version.

Understanding the Rate Limit Tiers

Each plan tier determines your token allocation within the 5-hour rolling window. API key billing is a separate path available to anyone with an Anthropic API account, independent of subscription plan:

Plan	Price	5-Hour Window	Limit Type
Pro	$20/mo	Base allocation	Session cap
Team	$30/mo per seat	Higher than Pro	Session cap
Max 5x	$100/mo	5x Pro	Session cap
Max 20x	$200/mo	20x Pro	Session cap
API Key (separate)	Pay-per-token	No window cap	Per-minute limits

FAQ

How long do Claude Code rate limits last?

Claude Code uses a rolling 5-hour window. Your usage gradually expires as the window moves forward, so you don’t always need to wait the full 5 hours. If your heaviest usage was early in the window, you may regain capacity within 30-60 minutes. Run /usage to check your current status and reset timing.

Does switching models reset my rate limit?

No, switching models does not reset your rate limit. Your cumulative token usage within the 5-hour window remains the same. However, switching to a lighter model like Sonnet means each subsequent request consumes fewer tokens, so your remaining allocation lasts longer and you can continue working while the window rolls forward.

What’s the difference between session limits and API rate limits?

Session limits apply to subscription plans (Pro, Team, Max) and cap your total token usage within a rolling 5-hour window. Once you hit the cap, you’re blocked until the window rolls forward. API rate limits apply when using an API key and restrict requests-per-minute or tokens-per-minute — they’re much higher ceilings designed to prevent abuse rather than meter usage, and they reset every minute rather than every 5 hours.

Can I use Claude Code offline to avoid rate limits?

No. Claude Code requires a live connection to Anthropic’s servers for every request — all model inference happens server-side. There is no offline mode, local model support, or way to bypass the connection requirement. Rate limits are enforced server-side regardless of how you connect.

Do rate limits apply to Claude Code in VS Code and JetBrains extensions?

Yes. The VS Code and JetBrains extensions use the same underlying Claude Code engine and the same account credentials. Your rate limit is tied to your Anthropic account, not to which interface you use. Usage from the terminal CLI, VS Code extension, and JetBrains extension all count against the same 5-hour rolling window.

Does /compact actually reduce my rate limit usage?

Yes. Every message you send includes your full conversation history as input tokens, so longer contexts drain your limit faster. Running /compact summarizes the history into a compressed form, which means each subsequent request sends fewer input tokens. It doesn’t recover tokens already spent, but it slows down how quickly you consume your remaining allocation.

Still Not Working?

If none of the above resolves your issue:

Check Anthropic’s status page — outages can manifest as rate limit errors.
File a GitHub issue at anthropics/claude-code with your plan tier, Claude Code version, and the exact error message.
Review existing issues — search for “rate limit” in the repo. Multiple issues track ongoing edge cases where limits trigger incorrectly.
Contact Anthropic support through the Help Center if you believe your account is incorrectly throttled.
