GUIDES Apr 22, 2026 10 min read

Windsurf Cascade Mode: Complete Guide to the Agentic AI Assistant

by Bugi

TL;DR

Cascade is Windsurf’s agentic AI that plans, edits multiple files, and runs terminal commands in a single flow.
Three modes (Write, Chat, and Plan) each serve a distinct purpose; toggle between them with Cmd + . (Mac) or Ctrl + . (Windows/Linux).
Rules (.windsurfrules) and Memories give Cascade persistent, project-aware context across sessions.

Overview

Cascade is the agentic AI assistant built into Windsurf, Codeium’s AI code editor. Unlike line-level autocomplete or single-file chat, Cascade operates at the project level: it indexes your entire codebase, builds a multi-step plan, then executes coordinated edits across multiple files while running terminal commands as needed.

This guide covers how to activate Cascade, use its three modes effectively, configure rules and memories for persistent context, and avoid common pitfalls. It assumes you have Windsurf installed and a project open.

Windsurf · quick reference

Vendor: Codeium
Pricing: Free · Pro $15/mo · Team $35/mo
Platforms: macOS, Linux, Windows
Flagship models: SWE-1.6, SWE-1.5
Config file: .windsurfrules

Prerequisites

Before using Cascade, confirm the following:

Windsurf installed on macOS, Linux, or Windows. Download from the official site.
A Windsurf account — free tier includes 25 credits. Pro ($15/mo) and Team ($35/mo) plans increase limits.
A project open in the editor. Cascade indexes your codebase on open; a project with files gives you something to work with immediately.
Optional: API keys for BYOK (Bring Your Own Key) if you want access to models beyond the defaults, such as Claude 4 Opus or GPT-5.1-Codex Max.

No additional CLI tools or plugins are required. Cascade ships as a core feature of the editor.

Opening Cascade

Open the Cascade panel with a keyboard shortcut or the UI icon.

Open the panel

Press Cmd + L (Mac) or Ctrl + L (Windows/Linux). Alternatively, click the Cascade icon in the top-right corner.

Select your mode

Toggle between Write, Chat, and Plan modes with Cmd + . (Mac) or Ctrl + . (Windows/Linux).

Add context

Select code in the editor before opening Cascade — it’s automatically included. Use @-mentions to reference specific files, folders, or docs.

Any text selected in the editor or terminal is passed to Cascade as context when the panel opens. This is faster than manually pasting snippets into the prompt.

Understanding the Three Modes

Cascade has three distinct modes, each designed for a different type of interaction. Choosing the right mode matters — it determines whether Cascade can modify your code or only discuss it.

Write Mode

Write mode is Cascade’s primary agentic mode. It can:

Create, modify, and delete code across multiple files
Execute terminal commands and read their output
Show diffs before applying changes
Roll back to any checkpoint if something goes wrong

When you prompt Cascade in Write mode, it generates a step-by-step plan, then executes it. You see diffs for each file change and can accept or reject them individually.

Chat Mode

Chat mode is read-only. Cascade answers questions about your codebase, explains code, discusses architecture, and helps debug — but makes no modifications. Use this when you need to understand before you act.

Plan Mode

Plan mode creates detailed implementation plans without writing code. It’s useful for scoping complex features before committing to implementation. Think of it as a design doc generator that understands your actual codebase.

Tip

Start complex tasks in Plan mode to get the architecture right, then switch to Write mode to execute. This prevents Cascade from charging ahead with a suboptimal approach.

How Cascade Understands Your Codebase

Cascade’s context engine is what separates it from a generic LLM chat wrapper. When you open a project, Windsurf immediately indexes every file — not just the ones currently open.

Each file and function is converted into a 768-dimensional vector embedding that captures its semantic meaning. When you prompt Cascade, it runs a similarity search (using Codeium’s proprietary M-Query retrieval) against this index and pulls the most relevant code into the prompt context.
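To make the retrieval step concrete, here is a minimal sketch of embedding-based similarity search. It is not Windsurf’s implementation: M-Query is proprietary, so cosine similarity is swapped in as a common stand-in, and the file names and 3-dimensional vectors are made up for illustration (a real index would hold 768-dimensional vectors).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k entry names whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy index: file names and vectors are illustrative only.
TOY_INDEX = {
    "auth/login.ts": [0.9, 0.1, 0.0],
    "auth/session.ts": [0.8, 0.3, 0.1],
    "ui/button.tsx": [0.0, 0.2, 0.9],
}

# A query vector that points in roughly the same direction as the auth files.
results = top_k([1.0, 0.2, 0.0], TOY_INDEX, k=2)
print(results)
```

The two auth files rank above the UI file because their vectors point in nearly the same direction as the query, which is the property semantic retrieval relies on.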

The context system operates on multiple layers simultaneously:

RAG-based codebase index — semantic search across all project files
Real-time action tracking — your edits, terminal commands, clipboard contents
Memories — auto-generated persistent context from prior sessions
Rules — user-defined instructions from .windsurfrules and AGENTS.md

A visual context window indicator in the UI shows how much of the available context is consumed. When it’s near capacity, start a new session to avoid silent truncation.

Warning

If your AGENTS.md, Memories, and active Rules exceed the context limit, content is silently truncated. Keep rules concise — bullet points over paragraphs.
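A rough way to see whether your rules are getting heavy is to estimate their size before Cascade ever reads them. The sketch below assumes about four characters per token, which is a common rule of thumb and not a Windsurf figure, and the budget you pass in is your own guess, since the actual limit is not stated here.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: assumes ~4 characters per token (a heuristic)."""
    return (len(text) + 3) // 4  # round up


def context_report(sources: dict[str, str], budget_tokens: int) -> dict:
    """Summarize estimated token use per source and flag if the total exceeds the budget.

    `sources` maps a label (e.g. "AGENTS.md") to that file's text.
    `budget_tokens` is a hypothetical budget chosen by the caller.
    """
    per_source = {name: estimate_tokens(body) for name, body in sources.items()}
    total = sum(per_source.values())
    return {
        "per_source": per_source,
        "total": total,
        "over_budget": total > budget_tokens,
    }
```

For example, `context_report({"AGENTS.md": agents_text, ".windsurfrules": rules_text}, budget_tokens=2000)` returns a per-file breakdown, which makes it easy to spot the file that should be trimmed first.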

Configuring Rules for Better Output

Rules are the single highest-impact configuration for Cascade’s output quality. They tell Cascade how your project works, what conventions to follow, and what patterns to avoid.

Project-level rules

Create a .windsurfrules file in your project root:

# Project Rules

## Stack
- Framework: Next.js 15 with App Router
- Language: TypeScript (strict mode)
- Styling: Tailwind CSS v4
- Database: PostgreSQL via Drizzle ORM

## Conventions
- Use named exports, not default exports
- Error boundaries on every route segment
- No barrel files (index.ts re-exports)

## Anti-patterns
- Never use `any` type
- No inline styles
- Do not install new dependencies without asking

Workspace rules

For more granular control, create files in .windsurf/rules/. Each file scopes to a specific domain — auth.md, testing.md, api.md — so Cascade pulls only what’s relevant.
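As a starting point, the directory layout described above can be scaffolded with a few lines of Python. This is a sketch under stated assumptions: the file names come from the examples above, but the starter contents are placeholders invented for illustration, and existing files are left untouched.

```python
from pathlib import Path

# Placeholder contents (hypothetical); replace with your project's real rules.
STARTER_RULES = {
    "auth.md": "- Describe how authentication works in this project\n",
    "testing.md": "- Describe the test runner and where tests live\n",
    "api.md": "- Describe API route and error-handling conventions\n",
}


def scaffold_rules(project_root, rule_files=STARTER_RULES):
    """Create .windsurf/rules/ under project_root and write any missing rule files.

    Returns the list of paths that were newly created.
    """
    rules_dir = Path(project_root) / ".windsurf" / "rules"
    rules_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, body in rule_files.items():
        path = rules_dir / name
        if not path.exists():  # never overwrite rules someone already wrote
            path.write_text(body, encoding="utf-8")
            created.append(path)
    return created
```

Calling `scaffold_rules(".")` from the project root creates the three files once; running it again is a no-op.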

Global rules

Create a global_rules.md for preferences that apply across all your projects. These cover personal style: indentation, commit message format, preferred libraries.

Takeaway

Well-structured rules produce dramatically better output. Invest 10 minutes writing them and every subsequent Cascade interaction improves.

Using Memories for Persistent Context

Cascade automatically generates Memories during your interactions. These persist across sessions and capture patterns like frequently referenced functions, preferred libraries, and project-specific idioms.

Key details:

Auto-generated and local-only: Memories live on your machine and are not synced to your team.
Persistent across sessions: Cascade remembers what it learned the last time you worked on this project.
Can become stale: after major refactors, Memories may reference outdated patterns. Clear them periodically.

For knowledge you want the entire team to share, write it as a Rule (version-controlled) or add it to AGENTS.md rather than relying on auto-generated Memories.

Terminal Integration and Turbo Mode

Cascade can execute terminal commands directly. By default, it asks for approval before running each command. Turbo Mode changes this — commands auto-execute unless they’re on your deny list.

Windsurf uses a dedicated zsh shell profile for agent execution, separate from your regular terminal. This improves reliability and prevents Cascade from interfering with your active terminal sessions.

Configure allow/deny lists to control what auto-executes:

Allow list: Commands that always run without prompting (e.g., npm test, git status)
Deny list: Commands that always require manual approval (e.g., rm -rf, git push --force)
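The interaction between the two lists and Turbo Mode can be modeled as a small decision function. This is a conceptual sketch, not Windsurf’s actual logic: prefix matching and the choice to let the deny list win over the allow list are both assumptions made for illustration.

```python
def decide(command, allow, deny, turbo=False):
    """Return "run" if the command would auto-execute, or "ask" if it needs approval.

    Assumptions (not confirmed by Windsurf): entries match as command prefixes,
    and a deny-list match takes precedence over an allow-list match.
    """
    cmd = command.strip()
    if any(cmd.startswith(prefix) for prefix in deny):
        return "ask"  # denied commands always need manual approval
    if any(cmd.startswith(prefix) for prefix in allow):
        return "run"  # allowed commands never prompt
    # Anything unlisted: auto-run only when Turbo Mode is on.
    return "run" if turbo else "ask"


ALLOW = ["npm test", "git status"]
DENY = ["rm -rf", "git push --force"]

print(decide("npm test", ALLOW, DENY))                  # run
print(decide("rm -rf build", ALLOW, DENY, turbo=True))  # ask
print(decide("ls -la", ALLOW, DENY))                    # ask
print(decide("ls -la", ALLOW, DENY, turbo=True))        # run
```

The last two calls show why the deny list matters most under Turbo Mode: any command that is not explicitly denied runs without a prompt.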

Danger

Turbo Mode with a permissive allow list can execute destructive commands without confirmation. Always add destructive operations to the deny list before enabling it.

Choosing a Model

Cascade supports multiple underlying models. Your choice affects speed, quality, and credit consumption.

Model	Speed	Best for	Notes
SWE-1.6	Fast	General agentic coding	Codeium’s latest frontier model
SWE-1.5	Fast	Free-tier users	Free version available
Claude Sonnet 4.6	Medium	Complex reasoning	Promotional pricing
Gemini 3 Flash	Fast	Quick iterations	Available to all users
GPT-5.1-Codex Max	Varies	Heavy reasoning tasks	Low/Medium/High reasoning tiers

Arena Mode lets you run two models side-by-side with hidden identities, so you can blind-test which performs best for your specific workflow. Useful for deciding between SWE-1.6 and a third-party model on your actual codebase.

Tips and Best Practices

Start sessions clean. When the context window indicator is above 80%, open a new Cascade session. Stuffed context degrades output quality.
Use @-mentions liberally. Instead of hoping Cascade finds the right file via indexing, explicitly reference files with @filename. Precision beats recall.
Write rules as bullet points. Long prose in .windsurfrules confuses the model. Numbered lists and short declarative statements work best.
Use checkpoints as undo. Cascade creates visual checkpoints at each step. If a multi-file edit goes wrong, roll back to a specific checkpoint instead of manually reverting.
Parallel sessions for independent tasks. Since Wave 13, you can run multiple Cascade sessions simultaneously using Git worktrees. Use this for tasks that don’t touch the same files.
Clear stale Memories after refactors. If you renamed a module or restructured directories, old Memories may cause Cascade to reference paths that no longer exist.
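The tip about writing rules as bullet points can be checked mechanically. Below is a minimal sketch of a linter that flags prose lines and overly long items in a rules file; the 100-character limit is an arbitrary assumption, and the function is a hypothetical helper, not part of Windsurf.

```python
import re

# Matches lines that start with "-", "*", "+", or "1." style markers.
BULLET = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+")


def lint_rules(text, max_len=100):
    """Return (line_number, message) pairs for lines that are not short bullets.

    Blank lines and Markdown headings (lines starting with "#") are ignored.
    `max_len` is an arbitrary threshold chosen for illustration.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not BULLET.match(line):
            problems.append((lineno, "not a bullet or numbered item"))
        elif len(stripped) > max_len:
            problems.append((lineno, f"longer than {max_len} characters"))
    return problems
```

Running `lint_rules(open(".windsurfrules", encoding="utf-8").read())` before committing a rules change gives a quick list of lines worth tightening.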

FAQ

What is the difference between Cascade Write mode and Chat mode?

Write mode can create, modify, and delete files and run terminal commands. Chat mode is read-only: it answers questions and explains code but makes no changes. Toggle between them with Cmd + . (Mac) or Ctrl + . (Windows/Linux).

How does Cascade differ from Windsurf Tab (Supercomplete)?

Windsurf Tab provides inline autocomplete suggestions at your cursor position, powered by SWE-1-mini for low latency. Cascade is a full agentic assistant that plans and executes multi-file operations. They serve different purposes — Tab for flow-state typing, Cascade for task-level work.

Does Cascade send my code to external servers?

Codebase indexing generates vector embeddings locally — raw source code is not transmitted for indexing. However, when you prompt Cascade, relevant code snippets are sent to the model provider (Codeium, Anthropic, OpenAI, or Google depending on your selected model) as part of the prompt context. Enterprise plans offer additional data residency controls.

How many credits does Cascade use on the free tier?

The free tier includes 25 credits. Credit consumption varies by model and task complexity. Active daily use typically exhausts free credits in about 3 days. The Pro plan at $15/mo provides significantly more credits.
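The figures above imply a daily burn rate, which is worth working out before deciding on a plan. The sketch below uses only the two numbers quoted in this answer; the helper for other credit balances is a hypothetical convenience and assumes your usage stays at the same pace.

```python
FREE_TIER_CREDITS = 25     # free-tier allowance quoted above
DAYS_UNTIL_EXHAUSTED = 3   # "about 3 days" of active daily use

# Implied average consumption under active daily use.
credits_per_day = FREE_TIER_CREDITS / DAYS_UNTIL_EXHAUSTED


def days_of_use(credits, daily_burn=credits_per_day):
    """Estimate how many days a credit balance lasts at a given daily burn."""
    return credits / daily_burn


print(f"Implied burn rate: {credits_per_day:.1f} credits/day")
# prints: Implied burn rate: 8.3 credits/day
```

Since consumption varies by model and task complexity, treat this as an order-of-magnitude estimate only.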

Can I use my own API keys with Cascade?

Yes. Windsurf supports BYOK (Bring Your Own Key) for models like Claude 4 Opus, Claude 4 Sonnet, and GPT-5.1-Codex Max. Configure your API keys in Windsurf’s settings to access these models directly through your own accounts.

What should I put in .windsurfrules?

Include your tech stack, coding conventions, anti-patterns to avoid, and project-specific architecture decisions. Use bullet points and short declarative statements — avoid long prose. This is the single most impactful configuration for improving Cascade’s output quality.

Why does Cascade sometimes reference files or functions that no longer exist?

Auto-generated Memories can become stale after refactors. Cascade may reference outdated paths or function names stored from prior sessions. Clear your Memories after major restructuring to force Cascade to re-learn from the current codebase state.
