Static screenshots can't capture temporal bugs — interactions, sequences, state transitions. But raw screen recordings are opaque to AI. Video capture for AI coding bridges this gap: Stash's Video Capture lets you record your screen with voice narration using ⌘⌃R, or let Instant Replay silently buffer the last 30–120 seconds. Either way, the output is an AI Capture Report — key frames, interaction timeline, voice transcript, console OCR, and focus tracking — structured context that any AI tool can process.
The Static Screenshot Ceiling
Screenshots capture a single moment. But many bugs only exist in motion — scroll jank, animation timing glitches, multi-step interaction failures, race conditions that appear between clicks. Video capture for AI coding addresses these temporal bugs that no static image can show. The feedback loop breaks when you can see the problem on screen but can't communicate the sequence to your AI assistant.
Why Raw Screen Recordings Fail for AI
Most AI coding tools (Claude Code, Cursor, ChatGPT) cannot process raw video files. Even tools with video input see only isolated frames without understanding the temporal relationship between them. A raw recording also carries no explicit record of which buttons were clicked, what changed on screen, or what the user was trying to do. The visual evidence exists in the recording but is locked in a format AI can't reason about.
Video Capture — Active Recording with Voice (⌘⌃R)
Press ⌘⌃R to start recording. Narrate what you're doing — your voice becomes first-person annotation baked into the structured report. The recording captures:
- Screen content at dynamic frame rate (2–30 FPS based on activity)
- Voice narration via on-device transcription (no audio leaves your machine)
- Every click, scroll, drag, and keyboard shortcut
- Active app, window title, and browser URL at every moment
- Console errors and warnings via OCR
- “Nothing happened” detection: the user clicked and nothing changed, one of the strongest bug signals
- Retry detection — the same action attempted multiple times
Recordings can be up to 20 minutes. When you stop, Stash generates the AI Capture Report.
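To make the last two signals concrete, here is a minimal sketch of how “nothing happened” and retry detection could work over an interaction log. It is not Stash's implementation: the `Interaction` record, its field names, the 5-second retry window, and both function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    time_s: float         # seconds since the recording started
    action: str           # e.g. "click", "scroll" (hypothetical labels)
    target: str           # hypothetical label for the UI element acted on
    screen_changed: bool  # whether the screen differed after the action


def find_nothing_happened(events):
    """Return interactions that produced no visible change on screen."""
    return [e for e in events if not e.screen_changed]


def find_retries(events, window_s=5.0, min_repeats=2):
    """Group consecutive identical actions on the same target.

    Two events belong to the same run when they share action and target
    and are at most window_s seconds apart (an assumed threshold).
    """
    retries, run = [], []
    for e in events:
        same = (
            bool(run)
            and (e.action, e.target) == (run[-1].action, run[-1].target)
            and e.time_s - run[-1].time_s <= window_s
        )
        if same:
            run.append(e)
        else:
            if len(run) >= min_repeats:
                retries.append(run)
            run = [e]
    if len(run) >= min_repeats:
        retries.append(run)
    return retries
```

Under these assumptions, three clicks on the same button within a few seconds, none of which changes the screen, would be flagged by both functions. That overlap is the pattern a report reader would most want surfaced.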
Instant Replay — The Recording You Already Made
Instant Replay is a rolling buffer that silently records your screen in the background. No audio is captured — this is a deliberate privacy decision (no ambient microphone). No files are stored to disk. The buffer keeps the last 30–120 seconds (configurable) in memory using compressed frames (~12 MB for 60 seconds at idle).
When you see something worth reporting, such as a bug, unexpected behavior, or a UI glitch, press ⌘⌃R. The buffer freezes, Stash processes the buffered footage (the last 30–120 seconds, depending on your setting), and produces an AI Capture Report that you can paste into Claude, ChatGPT, or any other AI tool. No reproduction step is needed because the bug was caught live.
| Without Instant Replay | With Instant Replay |
|---|---|
| Bug happens | Bug happens |
| “Let me record that” | Press ⌘⌃R |
| Try to reproduce the bug | Done — last 30–120s already captured |
| Sometimes can't reproduce | Bug was caught live, first time |
| Total: 2–5 minutes | Total: 10 seconds |
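The rolling buffer behind Instant Replay can be pictured as a time-windowed queue held in memory. The sketch below is an illustration only, not Stash's code: the `ReplayBuffer` class, its method names, and the representation of a frame as a `bytes` object are all assumptions.

```python
from collections import deque


class ReplayBuffer:
    """Keep only the frames captured within the last window_s seconds."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self._frames = deque()  # (timestamp_s, compressed_frame_bytes)

    def push(self, timestamp_s, frame):
        """Append a frame, then evict everything older than the window."""
        self._frames.append((timestamp_s, frame))
        while self._frames and timestamp_s - self._frames[0][0] > self.window_s:
            self._frames.popleft()

    def freeze(self):
        """Return a snapshot of the buffered frames, oldest first."""
        return list(self._frames)

    def size_bytes(self):
        """Total size of the compressed frames currently held."""
        return sum(len(frame) for _, frame in self._frames)
```

Nothing is written to disk in this model; old frames simply fall off the front of the queue. Taking the figures quoted in this article together (~12 MB for 60 seconds, at the 2–4 FPS idle rate mentioned in the FAQ), the buffer would hold 120–240 frames, or roughly 50–100 KB per compressed frame.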
The AI Capture Report
When you click “Copy All” and paste into an AI tool, the AI receives a unified markdown report:
- Context — primary app, URL, OS, display resolution
- Focus Timeline — every app switch with timestamps, primary app identification
- Console Output — errors and warnings detected via OCR
- What Happened — timestamped interaction table with voice narration interleaved. Each row shows: time, source (Action/User/System), what happened, and outcome. User voice appears as italic quoted text at the exact moment it was spoken.
- Visual Events — panels appearing, modals, spinners, color changes
- State Changes — what changed vs. what didn't between first and last frame
- Key Frames — images extracted at interaction, voice, and visual-change events with timestamps
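As an illustration of the “What Happened” section, the following sketch renders a timestamped interaction table in markdown, with voice narration shown as italic quoted text. The column headings, the `Row` type, the `mm:ss` time format, and the assumption that rows with source `User` are the voice rows are guesses made for this example, not the actual report schema.

```python
from dataclasses import dataclass


@dataclass
class Row:
    time_s: float      # seconds since the recording started
    source: str        # "Action", "User", or "System"
    what: str          # what happened, or what the user said
    outcome: str = ""  # observed result, if any


def fmt_time(t):
    """Format seconds as mm:ss (an assumed display format)."""
    minutes, seconds = divmod(int(t), 60)
    return f"{minutes:02d}:{seconds:02d}"


def render_what_happened(rows):
    """Render rows as a markdown table, sorted by time.

    Rows whose source is "User" are treated as voice narration and
    wrapped in italic quotes so they interleave with the actions.
    """
    lines = [
        "| Time | Source | What Happened | Outcome |",
        "|---|---|---|---|",
    ]
    for r in sorted(rows, key=lambda r: r.time_s):
        what = f'*"{r.what}"*' if r.source == "User" else r.what
        lines.append(f"| {fmt_time(r.time_s)} | {r.source} | {what} | {r.outcome} |")
    return "\n".join(lines)
```

Sorting by timestamp is what places a spoken remark next to the click it refers to, so the AI reads intent and action in the order they occurred.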
Two Copy Modes
| Mode | What's Copied | Best For |
|---|---|---|
| Copy All | Report + key frame images + audio file | Claude.ai, ChatGPT, Gemini — paste into chat |
| Copy Folder Path | Path to recording folder | Claude Code, Cursor, terminal tools — paste path |
What Video Capture Reveals That Screenshots Miss
- Timing bugs — animation delays, debounce failures
- Interaction sequences — click A → B → C chain
- State transitions — loading → error, hover → tooltip
- Layout shifts during scroll
- Race conditions between user action and async response
- “Nothing happened” — the most common and hardest-to-describe bug
Key Takeaways
- Video Capture (⌘⌃R) records screen + voice for up to 20 minutes with dynamic frame rate.
- Instant Replay silently buffers the last 30–120 seconds — press ⌘⌃R to save what just happened, no reproduction needed.
- AI Capture Reports give AI tools structured context: voice transcript, interaction log, console OCR, focus tracking, and key frames.
- Voice narration interleaves with the interaction timeline as first-person annotation — the AI reads what you said alongside what you did.
- “Nothing happened” detection and retry detection are the strongest bug signals — Stash captures both automatically.
- Two copy modes: Copy All for paste-into-chat, Copy Folder Path for CLI tools like Claude Code.
Frequently Asked Questions
How does Video Capture differ from regular screen recording?
Video Capture produces structured AI output, not raw video. It generates an AI Capture Report with key frames, voice transcript, interaction log, console OCR, and focus tracking — all in a format AI tools can process directly.
What is Instant Replay?
Instant Replay is a rolling buffer that silently records the last 30–120 seconds of screen activity in memory. No audio is captured, no files are stored. Press ⌘⌃R to save what just happened with a full AI Capture Report.
Does Instant Replay record audio?
No. Instant Replay captures only screen content and interactions — no microphone audio. This is a deliberate privacy decision to prevent ambient recording. Voice narration is only captured during active Video Capture recordings.
How long can a Video Capture recording be?
20 minutes maximum. Active recordings capture screen, voice, interactions, focus tracking, and console output. Recordings auto-stop at the 20-minute limit.
Can I paste Video Capture output into Claude Code?
Yes. Use “Copy Folder Path” to paste the recording folder path into Claude Code. The folder contains the AI Capture Report, key frame images, and audio — Claude Code reads them all via the file path.
What happens if I don't narrate during a recording?
The AI Capture Report works without voice. Machine observation — clicks, console errors, visual changes, focus tracking, state changes — provides enough context for AI diagnosis. Voice narration adds intent and direction but is not required.
Does Video Capture slow down my Mac?
Instant Replay uses ~12 MB of memory at idle (compressed frames at 2–4 FPS) and less than 2% CPU. Active recording uses dynamic frame rate (2–30 FPS) that bursts during activity and settles during idle. Post-processing takes 3–8 seconds.
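One simple way to picture a dynamic frame rate is a linear mapping from an activity score to a rate between 2 and 30 FPS. This is a sketch under stated assumptions, not a description of Stash's scheduler: the `choose_fps` function and the idea of a normalized activity score in the range 0 to 1 are hypothetical.

```python
def choose_fps(activity, min_fps=2, max_fps=30):
    """Map an activity score in [0, 1] to a frame rate.

    Scores outside the range are clamped, so an idle screen stays at
    min_fps and a burst of activity is capped at max_fps.
    """
    clamped = min(max(activity, 0.0), 1.0)
    return round(min_fps + clamped * (max_fps - min_fps))
```

With these defaults, a score of 0 yields 2 FPS and a score of 1 yields 30 FPS, which matches the range quoted above.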
References and Further Reading
- Apple, “ScreenCaptureKit documentation” — screen capture framework for macOS
- Apple, “SFSpeechRecognizer documentation” — on-device speech recognition
- Apple, “AVAssetWriter documentation” — H.264 + AAC video encoding