jordanrendric/claude-video-vision
78 stars · Last commit 2026-04-22
Give Claude the ability to watch and understand videos — Claude Code plugin with frame extraction and multimodal audio analysis
README preview
<p align="center">
  <img src="./assets/hero.avif" alt="claude-video-vision" width="100%" />
</p>

# Claude Code Video Vision

Give Claude the ability to **watch and understand videos**. A Claude Code plugin that extracts frames via ffmpeg and processes audio via multiple backends (Gemini API, local Whisper, or OpenAI API). Claude receives frames as images and audio transcription with timestamps; the plugin is a **perception layer**, not an interpretation layer.

## Features

- **Multimodal perception**: Claude sees video frames directly and reads audio transcriptions with timestamps
- **Flexible backends**: Choose between cloud APIs or fully local processing
- **Adaptive extraction**: Claude adjusts fps, time range, and resolution based on your question
- **Auto-installation**: Whisper models download automatically on first use
- **Interactive setup wizard**: `/setup-video-vision` walks you through configuration

## Quick Start