I've been using my Linux desktop a lot more recently. The prenatal testing project I'm working on has me spending more time on the machine, and I wanted to make sure my development environment was as productive as possible. When you're deep in a project, small friction points add up quickly.
One thing I missed immediately: voice dictation.
On Mac, I use SuperWhisper constantly. I dictate to Claude, to Grok, and especially when working with AI coding tools like Claude Code. The workflow is simple - hold a key, speak, release, and the text appears. It keeps me in flow without breaking to type out longer prompts or explanations. When you're working with AI tools, you're often explaining context, describing what you want, or talking through problems. Typing all of that out breaks the conversational rhythm.
The problem? SuperWhisper doesn't exist for Linux. And the dictation options that do exist are either cloud-based (privacy concerns, latency, a hard dependency on internet access) or ancient tools that barely work. Cursor has some built-in dictation, but Claude Code and the other CLI tools have nothing. I needed something that would work everywhere - in the terminal, in the browser, in any application.
So I built my own. It took less than an hour.
The solution
SoupaWhisper is a ~250-line Python script that does exactly what SuperWhisper does, powered by OpenAI's Whisper model running entirely locally via faster-whisper.
The workflow is identical to what I was used to: hold F12, speak, release. The text gets typed into whatever window is active and copied to the clipboard if I want to paste it elsewhere. Nothing fancy - just local speech-to-text that actually works.
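The hold-to-record part boils down to a tiny state machine: fire "start" once when the hotkey goes down, "stop" once when it comes up, and ignore the key-repeat events a held key generates. Here's a sketch of that piece - the class and callback names are my own illustration, not SoupaWhisper's actual code:

```python
class PushToTalk:
    """Fires on_start once when the hotkey goes down and on_stop once
    when it comes up; repeated press events from key-repeat are ignored."""

    def __init__(self, hotkey, on_start, on_stop):
        self.hotkey = hotkey
        self.on_start = on_start
        self.on_stop = on_stop
        self.held = False

    def press(self, key):
        if key == self.hotkey and not self.held:
            self.held = True
            self.on_start()

    def release(self, key):
        if key == self.hotkey and self.held:
            self.held = False
            self.on_stop()

# Wiring this to a real global F12 hotkey could use pynput, for example:
#   from pynput import keyboard
#   ptt = PushToTalk(keyboard.Key.f12, start_recording, stop_and_transcribe)
#   keyboard.Listener(on_press=ptt.press, on_release=ptt.release).run()
```

The key-repeat guard matters: holding F12 for a five-second sentence would otherwise restart the recording dozens of times.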
I built this using Claude Code, which is part of why it came together so quickly. The combination of Claude Code for the implementation and Linux's straightforward tooling made this surprisingly simple. Linux has all the building blocks ready to go - audio capture, clipboard management, keyboard simulation - you just need to wire them together.
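To make that concrete, here's roughly what wiring those blocks looks like with the standard tools - arecord for capture, xclip for the clipboard, xdotool for typing. This is a sketch of the general approach, not SoupaWhisper's actual code, and the flags shown are illustrative:

```python
import subprocess


def record_cmd(path: str, rate: int = 16000) -> list[str]:
    """Audio capture: arecord writing quiet, signed 16-bit mono WAV
    at 16 kHz, the sample rate Whisper models expect."""
    return ["arecord", "-q", "-f", "S16_LE", "-r", str(rate), "-c", "1", path]


def start_recording(path: str) -> subprocess.Popen:
    # Runs until .terminate() is called (i.e. when the hotkey is released).
    return subprocess.Popen(record_cmd(path))


def copy_to_clipboard(text: str) -> None:
    """Clipboard management: pipe the transcript into xclip."""
    subprocess.run(["xclip", "-selection", "clipboard"],
                   input=text.encode(), check=True)


def type_into_window(text: str) -> None:
    """Keyboard simulation: xdotool types into whichever window has focus."""
    subprocess.run(["xdotool", "type", "--clearmodifiers", "--", text],
                   check=True)
```

Each block is one external tool and a few lines of glue, which is why the whole script stays around 250 lines.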
Why Python?
I normally reach for Go for almost everything. It's my language of choice and I've written extensively about why. But Python made more sense here for a few reasons.
The faster-whisper bindings are Python-native, so there's no fighting with FFI or CGO bindings. Downloading and managing the Whisper models is trivial with Python's ecosystem. The code stays minimal because you're not dealing with any impedance mismatch between the AI libraries and your application code.
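As a sketch of how little code that transcription step needs, assuming faster-whisper is installed (pip install faster-whisper) - the model size and parameters here are illustrative choices, not necessarily SoupaWhisper's:

```python
def join_segments(texts) -> str:
    """Whisper emits text segment by segment; glue them into one line."""
    return " ".join(t.strip() for t in texts if t.strip())


def transcribe(wav_path: str, model_size: str = "base.en") -> str:
    from faster_whisper import WhisperModel  # pip install faster-whisper
    # int8 on CPU keeps latency reasonable without a GPU. The first call
    # downloads the model into the local cache; after that it runs
    # fully offline.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path, vad_filter=True)
    return join_segments(seg.text for seg in segments)
```

Model download, caching, and inference all happen behind that one constructor call - the ecosystem convenience the paragraph above is describing.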
I'm also doing more Python work on the prenatal project right now, so staying in the same language reduces context switching. When you're bouncing between projects, having one less thing to mentally switch on helps.
Technical implementation
The architecture is dead simple. Here's the entire flow:
