Blurt: Faithful, On-Device Voice Dictation for the Mac
Abstract. Typing is the slowest interface most people use all day. Blurt removes it: hold one key, speak, and clean, punctuated text appears exactly where your cursor is — in any application, online or off. Recognition runs entirely on the Mac's Neural Engine, so your speech is never transmitted. The system is built around a single principle we consider non-negotiable for dictation: faithfulness. It transcribes what was said, corrects only the spelling of names and terms you have taught it, and never paraphrases. This note describes the design, the engineering decisions behind it, and the trade-offs we accepted to keep the program small, fast, and private.
Overview
Blurt is infrastructure, not an application you sit inside. A small mark rests in the menu bar; there is no window to manage and nothing to switch into. Holding the push-to-talk key records audio, and releasing it inserts the recognized text at the active cursor. A double-tap enters a hands-free mode for longer passages. The design goal is narrow and exacting: speaking into a field should feel as immediate and as predictable as typing into it.
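The key handling described above can be modeled as a small state machine. The sketch below is hypothetical. The tap and double-tap thresholds, the action names, and the rule that a single press ends hands-free mode are all assumptions made for illustration, not Blurt's documented behavior.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    IDLE = auto()
    HOLD = auto()        # key is held: push-to-talk recording
    HANDS_FREE = auto()  # entered by a double-tap: recording continues after release


# Hypothetical thresholds, in seconds; the real values are not published.
TAP_MAX = 0.25            # a press shorter than this is a tap, not speech
DOUBLE_TAP_WINDOW = 0.40  # max gap between the releases of two taps


class PushToTalk:
    """Maps key events to actions: "start", "insert", "discard", "hands_free", or None.

    Assumption: any press while hands-free ends the passage and inserts it.
    """

    def __init__(self) -> None:
        self.mode = Mode.IDLE
        self.down_at = 0.0
        self.last_tap_up: Optional[float] = None
        self.swallow_up = False  # ignore the release of the press that ended hands-free

    def key_down(self, t: float) -> Optional[str]:
        if self.mode is Mode.HANDS_FREE:
            self.mode = Mode.IDLE
            self.swallow_up = True
            return "insert"
        if self.mode is Mode.IDLE:
            self.mode = Mode.HOLD
            self.down_at = t
            return "start"
        return None  # key repeat while already holding

    def key_up(self, t: float) -> Optional[str]:
        if self.swallow_up:
            self.swallow_up = False
            return None
        if self.mode is not Mode.HOLD:
            return None
        if t - self.down_at >= TAP_MAX:
            # A real hold: stop recording and insert the recognized text.
            self.mode = Mode.IDLE
            self.last_tap_up = None
            return "insert"
        # A short tap: the second half of a double-tap, or an accidental press.
        if self.last_tap_up is not None and t - self.last_tap_up <= DOUBLE_TAP_WINDOW:
            self.mode = Mode.HANDS_FREE
            self.last_tap_up = None
            return "hands_free"
        self.mode = Mode.IDLE
        self.last_tap_up = t
        return "discard"
```

A hold of about a second yields `"start"` then `"insert"`. Two quick taps yield `"discard"` followed by `"hands_free"`, and recording continues until the next press.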
System
The pipeline is deliberately short, and every stage is local. Audio is captured at 16 kHz mono and only while the key is held — there is no always-listening path, no wake word, and no buffer that outlives an utterance. Captured audio is passed to an on-device recognizer, its output is repaired by a deterministic text pass, and the result is inserted through the system's standard text-input services. No network is contacted at any point in this loop.†
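Here is a rough sketch of the loop above. It assumes 16-bit PCM samples, since the text states only the rate and channel count. Under that assumption one second of 16 kHz mono audio is 16,000 × 2 = 32,000 bytes, so even a 30-second hands-free passage stays under 1 MB in memory. The `recognize`, `repair`, and `insert` callables are hypothetical stand-ins for the stages named in the text.

```python
from typing import Callable, Iterable, List

SAMPLE_RATE = 16_000  # Hz, mono, as stated in the text
BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM capture


def buffer_bytes(seconds: float) -> int:
    """Memory needed for an utterance of the given length."""
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE)


def dictate(
    chunks: Iterable[bytes],
    recognize: Callable[[bytes], str],
    repair: Callable[[str], str],
    insert: Callable[[str], None],
) -> str:
    """One utterance: capture -> recognize -> repair -> insert, with no network calls.

    `chunks` stands in for audio delivered only while the key is held.
    """
    buffer: List[bytes] = []
    try:
        for chunk in chunks:
            buffer.append(chunk)
        text = repair(recognize(b"".join(buffer)))
        insert(text)
        return text
    finally:
        buffer.clear()  # drop the captured chunks as soon as the utterance ends
```

For example, `buffer_bytes(30)` gives 960,000 bytes. Thirty seconds is also the window length Whisper models take as input.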
On-device recognition
Recognition uses a compact multilingual Whisper model compiled to Core ML and executed on the Apple Neural Engine. The model is fetched once on first launch and cached; thereafter recognition runs fully offline. Recognition is biased toward a user-supplied glossary: names, products, and domain terms are injected as decoder prompt tokens, which measurably improves the spelling of the proper nouns that generic models most often get wrong. We chose a small model on purpose. It is the largest that runs with reliably sub-second latency on the Neural Engine without a noticeable thermal or memory cost, and we tune around it rather than reaching for a heavier one.
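Glossary biasing through the decoder prompt can be sketched as follows. This illustration rests on assumptions. Whisper-style decoders keep only the trailing part of an over-long prompt, so the sketch places the highest-priority term (the user's own name) last and trims from the front. The word budget and the comma-separated format are hypothetical, and real limits are counted in tokens, not words.

```python
from typing import Iterable, List

# Hypothetical budget. Real decoders limit prompts in tokens, not words.
MAX_PROMPT_WORDS = 150


def glossary_prompt(terms: Iterable[str], user_name: str = "",
                    budget: int = MAX_PROMPT_WORDS) -> str:
    """Build a decoder prompt that primes the spelling of glossary terms.

    Whisper-style decoders keep the end of an over-long prompt, so the
    user's own name goes last, where it is least likely to be cut.
    """
    name = user_name.strip()
    seen = {name.casefold()} if name else set()
    ordered: List[str] = []
    for term in terms:
        t = term.strip()
        if t and t.casefold() not in seen:  # skip blanks and case-insensitive duplicates
            seen.add(t.casefold())
            ordered.append(t)
    if name:
        ordered.append(name)
    # Drop the earliest-listed terms until the prompt fits the budget.
    while ordered and sum(len(t.split()) for t in ordered) > budget:
        ordered.pop(0)
    return ", ".join(ordered) + "." if ordered else ""
```

For comparison, the open-source `openai-whisper` package accepts such a string through the `initial_prompt` argument of `model.transcribe`. The text does not describe how Blurt passes its glossary to the Core ML model.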
Faithfulness
A large language model can rephrase speech into polished prose, but it cannot reliably tell a genuine self-correction (“make it 3 pm, no, 4”) from a narrated one (“he said no, no, I'm here”), and it changes words you actually spoke. For dictation that is a defect, not a feature. Blurt therefore does not rewrite. A deterministic pass repairs spacing, removes filler sounds, turns a spoken “new line” into an actual line break, and drops commas a recognizer wrongly inserts between adjacent names. An optional, strictly guarded correction step may run if its model is already loaded. It is bounded by a hard timeout and never blocks insertion, and its output is rejected outright if it adds or invents any words. Apart from filler sounds, every word you spoke is kept.
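A deterministic pass of this kind reduces to a few ordered regular-expression rewrites. The sketch below is hypothetical, and the filler list is an assumption. It reads “commas between adjacent names” narrowly, as commas a recognizer places inside a known multi-word name, so a genuine list such as “Alice, Bob, and Carol” is left alone.

```python
import re
from typing import Iterable

# Hypothetical filler list; the set Blurt actually removes is not documented.
FILLERS = ("um", "uh", "erm", "er", "hmm")
# A filler as a whole word, plus a trailing comma and the spaces after it.
_FILLER = re.compile(r"\b(?:%s)\b,?\s*" % "|".join(FILLERS), re.IGNORECASE)
# A spoken "new line" command, plus punctuation the recognizer attached to it.
_NEW_LINE = re.compile(r"\s*\bnew line\b[,.]?\s*", re.IGNORECASE)


def join_known_names(text: str, names: Iterable[str]) -> str:
    """Drop commas a recognizer placed inside a known multi-word name."""
    for name in names:
        parts = name.split()
        if len(parts) < 2:
            continue
        pattern = r"\b" + r",?\s+".join(re.escape(p) for p in parts) + r"\b"
        text = re.sub(pattern, lambda _m, n=name: n, text)
    return text


def clean(text: str, names: Iterable[str] = ()) -> str:
    """Deterministic repair: names, fillers, spoken line breaks, spacing."""
    text = join_known_names(text, names)
    text = _FILLER.sub("", text)
    text = _NEW_LINE.sub("\n", text)
    text = re.sub(r"[ \t]+", " ", text)          # collapse runs of spaces
    text = re.sub(r" *\n *", "\n", text)         # no spaces around line breaks
    text = re.sub(r" +([,.;:!?])", r"\1", text)  # no space before punctuation
    return text.strip()
```

No rule here can invent a word. Each rewrite only deletes text, joins a known name, or turns a spoken command into a break. The sketch does not restore capitalization after a removed leading filler.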
Vocabulary
Proper nouns are the hardest case for any recognizer, and the one where errors are least forgivable — a misspelled name reads as carelessness. Blurt lets you supply names and term sets, which both bias recognition toward the correct spelling and back a conservative correction step. Your own name, captured at setup, is added automatically, so it is rendered correctly the first time you dictate it.
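The glossary-backed correction and the guard described under Faithfulness can be sketched together. Everything in this block is an assumption for illustration:

- `snap_to_glossary` respells near-misses of single-word terms using the standard library's `difflib.get_close_matches` with a high cutoff.
- `is_faithful` rejects any candidate whose word count differs, or that changes a word into something outside the glossary.
- `guarded` runs an arbitrary corrector under a hard timeout and falls back to the uncorrected text.

```python
import difflib
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A "word" is a run of letters or digits, allowing internal apostrophes and hyphens.
_WORD = re.compile(r"\w[\w'-]*")


def snap_to_glossary(text: str, glossary: List[str], cutoff: float = 0.85) -> str:
    """Respell near-misses of single-word glossary terms; never adds or drops words."""
    by_key = {t.casefold(): t for t in glossary if " " not in t}
    keys = list(by_key)

    def fix(m: re.Match) -> str:
        word = m.group(0)
        if len(word) < 4 or word.casefold() in by_key:
            return word  # too short to judge, or already a known term
        hit = difflib.get_close_matches(word.casefold(), keys, n=1, cutoff=cutoff)
        return by_key[hit[0]] if hit else word

    return _WORD.sub(fix, text)


def is_faithful(original: str, candidate: str, glossary: List[str]) -> bool:
    """True only if the candidate keeps every word, changing some to glossary words."""
    allowed = {w.casefold() for term in glossary for w in term.split()}
    before = [w.casefold() for w in _WORD.findall(original)]
    after = [w.casefold() for w in _WORD.findall(candidate)]
    if len(before) != len(after):
        return False  # a word was added or dropped
    # Case-insensitive comparison; any changed word must be a glossary word.
    return all(a == b or b in allowed for a, b in zip(before, after))


def guarded(corrector: Callable[[str], str], text: str,
            glossary: List[str], timeout: float = 0.2) -> str:
    """Run an optional corrector, waiting at most `timeout` seconds for it."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(corrector, text)
    pool.shutdown(wait=False)  # do not wait for a stalled corrector
    try:
        candidate = future.result(timeout=timeout)
    except Exception:  # timed out or failed: keep the uncorrected text
        return text
    return candidate if is_faithful(text, candidate, glossary) else text
```

Because digits count as words, the guard also rejects a corrector that “resolves” the self-correction in “make it 3 pm, no, 4” by dropping or changing words.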
Latency and footprint
| Property | Value |
|---|---|
| Audio or dictation sent off device | 0 bytes |
| Speech-to-text latency (warm) | ~1 s |
| Download size | ~7 MB |
| Applications supported | every text field |
| Password or sign-in | none |
| Offline after first launch | 100% |
The download is a few megabytes; the recognition model is retrieved once on first launch and then lives on your machine. After that, Blurt needs no connection to do its job.
Privacy
Privacy here is a property of the architecture, not a policy promise. Speech is recognized and processed on the device and is never transmitted; there is nothing on a server to log, retain, or disclose. Apart from the one-time download of the recognition model, the only network request Blurt makes is a one-time signup. That request carries your name and email, is sent once when you first set up the app, and is kept so we can talk to the people using it. Your dictation is part of neither request, and never will be.
Availability
Blurt runs on macOS 14 and later on Apple Silicon, and is free to try. Setup takes about a minute and requires two permissions: the microphone, to hear you, and Accessibility, to type into other applications. Both grants stay on the device. The app is signed but not yet notarized by Apple, so on first launch you need to right-click it and choose Open. On macOS 15 and later, where that shortcut no longer works, use Open Anyway under System Settings > Privacy & Security instead.
† “On-device” means recognition and text processing run on your machine; your audio and dictation are never transmitted. There are two exceptions: the one-time model download on first launch, and a single signup request (name and email) made when you first set up the app.
© 2026 P99 Labs · Speech Systems Group · research@p99lab.com