The two classes of dictation apps
When dictation software started getting popular again in 2022–2023, the promise was simple: press a hotkey, speak, text appears. Two years later, many of the most popular apps have quietly added an LLM layer between “what you said” and “what gets pasted.” That layer is the source of most dictation software complaints you will find in forums today.
Class 1: Verbatim. The speech recognition model transcribes your audio. The result is cleaned for spelling and punctuation. It is pasted. That is it. What you said is what appears. If the model mishears a word, you fix it. If you swore, it is there. If you self-corrected mid-sentence, both versions are there.
Class 2: AI-enhanced. After the speech recognition step, an LLM processes the transcript before pasting. It can remove filler words, fix grammar, resolve self-corrections, adapt tone per app, and rewrite phrases it judges to be awkward. When it works well, the output is polished prose that would have taken a manual editing pass. When it goes wrong, it removes words you chose deliberately, censors vocabulary it has been trained to flag, rewrites technical content it does not understand, and occasionally produces an “answer” to something you said as a question rather than transcribing it.
The same feature that makes AI-enhanced dictation feel magical for casual email is the feature that makes it maddening for code comments, legal drafts, or anything where your exact words matter.
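To make the difference between the two classes concrete, here is a minimal sketch of both pipelines. Everything in it is hypothetical: `DictationPipeline`, `fake_recognizer`, and `fake_rewriter` are stand-ins invented for illustration, not any app's real API. The only point is structural: in Class 2, the rewrite step has the last word on what gets pasted.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: a real app would call a speech model and an LLM here.
Recognizer = Callable[[bytes], str]
Rewriter = Callable[[str], str]


@dataclass
class DictationPipeline:
    recognize: Recognizer
    rewrite: Optional[Rewriter] = None  # None means Class 1 (verbatim)

    def run(self, audio: bytes) -> str:
        transcript = self.recognize(audio)
        if self.rewrite is None:
            # Class 1: paste what was recognized.
            return transcript
        # Class 2: the rewrite step decides what gets pasted.
        return self.rewrite(transcript)


def fake_recognizer(audio: bytes) -> str:
    # Pretend the audio decodes to this fixed transcript.
    return "um so bring the damn deck"


def fake_rewriter(text: str) -> str:
    # Toy "LLM": drops a filler and a word it has decided to flag.
    dropped = {"um", "damn"}
    return " ".join(w for w in text.split() if w not in dropped)


verbatim = DictationPipeline(recognize=fake_recognizer)
enhanced = DictationPipeline(recognize=fake_recognizer, rewrite=fake_rewriter)

print(verbatim.run(b""))
print(enhanced.run(b""))
```

The same recognizer feeds both pipelines, so any difference in the pasted text comes from the rewrite step alone.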
What each class does with the same phrase
Take this dictated sentence: “let’s meet at 3pm… wait no, 4pm, and bring the fucking deck”
Verbatim output (the same with or without VAD-based silence trimming): let's meet at 3pm... wait no, 4pm, and bring the fucking deck
AI-enhanced output (typical): Let's meet at 4pm and bring the deck.
The AI version resolved the self-correction (good), removed the swearing (maybe fine, maybe not), removed the ellipsis hesitation (fine), and capitalized the sentence (fine). Every one of those changes was a judgment call the app made without asking you.
For a Slack message to a friend, the AI version is arguably better. For a direct quote you were transcribing, a legal statement, or a Cursor prompt in which “the fucking deck” referred to a file with exactly that name, the AI version is wrong in ways that are hard to catch on review.
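If you have both the raw transcript and the cleaned-up text, one way to make those silent changes visible is a word-level diff. The sketch below uses Python's standard `difflib`; `describe_changes` is a hypothetical helper written for this example, and it assumes your app exposes the raw transcript at all, which not every app does.

```python
import difflib


def describe_changes(verbatim: str, enhanced: str) -> list[str]:
    """Return one human-readable line per word-level change."""
    before = verbatim.split()
    after = enhanced.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged run of words, nothing to report
        old = " ".join(before[i1:i2])
        new = " ".join(after[j1:j2])
        changes.append(f"{tag}: '{old}' -> '{new}'")
    return changes


raw = "let's meet at 3pm... wait no, 4pm, and bring the fucking deck"
cleaned = "Let's meet at 4pm and bring the deck."
for line in describe_changes(raw, cleaned):
    print(line)
```

The comparison is deliberately naive: it splits on whitespace, so a capitalization or punctuation change counts as a changed word. That is noisy, but it errs toward showing you everything the AI layer touched.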
When you want verbatim
Developers dictating technical content. Code, terminal commands, function names, file paths, stack traces. kubectl apply -f values.yaml should not become “cube control apply F values yaml.” async/await should not become “async await.” The custom dictionary in verbatim apps handles recurring terms like these, and with no LLM in the path, nothing rewrites them afterward.
Journalists and researchers transcribing or drafting. When you are capturing someone else’s words — or your own exact phrasing — an AI that “improves” the prose is producing a document that no longer says what was said.
Legal and medical professionals. Formulations matter. “The patient denied suicidal ideation” is different from “the patient said they were not thinking about suicide.” An AI that paraphrases for clarity can introduce a distinction the original speaker did not intend.
Writers with a voice. A paying lifetime member on the SuperWhisper subreddit put it plainly: “I didn’t buy a ‘make what I’m saying better’ product. I just want it to dictate and fix spelling and sentence structure.” The complaint is not about AI cleanup in principle — it is about an AI layer that takes more liberty than the user asked for. If you have developed a writing voice over years, an AI that smooths it out is actively working against you.
Anyone dictating in a language or register the AI does not know well. Non-standard vocabulary, dialect, slang, domain-specific jargon. AI post-processing tends to normalize toward whatever the training data considered “standard,” which means it flattens precisely the linguistic choices that made your writing yours.
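The custom dictionary mentioned in the developer case is worth a quick illustration, because it is the opposite of an AI rewrite: a fixed, user-controlled substitution. The sketch below is an assumption about how such a step could work, not a description of any particular app. The entries in `CUSTOM_DICTIONARY` and the `apply_dictionary` helper are invented for this example.

```python
import re

# Hypothetical entries: spoken form as a recognizer might emit it -> written form.
CUSTOM_DICTIONARY = {
    "cube control": "kubectl",
    "async await": "async/await",
    "values yaml": "values.yaml",
}


def apply_dictionary(transcript: str, dictionary: dict[str, str]) -> str:
    """Replace known spoken forms with their written forms, and nothing else."""
    result = transcript
    # Longest phrases first so a short entry cannot pre-empt a longer one.
    for spoken in sorted(dictionary, key=len, reverse=True):
        written = dictionary[spoken]
        pattern = re.compile(rf"\b{re.escape(spoken)}\b", flags=re.IGNORECASE)
        # A function replacement avoids backslash handling in the written form.
        result = pattern.sub(lambda _match: written, result)
    return result


print(apply_dictionary("cube control apply F values yaml", CUSTOM_DICTIONARY))
```

Note what this does not do: the stray "F" stays as recognized, because no entry covers it. A dictionary step only changes what you told it to change, which is exactly the property the verbatim cases above depend on.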
When you want AI-enhanced
Casual correspondence. Emails, Slack messages, text messages where the goal is clear communication and you do not have a strong attachment to your exact phrasing. The AI cleanup removes the cognitive overhead of editing.
Non-native speakers of the transcription language. If you think in one language and dictate in another, grammar correction is genuinely helpful rather than intrusive.
People who dictate continuously and hate the editing pass. If your workflow is “speak everything, edit nothing,” AI cleanup trades some accuracy for time saved. For high-volume casual dictation, the tradeoff is reasonable.
People who are not yet comfortable dictating polished prose. The AI layer can serve as a bridge while you develop dictation habits — acting as a real-time copy editor until the spoken output is clean enough on its own.
The apps and which class they belong to
| App | Class | AI layer | Platform | Switchable? |
|---|---|---|---|---|
| SnailText | Verbatim by default | Optional local LLM (Pro, on-device) | Mac, Windows | Yes — one toggle |
| MacWhisper | Verbatim | None | Mac | No |
| Parakeety | Verbatim | None | Mac | No |
| SuperWhisper (Smart Modes off) | Verbatim | Disabled | Mac, Windows, iOS | Yes — but buried |
| SuperWhisper (Smart Modes on) | AI-enhanced | Modal cloud — sends app context, clipboard | Mac, Windows, iOS | Default state |
| Wispr Flow | AI-enhanced | Always on — cloud, no disable | Mac, Windows, iOS, Android | No |
| Apple Dictation | Verbatim | None | Mac, iOS | No |
SuperWhisper Smart Modes sends additional context (app name, clipboard, focused window) to Modal’s cloud even when the STT runs locally — documented in SuperWhisper’s own interface.
Getting verbatim output from SuperWhisper
If you are already on SuperWhisper and want verbatim output, the settings exist but require some digging. The key is creating a custom Mode with a prompt that explicitly instructs the model to transcribe rather than rewrite.
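A prompt shared in the SuperWhisper community that works well. It is not strictly verbatim, since it still removes fillers and resolves self-corrections, but it tells the model to stop explaining or responding: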
You are a real-time speech-to-text transcription assistant. Your only job is to clean and return the final transcription. Process fast — minimize latency.
Rules:
- Remove filler words and hesitations (uh, um, er, like, you know, wait, hmm) unless they carry explicit meaning.
- Fix spelling, grammar, and punctuation errors silently.
- If the speaker self-corrects mid-sentence, resolve the correction and output only the intended final version.
- Output ONLY the clean final transcription. No explanations, no introductions, no conversational responses, no metadata.
Note that this approach still uses SuperWhisper’s cloud post-processing layer: your transcript and the accompanying context are sent to Modal’s servers even with a custom prompt. If rewriting is your only complaint and the cloud layer is not a concern, this works. If you want both verbatim output and local-only processing, switching to a different app is the structurally simpler answer.
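A prompt is only an instruction, and a model can still ignore it. If you are building or scripting your own pipeline, a cheap guard is to compare the cleaned text against the raw transcript and fall back to the transcript when the cleanup step has introduced words you never said. The sketch below is one hedged way to do that. The helper names and the `max_added` threshold are assumptions made for this example, not a SuperWhisper feature.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def added_words(transcript: str, cleaned: str) -> set[str]:
    """Words in the cleaned output that never appeared in the transcript."""
    return set(words(cleaned)) - set(words(transcript))


def choose_output(transcript: str, cleaned: str, max_added: int = 0) -> str:
    """Fall back to the raw transcript if the cleanup step introduced new words."""
    if len(added_words(transcript, cleaned)) > max_added:
        return transcript
    return cleaned


question = "what is the capital of France"
# The model answered the question instead of transcribing it: rejected.
print(choose_output(question, "The capital of France is Paris."))
# The model only fixed casing and punctuation: accepted.
print(choose_output(question, "What is the capital of France?"))
```

This check catches additions, such as an “answer” to a dictated question, but not deletions, so it will not notice a removed swear word. Legitimate spelling fixes also count as new words, so a real setup would probably need `max_added` above zero.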
SnailText’s approach: verbatim by default, AI when you want it
SnailText was built on the premise that what you say should be what gets pasted. The transcription pipeline (Whisper or Parakeet TDT running locally) produces text from your audio, adds punctuation based on how you spoke, and pastes it. Nothing rewrites, paraphrases, or second-guesses your vocabulary. This is the default behavior on both Mac and Windows, for every user including the free tier.
For users who do want AI cleanup, Pro includes an optional local LLM post-processing step. The model runs on your device — not on a remote server — so the same privacy properties hold. You control the prompt. You can turn it off at any time. The verbatim path is always one toggle away.
This is the same architectural decision that Parakeety, MacWhisper, and several other newer apps have made: give users clean transcription first, and let them add AI on top if they want it. The alternative — AI on by default, verbatim buried in settings — is what produces the frustration in the forum threads.
The bottom line
If your dictation app is rewriting things you did not ask it to rewrite, that is not a configuration error you missed. It is the product working as designed. The question is whether that design matches what you actually need.
For code, quotes, legal drafts, or any content where exact wording matters: verbatim app, local processing, no AI layer between your voice and your cursor.
For casual email, Slack, and high-volume prose where cleanup is a net positive: AI-enhanced apps do exactly what they advertise.
SnailText’s free tier is verbatim by default — download for Mac or Windows and test it against whatever you’re currently using. If the AI layer turns out to be something you want, Pro has it. If it is something you want to avoid forever, the free tier has no LLM in the path at all.
Related reading:
- SuperWhisper alternatives — full comparison including the Smart Modes privacy finding
- How it works — the SnailText pipeline in detail
- Wispr Flow alternatives — 9 tools compared across privacy, accuracy, and value
- For vibe-coders — voice dictation for AI coding agents, where verbatim output matters most