Voice to text on Windows

Voice to text on Windows — and the gaps in what Windows ships

Windows has built-in dictation. It works for short English bursts in Microsoft apps. For sustained work, non-English languages, or anything offline, it has documented structural limits. SnailText is the local Whisper alternative.

The short version

Windows ships two voice features that are often confused. Voice Typing (Win+H) is cloud-based: audio goes to Azure on every dictation, it covers about 43 cloud locales including English, and it stops after roughly 5-10 seconds of silence, with no setting to change that. Voice Access (Windows 11 22H2+ only) runs offline but supports only 11 locales (English variants, two Spanish, German, French, Italian, Japanese, two Chinese), with no Russian or other Slavic languages, no Portuguese, and no Nordic languages. "Fluid Dictation" (the polished 2025 auto-punctuation feature) requires a Copilot+ PC with NPU hardware. SnailText runs the Whisper speech model locally on any modern Windows PC, works in any app, supports any Whisper language, and does not depend on Microsoft's language pack ecosystem.

Two Windows voice features, both with structural limits

Most reviews conflate Voice Typing and Voice Access. They are different tools with different processing models. Both ship with Windows; neither covers what a daily-dictation workflow needs in 2026.

| Feature | Voice Typing (Win+H) | Voice Access (Win 11 22H2+) | SnailText |
| --- | --- | --- | --- |
| Processing | Cloud: audio to Azure on every dictation, requires internet | Offline: runs on device | Offline: Whisper runs locally on your PC |
| Language coverage | ~43 cloud locales (Microsoft does not enumerate them in one place) | 11 locales only: English variants, Spanish (ES/MX), German, French (FR/CA), Italian, Japanese, Simplified + Traditional Chinese | Any Whisper-supported language (100+): Russian, Portuguese, Polish, Dutch, Nordic, all there |
| Pause timeout | About 5-10 seconds of silence stops the session; uncustomizable | Same uncustomizable cutoff | Unlimited: runs until you press the hotkey again |
| Hotkey | Win+H, not customizable | Voice-command-only activation; toolbar must be visible | Global Ctrl+Shift+Space (configurable to any combo) |
| Where it works | Most text fields, but documented compatibility gaps (Anki, some Word fields, some browser textareas show "limited" warning) | Microsoft apps mostly; third-party app behavior varies | Any text field in any app: paste-based, like Ctrl+V |
| Auto-punctuation | Toggle, accuracy flaky; "comma" command unreliable per Microsoft Q&A threads | Same toggle, same reliability | Whisper handles punctuation from prosody, no commands to memorize |
| "Fluid Dictation" polish (grammar fix, filler removal) | Copilot+ PC only (NPU required: Snapdragon X, Intel Core Ultra, AMD Ryzen AI). English only. | Not available | Custom dictionary + snippets (Pro tier) for similar end result, any hardware |
| Windows 10 support | Yes | No: Windows 11 22H2+ only. Windows 10 has the older Speech Recognition tool, a different feature | Yes: Windows 10 (1903+) and Windows 11 |

Sources for Microsoft claims: linked in the body section below. The "5-10 second pause cutoff" is documented in user forum threads and Microsoft Q&A responses, not in Microsoft marketing copy.

Two voice-typing features in Windows, and what each actually is

Windows in 2026 ships two separate voice-typing features. Most articles online treat them as one product. They are not.

Voice Typing (activated by Win+H) is a cloud-based dictation tool. Microsoft documents this explicitly: "To use voice typing, you'll need to be connected to the internet". On every dictation session, your microphone audio is sent to Microsoft's Azure Speech services for transcription. The text comes back, gets pasted into the focused text field, and your audio (according to Microsoft) is de-identified and not stored without consent — but it has left your device.

Voice Access is the newer feature, added in Windows 11 22H2 (September 2022). It is a broader accessibility tool that includes dictation but also lets you control the OS by voice: open apps, click buttons, scroll, navigate. The dictation portion of Voice Access runs on-device, offline. It does not exist on Windows 10, which instead has "Windows Speech Recognition", a separate and much older feature.

The practical difference: Voice Typing supports more languages but always needs internet. Voice Access runs offline but supports fewer languages. Neither one does both.

The language coverage gap is the real story

Voice Access — the offline option — ships with only 11 distinct locales: six English variants (US, UK, India, New Zealand, Canada, Australia), two Spanish (Spain, Mexico), German, French (France and Canada), Italian, Japanese, Simplified Chinese, and Traditional Chinese (Taiwan). That is it.

What is missing: Russian, Portuguese (both Brazil and Portugal), Polish, Dutch, Swedish, Danish, Norwegian, Finnish, Czech, Hungarian, Greek, Turkish, Hindi, Arabic, Korean, Thai, Vietnamese, and dozens more. When a user asked about Swedish on Microsoft's own Q&A forum, the official response confirmed that the gaps are "by design", with no roadmap commitment.

Voice Typing, the cloud option, supports more (~43 languages, including Portuguese, Korean, Thai, Turkish, Vietnamese, and Hindi). But it sends your audio to Microsoft on every dictation. For anyone whose dictation contains client information, medical notes, source code, or anything sensitive, cloud speech-to-text with no offline option is the wrong architecture.

SnailText runs Whisper locally. Whisper is multilingual by design — the same model that handles English handles 100+ languages including all the ones Microsoft's offline option does not. Russian dictation works on SnailText. Portuguese dictation works on SnailText. Polish, Dutch, Czech, all on the same install. No language packs to download. No cloud detour.

The Win+H pause timeout — most-cited complaint

Windows Voice Typing has an uncustomizable silence timeout that ends the dictation session after roughly 5 to 10 seconds of pause. The exact number is not in Microsoft's marketing copy, but it is the subject of multiple user threads — including a long-running Microsoft Q&A thread and a Windows Forum thread asking how to prevent it. The answer in both: you cannot.

For composing an email longer than two paragraphs, this means re-activating Win+H two or three times in a single message. For thinking-while-dictating workflows — research notes, treatment plans, brief drafts where pauses for thought are normal — the cutoff makes the tool feel like it is fighting you.

SnailText runs as long as you hold the hotkey down, or until you press it again to stop. There is no silence timeout. A five-minute brain-dump dictates as one session.
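To make the difference concrete, here is a toy model of the two session styles. It is a sketch for illustration only, not how Windows or SnailText is implemented: the 7-second timeout is an assumed value inside the reported 5-10 second range, and the function names and the list of speech timestamps are hypothetical.

```python
def activations_with_timeout(speech_times, timeout_s=7.0):
    """Count how often dictation must be (re)started when a silence
    longer than `timeout_s` ends the session (assumed timeout value)."""
    activations = 0
    last = None
    for t in speech_times:
        # The first utterance, or any utterance after a long gap,
        # needs a fresh activation.
        if last is None or t - last > timeout_s:
            activations += 1
        last = t
    return activations


def activations_with_toggle(speech_times):
    """With a start/stop hotkey, pauses never end the session:
    one activation covers everything that is said."""
    return 1 if speech_times else 0


if __name__ == "__main__":
    # Seconds at which the speaker says something; note the 15 s and 18 s pauses.
    speech = [0, 3, 5, 20, 22, 40, 41]
    print(activations_with_timeout(speech), activations_with_toggle(speech))  # prints: 3 1
```

In this example, two pauses for thought turn one message into three separate activations under a timeout, while the toggle model keeps it as a single session.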

"Fluid Dictation" is hardware-gated — most PCs do not qualify

Microsoft's 2025 marketing push for Voice Typing centered on "Fluid Dictation" — a polish layer that adds auto-punctuation, removes filler words (um, uh), and corrects grammar in real time. The reviews of this feature are positive when it works.

Microsoft's own documentation states that Fluid Dictation requires a Copilot+ PC, meaning a dedicated NPU (Neural Processing Unit) in your hardware: Snapdragon X (Surface laptops 2024+), Intel Core Ultra with NPU, or AMD Ryzen AI. It is also English-only.

In 2026, the install base of Copilot+ PCs is still small. A standard Windows 11 PC bought in 2022 or 2023, without an NPU, gets the older, rawer Voice Typing experience: no auto-grammar fix, no filler removal, no real-time polish. The 2025 marketing therefore applied to only a small share of Windows PCs, perhaps 5-10% of the install base.

How SnailText fills the Windows voice-to-text gaps

Local processing. SnailText runs the Whisper speech model on your PC: CPU on older machines, Vulkan on AMD and Intel iGPUs, CUDA on NVIDIA GPUs. Audio is captured into a RAM buffer and processed by the model; the transcribed text is pasted at your cursor and the audio is discarded. See how it works for the architecture diagram. You can verify this in your network monitor: there is no outbound traffic during dictation.

Any language Whisper supports. 100+ languages on the same install, no language packs to download. Russian works the same as English. Portuguese works the same as French. No locale-specific gaps — see also our offline dictation page for the architectural argument.

No timeout. Press the hotkey, talk for as long as you want — five seconds or five minutes — press again to stop. The transcript is one block.

Configurable hotkey. Default is Ctrl+Shift+Space; reassign to anything that does not conflict with your other shortcuts. No Win+H lock-in.

Works in any app. SnailText pastes into the focused text field, same way Ctrl+V does. Slack, Chrome textareas, VS Code, Cursor, terminal emulators, browser-based EHRs, web forms, Anki — anywhere a keyboard works, dictation works. No "Voice typing functionality is limited on this app" warnings like the documented Anki experience.

Free to start. The compact Whisper Base model handles everyday English dictation; Pro adds larger Whisper models and 25+ European languages via Parakeet TDT. If you want the cross-platform story, see voice to text on Mac. For the broader "free" angle (no signup, no time limit), see free voice to text.
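The local-processing flow described in this section (capture into a RAM buffer, transcribe, paste at the cursor, discard the audio) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not SnailText's actual code: `dictate_once`, `transcribe`, and `paste` are hypothetical names, and the model and the paste step are passed in as stand-in callables.

```python
from typing import Callable, Iterable


def dictate_once(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    paste: Callable[[str], None],
) -> str:
    """Capture -> transcribe -> paste -> discard, with audio held only in memory."""
    buffer = bytearray()  # in-memory only; nothing is written to disk
    try:
        for chunk in chunks:  # e.g. microphone frames while recording is active
            buffer.extend(chunk)
        text = transcribe(bytes(buffer))  # stand-in for a local speech model
        paste(text)  # stand-in for inserting text at the cursor
        return text
    finally:
        buffer.clear()  # drop the captured audio once the session ends


if __name__ == "__main__":
    pasted = []
    fake_frames = [b"\x00\x01" * 160] * 3  # three dummy audio frames
    fake_model = lambda audio: f"<{len(audio)} bytes transcribed>"
    dictate_once(fake_frames, fake_model, pasted.append)
    print(pasted)
```

Passing the model and the paste step in as callables keeps the sketch runnable without a microphone or a speech model; in a real application they would be replaced by the local model and the system paste mechanism.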

How to set up voice to text on Windows in 60 seconds

1. Download the SnailText installer from snailtext.app/download/windows/.

2. Run the installer. Windows SmartScreen may show a warning because the installer is not yet signed with an Authenticode certificate from a Microsoft-trusted certificate authority. Click "More info" → "Run anyway". Code signing is in progress.

3. On first launch, SnailText downloads the default Whisper model (Base, around 80 MB) and loads it.

4. Set your global hotkey in Settings. Default is Ctrl+Shift+Space.

5. Open any app — Slack, Chrome, Word, Notion, your IDE. Press the hotkey. Talk. Press it again. Your transcribed text appears at the cursor.

Frequently asked questions

Does this work on Windows 10?

Yes. SnailText supports Windows 10 (64-bit, 1903 or later) and Windows 11. Voice Access — the offline Microsoft option — is Windows 11 22H2+ only. On Windows 10, SnailText is one of the few options that gives you modern Whisper-class dictation at all.

How is this different from Voice Typing (Win+H)?

Voice Typing requires an internet connection and sends your audio to Microsoft's Azure servers on every dictation. SnailText runs the Whisper model locally — audio never leaves your PC. Voice Typing has an uncustomizable 5-10 second pause timeout; SnailText runs until you press the hotkey to stop. Voice Typing supports about 43 cloud languages but no offline mode; SnailText supports any Whisper language (100+) offline.

How is this different from Voice Access?

Voice Access is Windows 11 22H2+ only and supports just 11 offline locales (English variants, Spanish, German, French, Italian, Japanese, Chinese). If you need Russian, Portuguese, Polish, Dutch, or any Nordic or Slavic language, Voice Access does not cover you. SnailText runs Whisper which supports 100+ languages offline on the same install.

Why does Microsoft's offline option support so few languages?

Microsoft has confirmed on its own support forums that the limited Voice Access language list is by design, with no public roadmap to expand it. The cloud-based Voice Typing has broader coverage, but at the cost of sending all audio to Azure. SnailText sidesteps this by running Whisper, which has been open source and multilingual since its first release.

Is the "Fluid Dictation" feature available on my PC?

Probably not, unless you bought a Copilot+ PC in 2024 or later — meaning a laptop with a dedicated NPU (Snapdragon X, Intel Core Ultra with NPU, or AMD Ryzen AI). Microsoft's documentation explicitly gates Fluid Dictation to Copilot+ hardware, and it is English-only. A standard 2022-2023 Windows 11 laptop gets the rawer Voice Typing experience without the polish.

Do you upload my audio anywhere?

No. Whisper runs locally inside SnailText on your PC. The audio buffer stays in RAM during a recording session and is not written to disk. We do not upload audio to any server in any mode, free or paid. You can verify in your network monitor — no outbound traffic during dictation. The only outbound calls SnailText makes are software update checks (disable in Settings) and, for Pro users, license verification once per session.

Does it work without an NVIDIA GPU?

Yes. SnailText auto-detects available GPU acceleration. NVIDIA CUDA is fastest, but Vulkan (AMD and Intel iGPUs from 2020 onward) and CPU fallback both work. On a typical 2022+ Windows laptop, Whisper Medium transcribes several times faster than real time even without a discrete GPU.
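The fallback order described in this answer (CUDA first, then Vulkan, then CPU) amounts to a simple preference list. The sketch below is an assumption about how such a selection could look, not SnailText's detection code: `pick_backend` and the `available` input are hypothetical, and real detection of GPU support is left out.

```python
from typing import Iterable

# Fastest option first; CPU is always the last resort.
PREFERENCE = ("cuda", "vulkan", "cpu")


def pick_backend(available: Iterable[str]) -> str:
    """Return the most preferred backend from those reported as available."""
    found = {name.lower() for name in available}
    for name in PREFERENCE:
        if name in found:
            return name
    return "cpu"  # nothing reported: fall back to the CPU


if __name__ == "__main__":
    print(pick_backend(["vulkan", "cpu"]))  # prints: vulkan
```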

Will Windows SmartScreen flag the installer?

It may, on first run, because the installer is not yet signed with an Authenticode certificate from a Microsoft-trusted certificate authority. The "More info" → "Run anyway" path works. Code signing is in progress.

Can I use it for code dictation in VS Code or Cursor?

Yes — SnailText pastes into any text field, including VS Code and Cursor textareas. The custom dictionary (Pro) is useful for code: add terms like "kubectl", "gRPC", "async/await" and SnailText replaces the misheard versions on the way to the editor. Microsoft Voice Typing has documented compatibility issues with some third-party apps — SnailText does not, because it operates through the system paste mechanism.
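A custom dictionary of this kind can be thought of as a post-processing pass over the transcript. The following is a minimal sketch of that idea, not SnailText's implementation: the function name and the misheard phrases used as keys ("cube control", "g r p c", "a sink await") are made-up examples.

```python
import re


def apply_dictionary(text: str, replacements: dict[str, str]) -> str:
    """Replace misheard phrases with the intended terms, matching whole
    phrases only and ignoring case."""
    if not replacements:
        return text
    # Try longer phrases first so a long entry wins over a shorter one
    # that starts at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


if __name__ == "__main__":
    terms = {
        "cube control": "kubectl",
        "g r p c": "gRPC",
        "a sink await": "async/await",
    }
    print(apply_dictionary("Run Cube Control to check the G R P C service", terms))
    # prints: Run kubectl to check the gRPC service
```

The word-boundary checks keep an entry from firing inside a longer word, which matters for short technical terms.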

Voice to text on Windows. Local. Any language. Free to start.

Download for Windows 10 or 11. Compact Whisper model runs on any modern PC. No language packs, no cloud roundtrip, no pause timeout.