
ASR

ASR (Automatic Speech Recognition) converts speech into text — the first step in the voice input pipeline.

Microphone → [ASR] → Raw text → (optional) Scene + LLM rewriting → Final text

Vinput provides three complementary ASR mechanisms:

  • Local models — Offline recognition powered by sherpa-onnx. No network required, privacy-friendly, low latency.
  • Cloud providers — Third-party ASR APIs (Doubao, Aliyun Bailian, ElevenLabs, OpenAI, etc.). Typically more accurate, but they require a network connection and API keys.
  • Hotwords — Domain-specific vocabulary to improve recognition of proper nouns with local models (supported by some models).

Local models and cloud providers are mutually exclusive — switch between them at runtime with the F8 menu. Hotwords take effect when a local model is active.

A local model is a set of sherpa-onnx compatible model files that run entirely offline. Each model has its own language, type, and size. Only one can be active at a time.

Corresponding config:

{
  "asr": {
    "providers": [
      {
        "id": "sherpa-onnx",
        "type": "local",
        "model": "model.sherpa-onnx.sense-voice-zh-en-ja-ko-yue-int8",
        "timeout_ms": 15000
      }
    ]
  }
}

In the Vinput GUI, go to Resources → Models:

  • Available models list: click Download to install
  • Installed models list: click Use to activate, Remove to uninstall

In a terminal:

vinput model list            # List installed models
vinput model list -a         # List available remote models
vinput model add <name>      # Download and install
vinput model use <name>      # Activate
vinput model remove <name>   # Uninstall
vinput model info <name>     # View details

A cloud provider is an external script that receives an audio stream, calls a third-party ASR API, and returns recognized text. Each provider has its own environment variable config (API key, URL, etc.).

Providers come in two modes:

  • Non-streaming — Sends audio after recording ends, waits for complete result
  • Streaming — Recognizes in real time as you speak, returns intermediate results
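
The difference between the two modes can be illustrated with a minimal Python sketch. It does not reflect Vinput's actual provider protocol: `Recognizer`, `recognize_non_streaming`, `recognize_streaming`, and `fake_recognize` are hypothetical names used only for illustration, and the fake recognizer simply pretends that each audio byte is one word.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical recognizer type: takes raw audio bytes, returns text.
Recognizer = Callable[[bytes], str]


def recognize_non_streaming(chunks: Iterable[bytes], recognize: Recognizer) -> str:
    """Buffer the whole recording, then recognize it once."""
    audio = b"".join(chunks)
    return recognize(audio)


def recognize_streaming(chunks: Iterable[bytes], recognize: Recognizer) -> Iterator[str]:
    """Yield an intermediate result each time a new chunk arrives."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        yield recognize(bytes(received))


def fake_recognize(audio: bytes) -> str:
    """Stand-in for a real ASR call: pretends each byte is one word."""
    return " ".join(f"w{b}" for b in audio)
```

With the same audio chunks, the non-streaming function returns a single final string, while the streaming generator produces a growing sequence of partial results whose last element matches the non-streaming output.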

Corresponding config:

{
  "asr": {
    "active_provider": "provider.bailian.streaming",
    "providers": [
      {
        "id": "provider.bailian.streaming",
        "type": "command",
        "command": "python3",
        "args": ["~/.local/share/vinput/providers/bailian/streaming"],
        "env": {
          "VINPUT_ASR_API_KEY": "your-api-key",
          "VINPUT_ASR_MODEL": "qwen3-asr-flash-realtime"
        },
        "timeout_ms": 60000
      }
    ]
  }
}
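
When editing this config by hand, a quick sanity check can catch simple mistakes. The following is a minimal sketch, not part of Vinput: `check_asr_config` is a hypothetical helper, and it assumes that `active_provider` is meant to name one of the entries in `providers` and that a provider of type `command` needs a `command` field.

```python
import json


def check_asr_config(config: dict) -> list:
    """Return a list of problems found in the "asr" section (empty if none)."""
    problems = []
    asr = config.get("asr", {})
    providers = asr.get("providers", [])
    ids = [p.get("id") for p in providers]

    # Each provider id should appear only once.
    duplicates = sorted({i for i in ids if i is not None and ids.count(i) > 1})
    for dup in duplicates:
        problems.append(f"duplicate provider id: {dup}")

    # Assumption: the active provider must be one of the listed providers.
    active = asr.get("active_provider")
    if active is not None and active not in ids:
        problems.append(f"active_provider {active!r} is not defined in providers")

    # Assumption: a "command" provider needs an executable to run.
    for p in providers:
        if p.get("type") == "command" and not p.get("command"):
            problems.append(f"provider {p.get('id')!r} has type 'command' but no command")
    return problems


# Illustrative input only; "provider.example.missing" is a made-up id.
EXAMPLE = json.loads("""
{
  "asr": {
    "active_provider": "provider.example.missing",
    "providers": [
      {"id": "provider.bailian.streaming", "type": "command", "command": "python3"}
    ]
  }
}
""")

for problem in check_asr_config(EXAMPLE):
    print(problem)
```

The helper only inspects the structure of the parsed JSON; it does not know which providers are actually installed.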

In the Vinput GUI, go to Resources → ASR Providers:

  • Click Install to download a provider script
  • After installation, go to the Control page to select the provider and edit its environment variables (e.g. API key)

In a terminal:

vinput provider list -a       # List available remote providers
vinput provider add <id>      # Install
vinput provider use <id>      # Switch to this provider
vinput provider edit <id>     # Edit config (environment variables)
vinput provider remove <id>   # Uninstall

| Provider | Mode | Description |
| --- | --- | --- |
| Doubao (non-streaming) | Non-streaming | Doubao / Volcengine fast file recognition |
| ElevenLabs | Non-streaming / Streaming | ElevenLabs speech-to-text API |
| Aliyun Bailian | Non-streaming / Streaming | Qwen3-ASR via OpenAI-compatible / Realtime API |
| Doubao (streaming) | Streaming | Doubao ASR Realtime via AI Gateway |
| Doubao IME (streaming) | Streaming | Unofficial Doubao IME real-time protocol |
| OpenAI-compatible | Non-streaming / Streaming | OpenAI /v1/audio/transcriptions or Realtime WebSocket |

A hotword file is a text file with one term per line, used to boost recognition accuracy for specific vocabulary with local models. Typical use cases: names, brand names, technical terms.

Not all models support hotwords — the model list indicates support.
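
Because the format is just one term per line, a hotword file is easy to generate from an existing word list. The sketch below is one possible way to do it in Python; `write_hotword_file` is a hypothetical helper, and stripping whitespace and dropping duplicates are choices made here for tidiness, not documented Vinput requirements.

```python
from pathlib import Path
from typing import Iterable


def write_hotword_file(path: Path, terms: Iterable[str]) -> int:
    """Write one term per line, skipping blanks and duplicates.

    Returns the number of terms written.
    """
    seen = set()
    lines = []
    for term in terms:
        cleaned = term.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            lines.append(cleaned)
    # UTF-8 keeps non-ASCII names and terms intact.
    path.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)
```

For example, `write_hotword_file(Path("hotwords.txt"), ["Vinput", "sherpa-onnx", "Vinput"])` writes two lines, and the resulting file can then be registered with `vinput hotword set hotwords.txt`.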

In the Vinput GUI, go to the Hotwords tab to edit the hotword file.

In a terminal:

vinput hotword get          # View current hotword file path
vinput hotword set <path>   # Set hotword file
vinput hotword edit         # Open hotword file in editor
vinput hotword clear        # Clear hotword config