Audio Output
Dual-Output Architecture
The app plays audio simultaneously to two independent output devices:
graph LR
subgraph TTS Engine
A[Audio Buffer]
end
A --> B[Monitor Output]
A --> C[Secondary Output]
B --> D[Headphones/Speakers]
C --> E[Virtual Cable Input]
E --> F[Voice App Mic Input]
Monitor Output
The monitor output is the primary output device, typically your headphones or speakers. This is where you hear the speech.
Secondary Output
The secondary output is the virtual audio cable's input device. Voice applications (such as Discord or VRChat) receive this audio as their microphone input.
Volume Controls
Independent volume sliders for each output:
- Monitor Volume: 0–100% (default: 100%)
- Secondary Volume: 0–100% (default: 100%)
Volumes are applied as float multipliers (0.0–1.0) to the NAudio WasapiOut stream.
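As a rough illustration of the arithmetic only, the sketch below converts each slider percentage to a 0.0–1.0 multiplier and scales the same 16-bit PCM buffer once per output. The app itself hands the multiplier to NAudio rather than scaling samples by hand; the function names `percent_to_gain`, `apply_gain`, and `route` are hypothetical and do not come from the app.

```python
import struct


def percent_to_gain(percent: float) -> float:
    """Map a 0-100 slider value to a 0.0-1.0 multiplier, clamping bad input."""
    return max(0.0, min(100.0, percent)) / 100.0


def apply_gain(pcm: bytes, gain: float) -> bytes:
    """Scale little-endian 16-bit PCM samples by `gain` (hypothetical helper)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    # Clamp to the signed 16-bit range so rounding can never overflow.
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return struct.pack(f"<{count}h", *scaled)


def route(pcm: bytes, monitor_percent: float, secondary_percent: float):
    """Return (monitor_buffer, secondary_buffer), each with its own volume."""
    return (
        apply_gain(pcm, percent_to_gain(monitor_percent)),
        apply_gain(pcm, percent_to_gain(secondary_percent)),
    )
```

Because each output gets its own scaled copy, lowering the monitor volume leaves the level sent to the virtual cable unchanged.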
Device Selection
In Settings → Audio:
- Monitor Output dropdown — lists all active WASAPI render devices
- Secondary Output dropdown — lists all active WASAPI render devices
- Refresh button — re-enumerates devices (useful after plugging/unplugging)
Device IDs
Device IDs are stored as WASAPI device identifiers. If a device is disconnected or drivers change, the saved ID may become invalid. The app will show a warning and you'll need to re-select the device.
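The check described above could look roughly like the following sketch: the saved ID is compared against the currently active devices, and a warning is produced when there is no match. `RenderDevice`, `resolve_saved_device`, the warning text, and the placeholder IDs are all assumptions for illustration; real WASAPI identifiers are opaque strings supplied by the system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RenderDevice:
    device_id: str  # opaque identifier string (placeholder values used here)
    name: str


def resolve_saved_device(
    saved_id: Optional[str], active: list[RenderDevice]
) -> tuple[Optional[RenderDevice], Optional[str]]:
    """Return (device, warning). Exactly one of the two is None."""
    if not saved_id:
        return None, "No output device selected."
    for device in active:
        if device.device_id == saved_id:
            return device, None
    # The ID was saved earlier but no active device reports it any more.
    return None, "Saved output device is unavailable. Re-select it in Settings."


# Example with placeholder IDs:
devices = [RenderDevice("id-a", "Headphones"), RenderDevice("id-b", "Virtual Cable")]
device, warning = resolve_saved_device("id-gone", devices)
```

Matching on the stored ID rather than the display name avoids confusing two devices that share a name, at the cost of needing a re-selection when the ID changes.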
Test Buttons
Three test buttons help verify your audio setup:
Trailing Silence Trimming
When enabled, the app automatically removes trailing silence from generated TTS audio:
- Trim Trailing Silence: Enable/disable the feature
- Silence Retention: the percentage of trailing silence to keep (5–100%)
  - 5% = almost all silence removed (fastest turnaround)
  - 100% = no trimming (default)
How It Works
- After TTS synthesis, the raw 16-bit PCM data is scanned backwards from the end
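- Trailing frames in which every sample is below the silence threshold (~0.5% of max amplitude) are counted as silence; the scan stops at the first frame that exceeds it
- The Silence Retention fraction of that trailing silence is kept and the rest is discarded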
- The trimmed audio is passed to the audio router
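The steps above can be sketched as follows. This is a minimal Python illustration, not the app's actual code: the function name `trim_trailing_silence`, the exact threshold constant, and the choice to round the retained frame count down are assumptions.

```python
import struct

# ~0.5% of the maximum 16-bit amplitude (32767), rounded down.
SILENCE_THRESHOLD = int(32767 * 0.005)


def trim_trailing_silence(pcm: bytes, channels: int, retention: float) -> bytes:
    """Trim trailing silence from little-endian 16-bit PCM.

    `retention` is the fraction of trailing silence to keep (0.05 to 1.0).
    """
    frame_size = 2 * channels
    total_frames = len(pcm) // frame_size

    # Scan backwards until a frame contains at least one audible sample.
    audible_frames = total_frames
    while audible_frames > 0:
        start = (audible_frames - 1) * frame_size
        frame = struct.unpack(f"<{channels}h", pcm[start:start + frame_size])
        if any(abs(sample) >= SILENCE_THRESHOLD for sample in frame):
            break
        audible_frames -= 1

    silent_frames = total_frames - audible_frames
    retention = max(0.05, min(1.0, retention))
    keep = int(silent_frames * retention)  # assumption: round down
    return pcm[: (audible_frames + keep) * frame_size]
```

For example, a mono clip with 4 audible frames followed by 100 silent frames keeps 54 frames at a retention of 0.5 and all 104 frames at 1.0.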
Enabling silence trimming can significantly reduce the gap between pressing Enter and hearing speech, especially with voices that produce long trailing silences.
Audio Format
Playback State
The PlaybackState singleton tracks:
- IsPlaying: whether audio is currently playing
- CurrentText: the text being spoken
- PlaybackDurationSeconds: total audio duration
- PlaybackStartedUtc: when playback began
This state is shared between the overlay and phrase playback, ensuring the playback timer and status display are consistent.
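To show how a single shared state object keeps a timer consistent across consumers, here is a small Python sketch. The four fields mirror the ones listed above; the `instance`, `start`, `stop`, and `elapsed_seconds` methods and the locking are assumptions added for illustration and are not taken from the app.

```python
import threading
from datetime import datetime, timezone
from typing import Optional


class PlaybackState:
    _instance: Optional["PlaybackState"] = None
    _instance_lock = threading.Lock()

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.is_playing = False
        self.current_text = ""
        self.playback_duration_seconds = 0.0
        self.playback_started_utc: Optional[datetime] = None

    @classmethod
    def instance(cls) -> "PlaybackState":
        """Return the one shared instance, creating it on first use."""
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def start(self, text: str, duration_seconds: float,
              now: Optional[datetime] = None) -> None:
        with self._lock:
            self.is_playing = True
            self.current_text = text
            self.playback_duration_seconds = duration_seconds
            self.playback_started_utc = now or datetime.now(timezone.utc)

    def stop(self) -> None:
        with self._lock:
            self.is_playing = False

    def elapsed_seconds(self, now: Optional[datetime] = None) -> float:
        """Seconds since playback began, capped at the total duration."""
        with self._lock:
            if not self.is_playing or self.playback_started_utc is None:
                return 0.0
            now = now or datetime.now(timezone.utc)
            elapsed = (now - self.playback_started_utc).total_seconds()
            return max(0.0, min(self.playback_duration_seconds, elapsed))
```

Since every consumer reads the same start time and duration, two displays computing elapsed time at the same moment arrive at the same value.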