Audio Output

Dual-Output Architecture

The app plays audio simultaneously to two independent output devices:

graph LR
    subgraph TTS Engine
        A[Audio Buffer]
    end
    A --> B[Monitor Output]
    A --> C[Secondary Output]
    B --> D[Headphones/Speakers]
    C --> E[Virtual Cable Input]
    E --> F[Voice App Mic Input]

Monitor Output

The primary output device — typically your headphones or speakers. This is where you hear the speech.

Secondary Output

The virtual audio cable input. Voice applications (Discord, VRChat) use this as their microphone input.

Volume Controls

Independent volume sliders for each output:

Monitor Volume: 0–100% (default: 100%)
Secondary Volume: 0–100% (default: 100%)

Volumes are applied as float multipliers (0.0–1.0) to the NAudio WasapiOut stream.

Device Selection

In Settings → Audio:

Monitor Output dropdown — lists all active WASAPI render devices
Secondary Output dropdown — lists all active WASAPI render devices
Refresh button — re-enumerates devices (useful after plugging/unplugging)

Device IDs

Device IDs are stored as WASAPI device identifiers. If a device is disconnected or drivers change, the saved ID may become invalid. The app will show a warning and you'll need to re-select the device.

Test Buttons

Three test buttons help verify your audio setup:

Button	What it does
Test Monitor	Plays a short test tone to the monitor output only
Test Secondary	Plays a short test tone to the secondary output only
Test Both	Plays a short test tone to both outputs simultaneously

Trailing Silence Trimming

When enabled, the app automatically removes trailing silence from generated TTS audio:

Trim Trailing Silence: Enable/disable the feature
Silence Retention: What percentage of trailing silence to keep (5–100%)
- 5% = almost all silence removed (fastest turnaround)
- 100% = no trimming (default)

How It Works

After TTS synthesis, the raw 16-bit PCM data is scanned backwards from the end
Frames where all samples are below the silence threshold (~0.5% of max amplitude) are identified
The specified fraction of trailing silence is retained
The trimmed audio is passed to the audio router

Enabling silence trimming can significantly reduce the gap between pressing Enter and hearing speech, especially with voices that produce long trailing silences.

Audio Format

Property	Value
Kokoro sample rate	24000 Hz
ElevenLabs sample rate	44100 Hz (decoded from MP3)
Bit depth	16-bit PCM
Channels	Mono (1)
Playback API	NAudio WasapiOut

Playback State

The PlaybackState singleton tracks:

IsPlaying — whether audio is currently playing
CurrentText — the text being spoken
PlaybackDurationSeconds — total audio duration
PlaybackStartedUtc — when playback began

This state is shared between the overlay and phrase playback, ensuring the playback timer and status display are consistent.