| /admin/roles | POST | Create a new role with permissions (requires user_management permission) |
TTS Settings
API Endpoints
GET /admin/tts-settings
Response: { "settings": TtsSettings }
POST /admin/tts-settings
Body: Partial<TtsSettings>
Response: { "settings": TtsSettings }
Settings Reference
| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| stability | 0.0–1.0 | 0.5 | Voice stability: lower = more variable, higher = more consistent. |
| similarity_boost | 0.0–1.0 | 0.75 | How closely the voice matches the target voice model. |
| style | 0.0–1.0 | 0.0 | Exaggeration level of voice style (0 = normal, 1 = maximum exaggeration). |
| speed | 0.1–2.0 | 1.0 | Speech playback speed multiplier (1.0 = normal). |
| use_speaker_boost | boolean | true | Enable speaker boost for clearer, more prominent voice output. |
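The ranges above can be enforced client-side before calling POST /admin/tts-settings. A minimal sketch, assuming a simple clamp-to-range policy; the helper name and the clamping behavior are illustrative, not part of the backend:

```typescript
// Hypothetical client-side helper: clamps a partial TTS settings payload
// to the documented ranges before it is POSTed to /admin/tts-settings.
type TtsSettings = {
  stability: number;        // 0.0–1.0
  similarity_boost: number; // 0.0–1.0
  style: number;            // 0.0–1.0
  speed: number;            // 0.1–2.0
  use_speaker_boost: boolean;
};

function clampTtsSettings(patch: Partial<TtsSettings>): Partial<TtsSettings> {
  const clamp = (v: number, min: number, max: number) =>
    Math.min(max, Math.max(min, v));
  const out: Partial<TtsSettings> = { ...patch };
  if (out.stability !== undefined) out.stability = clamp(out.stability, 0, 1);
  if (out.similarity_boost !== undefined)
    out.similarity_boost = clamp(out.similarity_boost, 0, 1);
  if (out.style !== undefined) out.style = clamp(out.style, 0, 1);
  if (out.speed !== undefined) out.speed = clamp(out.speed, 0.1, 2.0);
  return out;
}
```

Boolean fields such as use_speaker_boost pass through unchanged.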
STT Timing Settings
GET /admin/stt-timing
Response: { "settings": SttTimingSettings }
POST /admin/stt-timing
Body: Partial<SttTimingSettings>
Response: { "settings": SttTimingSettings }
| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| commit_merge_ms | 0–10000 | 2500 | Buffer VAD commits for this duration before translating to merge short fragments. |
| stability_timeout_ms | 0–10000 | 3000 | Wait for stable (unchanged) partial text before translating if no VAD commit fires. |
| tts_segment_pause_ms | 0–2000 | 0 | Pause between consecutive TTS audio segments sent to the frontend. |
| max_accumulation_ms | 0–60000 | 30000 | Maximum time to accumulate words during continuous speech before force-dispatching for translation. |
| vad_threshold | 0.0–1.0 | 0.5 | Voice Activity Detection sensitivity: higher = stricter noise filtering. |
| vad_silence_threshold_secs | 0.1–5.0 | 1.5 | Seconds of silence required for VAD to commit a transcript segment. |
| min_speech_duration_ms | 0–1000 | 100 | Ignore speech shorter than this duration (milliseconds). |
| min_silence_duration_ms | 0–1000 | 100 | Minimum silence gap required to separate speech segments (milliseconds). |
| flush_on_sentence_boundary | boolean | false | Split and dispatch text at sentence boundaries (.?!;) instead of all at once. |
| min_chars_before_dispatch | 0–1000 | 400 | Minimum characters accumulated before a chunk is dispatched for translation. |
Video Call Settings
GET /admin/video-settings
Response: { "stability_ms": number, "commit_merge_ms": number, "translation_provider": string }
POST /admin/video-settings
Body: Partial<VideoCallSettings>
Response: VideoCallSettings
| Setting | Range | Default | Description |
| --- | --- | --- | --- |
| stability_ms | 0–5000 | 500 | Wait for stable partial text before translating in video call mode. |
| commit_merge_ms | 0–500 | 50 | Merge VAD commits within this window for video call STT. |
| translation_provider | libretranslate \| claude \| deepl \| google | claude | Translation provider used for video call real-time translation. |
Configuration
TTS settings are configured in application.yaml under the elevenlabs.tts_settings section:
```yaml
elevenlabs:
  tts_model: "eleven_multilingual_v2"
  tts_settings:
    stability: 0.5
    similarity_boost: 0.75
    style: 0.0
    speed: 1.0
    use_speaker_boost: true
  stt_model: "scribe_v2_realtime"
```
Runtime Persistence
All TTS and STT settings are persisted to Redis via the admin API endpoints. Changes apply immediately to new sessions — existing streams continue with previously-loaded settings. Settings are reloaded from Redis on server restart.
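The Partial<TtsSettings> and Partial<SttTimingSettings> request bodies imply merge-on-update semantics: unspecified fields keep their current values. A sketch of that merge, assuming a simple object spread (the function name is illustrative, not taken from the codebase):

```typescript
// Merge a partial settings patch over the current settings object.
// Undefined entries are dropped so they cannot overwrite existing values.
function applySettingsPatch<T extends object>(current: T, patch: Partial<T>): T {
  const defined = Object.fromEntries(
    Object.entries(patch).filter(([, v]) => v !== undefined)
  );
  return { ...current, ...defined } as T;
}
```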
Default Voice
The default voice ID used when no voice is explicitly specified is configured in application.yaml:
```yaml
elevenlabs:
  default_voice_id: "kxj9qk6u5PfI0ITgJwO0"
```
STT Timing Settings
Control speech-to-text recognition timing, VAD (Voice Activity Detection) parameters, and buffering behavior. These settings affect how quickly transcripts are dispatched for translation during live broadcasts and personal sessions.
| Setting | Default | Description |
| --- | --- | --- |
| commit_merge_ms | 2500 | Milliseconds to buffer VAD commits before translating; merges short sentence fragments into larger chunks. |
| stability_timeout_ms | 3000 | Milliseconds to wait for stable partial text before dispatching for translation when VAD does not commit. |
| tts_segment_pause_ms | 0 | Pause duration (ms) between consecutive TTS audio segments; sent to the frontend for playback timing. |
| max_accumulation_ms | 30000 | Maximum milliseconds to accumulate words during continuous speech before force-dispatching for translation. |
| vad_threshold | 0.5 | VAD noise filter threshold (0–1); higher values are stricter and filter more background noise. |
| vad_silence_threshold_secs | 1.5 | Seconds of silence required before VAD triggers a commit event. |
| min_speech_duration_ms | 100 | Ignore speech segments shorter than this duration (milliseconds). |
| min_silence_duration_ms | 100 | Minimum silence gap (milliseconds) required between detected speech segments. |
| flush_on_sentence_boundary | false | When true, dispatch text at sentence boundaries (.?!;) instead of waiting for a timeout. |
| min_chars_before_dispatch | 400 | Minimum character count before a chunk is dispatched for translation; prevents tiny fragments. |
API Endpoints
GET /admin/stt-timing
Retrieve current STT timing settings.
```bash
curl -X GET http://localhost:3001/admin/stt-timing \
  -H "Cookie: auth_token=YOUR_JWT_TOKEN"
```

Response:

```json
{
  "settings": {
    "commit_merge_ms": 2500,
    "stability_timeout_ms": 3000,
    "tts_segment_pause_ms": 0,
    "max_accumulation_ms": 30000,
    "vad_threshold": 0.5,
    "vad_silence_threshold_secs": 1.5,
    "min_speech_duration_ms": 100,
    "min_silence_duration_ms": 100,
    "flush_on_sentence_boundary": false,
    "min_chars_before_dispatch": 400
  }
}
```
POST /admin/stt-timing
Update one or more STT timing settings. Send only the fields you wish to change—unspecified fields retain their current values.
```bash
curl -X POST http://localhost:3001/admin/stt-timing \
  -H "Cookie: auth_token=YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "commit_merge_ms": 2000,
    "vad_threshold": 0.6,
    "min_chars_before_dispatch": 300
  }'
```

Response:

```json
{
  "settings": {
    "commit_merge_ms": 2000,
    "stability_timeout_ms": 3000,
    "tts_segment_pause_ms": 0,
    "max_accumulation_ms": 30000,
    "vad_threshold": 0.6,
    "vad_silence_threshold_secs": 1.5,
    "min_speech_duration_ms": 100,
    "min_silence_duration_ms": 100,
    "flush_on_sentence_boundary": false,
    "min_chars_before_dispatch": 300
  }
}
```
Tuning Guide
- For snappier response: Reduce commit_merge_ms (e.g., 1000–1500) and max_accumulation_ms (e.g., 10000–15000), and lower min_chars_before_dispatch (e.g., 100–200).
- For better sentence coherence: Enable flush_on_sentence_boundary so dispatches wait for natural pauses at punctuation rather than timeouts.
- To reduce background noise: Increase vad_threshold (e.g., 0.6–0.8) and vad_silence_threshold_secs (e.g., 2.0–2.5).
- For continuous speech (sermons, lectures): Increase max_accumulation_ms (e.g., 40000–60000) to allow longer pauses between dispatches without losing content.
- To prevent tiny fragment translations: Increase min_chars_before_dispatch (e.g., 500–800) so the buffer only flushes once enough words have accumulated.
- Stability timeout fallback: stability_timeout_ms is only used when VAD commits are infrequent; it ensures translation happens even during continuous speech where the speaker never pauses long enough for VAD.
- Frontend TTS timing: tts_segment_pause_ms controls the pause between translated audio clips on the client; increase it for more natural pacing between sentences.
- Real-time video calls: Use shorter timeouts (e.g., commit_merge_ms: 500, stability_timeout_ms: 1000) for lower latency, with the separate /admin/video-settings endpoint if stricter isolation is needed.
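The sentence-boundary flush enabled by flush_on_sentence_boundary can be sketched as follows. The function name and regex are assumptions about the behavior (split at .?!; and keep the unfinished tail buffered), not the actual implementation:

```typescript
// Split accumulated transcript text at sentence boundaries (.?!;).
// Complete sentences are dispatched; an unfinished tail stays buffered.
function splitAtSentenceBoundaries(
  buffer: string
): { dispatch: string[]; remainder: string } {
  // Split after boundary punctuation followed by whitespace.
  const parts = buffer.split(/(?<=[.?!;])\s+/);
  const last = parts[parts.length - 1];
  if (/[.?!;]\s*$/.test(last)) {
    // Buffer ends on a boundary: everything can be dispatched.
    return { dispatch: parts, remainder: "" };
  }
  // Keep the incomplete final fragment for the next flush.
  return { dispatch: parts.slice(0, -1), remainder: last };
}
```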
Authentication: All endpoints require JWT cookie-based admin authentication. Access requires either is_admin=true or a role with appropriate permissions. Some endpoints require specific permissions (noted below).
API Keys Management
Retrieve the status of all configured API keys (elevenlabs, anthropic, deepl, libretranslate, google).
Update one or more API keys.
Body:

```
{
  "elevenlabs"?: string,
  "anthropic"?: string,
  "deepl"?: string,
  "libretranslate"?: string,
  "google"?: string
}
```
Retrieve the currently configured Gemini API key.
Voice Management
Scan and retrieve all available ElevenLabs voices, logging new voices not yet in the allowed list.
Get the list of voice IDs permitted for users to select from (admin-curated pool).
Update the pool of voices available to users; broadcasts change to all connected clients.
Body: { "voiceIds": string[] }
Feature Flags
Retrieve all feature flags (merged from YAML config & Redis overrides).
Get the value of a specific feature flag by name.
Set a feature flag value and broadcast update to all connected clients.
Body: { "value": boolean }
TTS & STT Settings
Retrieve current TTS settings (stability, similarity boost, style, speed, speaker boost).
Update TTS settings (stability, similarity_boost, style, speed, use_speaker_boost).
Body: Partial<TtsSettings>

```
{
  "stability"?: number,
  "similarity_boost"?: number,
  "style"?: number,
  "speed"?: number,
  "use_speaker_boost"?: boolean
}
```
Retrieve STT timing settings (VAD parameters, commit merge delay, stability timeout, accumulation threshold).
Update STT timing settings including VAD thresholds, silence duration, and dispatch thresholds.
Body: Partial<SttTimingSettings>

```
{
  "commit_merge_ms"?: number,
  "stability_timeout_ms"?: number,
  "tts_segment_pause_ms"?: number,
  "max_accumulation_ms"?: number,
  "vad_threshold"?: number,
  "vad_silence_threshold_secs"?: number,
  "min_speech_duration_ms"?: number,
  "min_silence_duration_ms"?: number,
  "flush_on_sentence_boundary"?: boolean,
  "min_chars_before_dispatch"?: number
}
```
Retrieve video call settings (stability, commit merge delay, translation provider).
Update video call settings (stability_ms, commit_merge_ms, translation_provider).
Body: Partial<VideoCallSettings>

```
{
  "stability_ms"?: number,
  "commit_merge_ms"?: number,
  "translation_provider"?: "libretranslate" | "claude" | "deepl" | "google"
}
```
Languages
Retrieve the current active language pair (source & target language codes).
Set the active language pair; broadcasts change to all connected clients in real-time.
Body: { "languages": [string, string] } (exactly 2 language codes)
Retrieve the pool of languages available for users to select from.
Update the language pool; broadcasts both pool & active pair to all clients.
Body: { "languages": string[] }
Translation Provider
Retrieve the active translation provider and list of available providers (google, deepl, claude, libretranslate).
Set the active translation provider; must be one of: google, deepl, claude, libretranslate.
Body: { "provider": "google" | "deepl" | "claude" | "libretranslate" }
Retrieve the currently selected Claude model for translation & available Claude models.
Set the Claude translation model (must be a valid model ID from CLAUDE_MODELS).
Body: { "model": string }
Audio Device
Retrieve the admin-selected audio input device (deviceId & label) that overrides viewer local selection.
Set the admin-forced audio device; broadcasts to all viewers to use this device.
Body: { "deviceId"?: string, "label"?: string }
Content Generation
Generate a biblical sermon snippet via Gemini Flash (configurable language & sentence count).
Body:

```
{
  "apiKey"?: string,
  "language"?: "ru" | "uk" | "en",
  "sentences"?: number
}
```
Generate audio preview of text using specified voice (returns base64-encoded MP3).
Body: { "text": string, "voiceId"?: string }
Voice Training & Cloning
Clone a voice from browser mic recordings (base64-encoded audio blobs); requires voice name & at least one audio clip.
Body:

```
{
  "name": string,
  "clips": string[],
  "mimeType"?: string
}
```
Clone a voice from YouTube URL (yt-dlp & ffmpeg extract N×30s clips then upload to ElevenLabs).
Body:

```
{
  "name": string,
  "youtubeUrl": string,
  "clipCount"?: number,
  "startOffset"?: number
}
```
Monitoring & Logs
Retrieve hallucination detection statistics (count by reason).
Clear the hallucination detection log.
Retrieve recent translation log entries (original, translated, language, timing).
Clear the translation log.
Retrieve current broadcast TTS queue depth snapshot for monitoring.
Session & Broadcast History
Retrieve all broadcast session history from PostgreSQL with metadata (started_at, ended_at, source, voiceId).
Retrieve detailed transcript data for a specific session (all transcripts with timings & languages).
Export a session transcript in CSV, TXT, or JSON format (format query param: csv | txt | json).
User Management
Requires: user_management permission. Retrieve all users (password hashes stripped).
Requires: user_management permission. Update a user's admin status and/or assigned roles.
Body:

```
{
  "isAdmin"?: boolean,
  "roleId"?: string | null,
  "roleIds"?: string[]
}
```
Requires: user_management permission. Reset a user's password (minimum 6 characters).
Body: { "password": string }
Requires: user_management permission. Delete a user (cannot delete your own account).
Roles & Permissions
Requires: user_management permission. Retrieve the list of all available permissions.
Requires: user_management permission. Retrieve all defined roles with their permissions.
Requires: user_management permission. Create a new role with specified permissions.
Body: { "name": string, "permissions": Permission[] }
Requires: user_management permission. Update an existing role's name and permissions.
Body: { "name": string, "permissions": Permission[] }
Requires: user_management permission. Delete a role.
Socket.IO Events
Server → Client Events
| Event | Payload | Description |
| --- | --- | --- |
| feature_flags | { [flagName: string]: boolean } | Merged feature flags from YAML defaults & Redis overrides; emitted on connection. |
| languages | { languages: string[] } | Current active language pair (source & target); emitted on connection & when admin updates. |
| available_languages | { languages: string[] } | Pool of languages available for viewer selection; emitted on connection & when admin updates. |
| available_voices | { voiceIds: string[] } | Admin-allowed voice IDs; emitted when admin updates available voices. |
| stt_timing | { tts_segment_pause_ms: number } | Speech-to-text timing settings for frontend rendering; emitted on connection. |
| broadcast_status | { active: boolean; source?: 'mic' \| 'youtube' \| 'biblical' \| 'remote'; pauseReason?: 'prayer' \| 'song' \| null } | Broadcast on/off status & current source; emitted on connection & when broadcast starts/stops/pauses. |
| broadcast_viewer_count | { count: number } | Number of clients in the broadcast viewer room; emitted on connection & when viewers join/leave. |
| remote_audio_sources | { sources: Array<{ socketId: string; label: string; deviceId: string }> } | List of registered remote audio sources; emitted on connection & when sources register/unregister. |
| admin_audio_device | { deviceId: string; label: string } | Admin-selected audio input device overriding the viewer's local selection; emitted on connection & when admin updates. |
| transcript | { text: string; isFinal: boolean } | Speech recognition text from a private session (partial or final). |
| broadcast_transcript | { text: string; isFinal: boolean; skipped?: boolean } | Speech recognition text from the broadcast session to all viewers; skipped=true if a hallucination was detected. |
| translation | { original: string; translated: string; detectedLanguage?: string } | Translated text from the private session translation pipeline. |
| broadcast_translation | { original: string; translated: string; detectedLanguage?: string } | Translated text from the broadcast translation pipeline to all viewers. |
| tts_audio | { audio: string } | Base64-encoded MP3 audio chunk from the private session TTS worker pool. |
| broadcast_tts_audio | { audio: string } | Base64-encoded MP3 audio chunk from the broadcast TTS worker pool to all viewers. |
| tts_clear_queue | {} | Signal to clear all pending TTS audio from the playback queue (e.g., on broadcast pause). |
| audio_level | { data: number[] } | Downsampled waveform data (64-point array, 0–1 range) for live level meter display. |
| stream_ended | {} | Audio stream ended (YouTube playback complete, Biblical simulation done, or admin stopped the broadcast). |
| session_started | { source: 'mic' \| 'youtube' } | Private session successfully started. |
| session_stopped | {} | Private session stopped by user or error. |
| admin_translate_result | { original: string; translated: string; detectedLanguage?: string; audio: string } | Result of an admin instant translation test with base64-encoded MP3 audio. |
| error | { message: string } | Error message from speech recognition, TTS, translation, or stream handlers. |
Client → Server Events
| Event | Payload | Description |
| --- | --- | --- |
| join_broadcast | {} | Join the broadcast viewer room to receive live translation & audio. |
| leave_broadcast | {} | Leave the broadcast viewer room. |
| set_languages | { languages: string[] } | Viewer requests a language pair change (must have exactly 2 codes from the available pool). |
| start_session | { source: 'mic' \| 'youtube'; voiceId?: string; youtubeUrl?: string } | User starts a private translation session from mic or YouTube URL. |
| stop_session | {} | User stops their private session. |
| change_voice | { voiceId: string } | Live voice change during broadcast or private session without restarting. |
| admin_start_broadcast | { voiceId?: string; source: 'mic' \| 'youtube' \| 'remote'; youtubeUrl?: string } | Admin starts a global broadcast from mic, YouTube, or remote audio sources. |
| admin_stop_broadcast | {} | Admin stops the active broadcast. |
| broadcast_pause | { reason: 'prayer' \| 'song' } | Admin pauses broadcast translation (TTS queue cleared but transcription continues). |
| broadcast_resume | {} | Admin resumes broadcast translation after a pause. |
| register_audio_source | { label: string; deviceId: string } | Remote audio source registers itself for broadcast consumption. |
| unregister_audio_source | {} | Remote audio source unregisters itself. |
| audio_chunk | { audio: string } | Base64-encoded PCM audio chunk; routed to the broadcast (if admin/remote source) or private session. |
| test_audio_chunk | { audio: string } | Base64-encoded PCM audio chunk for testing (always routed to the private session, never broadcast). |
| admin_translate_test | { text: string; voiceId?: string; sourceLang?: string; targetLang?: string } | Admin instant translation & TTS test (result emitted back as admin_translate_result). |
| start_biblical_sim | { anthropicApiKey?: string; geminiApiKey?: string; language: string; voiceId?: string } | Admin starts a biblical text simulator broadcast with the specified language & voice. |
| stop_biblical_sim | {} | Admin stops the active biblical simulation. |
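The set_languages constraint (exactly two codes, both from the available pool) can be expressed as a small validator. A hypothetical sketch of the check, not the server's actual code:

```typescript
// Validate a set_languages payload: exactly two codes, both from the
// available_languages pool. Names here are illustrative.
function isValidLanguagePair(languages: string[], pool: string[]): boolean {
  return languages.length === 2 && languages.every((code) => pool.includes(code));
}
```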
SDK
Uses the official @elevenlabs/elevenlabs-js SDK (v2). The client is lazy-loaded on first use.
Speech-to-Text (Scribe v2 Realtime)
Connects via native WebSocket to wss://api.elevenlabs.io/v1/speech-to-text/realtime. Handles:
- VAD-based commit buffering with configurable merge window
- Stability timeout fallback for stalled VAD
- Text validation (EN/RU/UK character regex filtering)
- Partial and final transcript emission
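The VAD commit buffering above can be modeled deterministically by passing time in explicitly instead of using timers. A simplified sketch with illustrative names, not the actual service code:

```typescript
// Model of VAD commit buffering: fragments accumulate until the merge
// window (commit_merge_ms) elapses with no further commits, then they
// are flushed as one merged chunk for translation.
class CommitMergeBuffer {
  private fragments: string[] = [];
  private lastCommitAt = 0;

  constructor(private mergeWindowMs: number) {}

  // Called on each VAD commit with the committed text and current time.
  commit(text: string, nowMs: number): void {
    this.fragments.push(text);
    this.lastCommitAt = nowMs;
  }

  // Called periodically; returns merged text once the merge window has
  // elapsed since the last commit, otherwise null.
  poll(nowMs: number): string | null {
    if (this.fragments.length === 0) return null;
    if (nowMs - this.lastCommitAt < this.mergeWindowMs) return null;
    const merged = this.fragments.join(" ");
    this.fragments = [];
    return merged;
  }
}
```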
Text-to-Speech
Uses client.textToSpeech.stream() with the eleven_multilingual_v2 model. Audio is collected into a Buffer and emitted as base64 MP3.
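The chunk-collection step can be sketched with Node's Buffer API; the function name is illustrative, and the real code lives in elevenlabs.service.ts:

```typescript
// Collect streamed audio chunks into one Buffer and encode as base64,
// as done for the tts_audio / broadcast_tts_audio socket events.
async function collectToBase64(chunks: AsyncIterable<Uint8Array>): Promise<string> {
  const parts: Buffer[] = [];
  for await (const chunk of chunks) {
    parts.push(Buffer.from(chunk));
  }
  return Buffer.concat(parts).toString("base64");
}
```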
Voice Management
- client.voices.getAll() fetches all voices from the account
- Admin can filter which voices are available to viewers
- Voice cloning via the IVC API (from recordings or YouTube)
Key File
backend/src/services/elevenlabs.service.ts
Provider Details
Google Translate
Google Cloud Translation API v2. Fast (~200ms), deterministic, and reliable. Requires GOOGLE_TRANSLATE_API_KEY with the Cloud Translation API enabled in Google Cloud Console. Ensure the API key has no HTTP referrer restrictions (server-side requests have no referrer).
File: backend/src/services/google-translate.service.ts
LibreTranslate
Self-hosted in Docker. No API key required by default. Provides language detection and translation via REST API.
File: backend/src/services/libretranslate.service.ts
DeepL
Premium translation API. Auto-detects free vs. paid endpoint based on the API key format.
File: backend/src/services/deepl.service.ts
Claude (Anthropic)
AI-powered translation using claude-haiku-4-5 for speed. Includes language detection and auto-flip logic.
File: backend/src/services/claude-translate.service.ts
Routing
Provider routing is handled by backend/src/services/translation.provider.ts:
- Try admin-selected primary provider
- On failure, try configured fallback provider
- LibreTranslate is always the last-resort fallback
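The fallback chain can be sketched with providers injected as async functions. This is a minimal model of the routing order, not the actual translation.provider.ts code:

```typescript
type TranslateFn = (text: string) => Promise<string>;

// Try the primary provider, then the configured fallback, then
// LibreTranslate as the last resort.
async function translateWithFallback(
  text: string,
  primary: TranslateFn,
  fallback: TranslateFn,
  libretranslate: TranslateFn
): Promise<string> {
  for (const provider of [primary, fallback, libretranslate]) {
    try {
      return await provider(text);
    } catch {
      // Provider failed; try the next one in the chain.
    }
  }
  throw new Error("All translation providers failed");
}
```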
Connection
Uses ioredis with automatic retry strategy. Falls back to in-memory/YAML defaults if Redis is unavailable.
Key Patterns
| Pattern | Example | Purpose |
| --- | --- | --- |
| flag:<name> | flag:youtube_input | Feature flag boolean values |
| setting:<name> | setting:tts_settings | JSON settings objects |
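A sketch of how these keys might be built and read, including the fall-back-to-defaults behavior noted under Connection. The helper names and the store interface (a stand-in for ioredis) are assumptions:

```typescript
// Minimal stand-in for the ioredis client.
interface KvStore {
  get(key: string): Promise<string | null>;
}

const flagKey = (name: string) => `flag:${name}`;
const settingKey = (name: string) => `setting:${name}`;

// Load a JSON settings object from setting:<name>, falling back to
// the YAML default when Redis is unavailable or the key is missing.
async function loadSetting<T>(
  store: KvStore | null,
  name: string,
  yamlDefault: T
): Promise<T> {
  if (!store) return yamlDefault; // Redis unavailable
  const raw = await store.get(settingKey(name));
  return raw === null ? yamlDefault : (JSON.parse(raw) as T);
}
```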
Key File
backend/src/services/redis.service.ts
Local Development
Use docker-compose.local.yml for Redis and LibreTranslate only (backend/frontend run natively):
```bash
docker compose -f docker-compose.local.yml up -d
```
Production
Use docker-compose.yml for all services:
```bash
docker compose up -d --build
```
Services
| Service | Image | Port | Notes |
| --- | --- | --- | --- |
| frontend | node:24-alpine + Nginx | 80 (exposed) | Serves React build, proxies API/WS to backend |
| backend | node:24-alpine | 3001 (internal) | Express + Socket.io server |
| redis | redis:7-alpine | 6379 (internal) | Feature flags and settings store |
| libretranslate | libretranslate/libretranslate | 5000 (internal) | Self-hosted translation engine |
Configuration
```
ELEVENLABS_API_KEY=sk-your-production-key
ADMIN_PASSWORD=strong-secure-password
FRONTEND_URL=https://translate.example.com
APP_ENV=prod
REDIS_PASSWORD=redis-secret
```
Deploy
```bash
docker compose up -d --build
```
Reverse Proxy
When running behind Nginx or another reverse proxy:
- Set LISTEN_PORT in .env (e.g., 8080)
- Proxy pass to localhost:8080
- Important: Ensure WebSocket upgrades are forwarded for the /socket.io/ path
```nginx
server {
    listen 443 ssl;
    server_name translate.example.com;

    location / {
        proxy_pass http://localhost:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```
Monitoring
```bash
# Check all services
docker compose ps

# View backend logs
docker compose logs -f backend

# Health check
curl http://localhost:3001/api/health
```
Shipped
v0.1 – v0.2 — Core Translation Engine
- Real-time STT via ElevenLabs Scribe v2 Realtime
- Multi-provider translation (LibreTranslate, DeepL, Claude)
- TTS voice synthesis with ElevenLabs
- Microphone and YouTube live input
- Admin panel with feature flags, voice management, TTS tuning
- Biblical Transcript Simulator for pipeline testing
- Instant Voice Cloning from recordings and YouTube
Shipped
v0.3 — Audio Mixer & Device Selection
Browser-side audio device scanning with support for professional mixing consoles, virtual audio devices, and audio interfaces.
- Browser-side device enumeration with permission flow
- Virtual device detection (Loopback, BlackHole, VB-Audio, Voicemeeter, OBS)
- Categorized device picker (Microphones vs Mixers / Virtual Devices)
- Admin device override broadcast to all viewers via Socket.io
- Real-time feature flag broadcasting
Shipped
v0.7 — Broadcast Service
The /translate route is now a true broadcast service. Admins start one global translation session from the admin panel and all connected viewers receive the live output simultaneously.
- Single global broadcast session (one-to-many)
- Admin "Broadcast Control" panel — Start/Stop with source + voice selection
- Microphone and YouTube source both supported for broadcast
- All translation output (transcript, translated text, TTS audio) is io.emit'd to every viewer
- Viewer shows Waiting for broadcast to start… status when off air
- "On Air" / "Off Air" status pill visible to viewers in real-time
- Broadcast ownership tracked by admin socket ID; auto-stops on admin disconnect
- Biblical Transcript Simulator also broadcasts to all viewers
Shipped
v0.8 — Navigation, Broadcast FF & Transcript UX
Global persistent bottom navigation, feature-flag-gated route visibility, and a refined transcript reading experience.
- Persistent bottom navigation bar on all pages (/translate, /broadcast, /video, /admin)
- FF-gated nav links: Broadcast and Video Call entries only appear when their flags are enabled
- No extra socket connection: the nav reads flags from the page's existing useSocket call via props
- Nav renders a frosted dark background gradient so it never overlaps content
- /broadcast route is now public (no login required); gated inside the page by the broadcast feature flag
- broadcast feature flag added to YAML, backend config, and frontend FeatureFlags interface
- Transcript panel: newest translation is always at the top; older lines scroll down and fade out at the bottom
- Each new transcript entry animates in from above (transcriptIn keyframe)
- Removed duplicate "Video Call" button from /translate and /broadcast header bars
Shipped
v0.9 — Translation Pipeline Overhaul & Google Integration
Major improvements to translation chunking, provider support, and admin tooling.
- Google Translate as primary translation provider with automatic fallback chain
- Google Gemini 2.5 Flash for biblical simulator and sermon generation (replaces deprecated Gemini 2.0 Flash)
- Overhauled STT chunking: disabled aggressive sentence-boundary splitting, stability timer defers to accumulation during continuous speech, commit buffer defers when speaker has resumed
- Configurable sermon length (1–20 sentences) in admin UI
- Voice training: AI-generated reading text (Gemini) for mic recording sessions
- Voice training: preview playback of cloned voice after training via TTS
- Broadcast mute/unmute toggle (muted by default, replaces “Tap to enable audio” banner)
- Audio device auto-scan on page load with spinning refresh indicator
- Fixed admin Raw Server Logs auto-scroll toggle re-enabling on new messages
- Updated Claude model list: removed deprecated models; the default is claude-haiku-4-5
- Docker images upgraded to Node.js 24 (Alpine)
Up Next
v0.4 — Direct Audio Interface Feed
Accept audio directly from professional mixing consoles and audio interfaces — bypass browser mic capture entirely for broadcast-quality input.
- Direct audio interface input (ASIO / Core Audio / ALSA)
- Multi-channel mixer feed support
- Low-latency audio routing (sub-100ms)
- Hardware device auto-discovery and selection
- Professional broadcast integration (NDI, Dante)
Shipped
v0.5 — Video Call Translation
WebRTC peer-to-peer video calls with real-time bidirectional translation. Two people speak different languages and hear each other translated via TTS.
- Built-in WebRTC video call with room codes
- Full-duplex translation (each person hears the other translated)
- Per-participant STT pipeline with independent Scribe sessions
- Video grid UI with local PiP and remote full-screen
- Mic/video mute controls, hang up, auto-cleanup on disconnect
- Feature-flagged behind video_translation
Shipped
v0.6 — Auth, Mobile & Voice Cloning in /video
- User-facing login page (/) with JWT cookie sessions (30-day sticky, HttpOnly)
- All app routes protected; redirect to login if unauthenticated
- Live translator moved to /translate
- Mobile-responsive UI across Translator, Admin, and Video Call views
- FaceTime-style full-screen in-call layout on mobile with safe-area insets
- "Clone Voice" button in /video lobby, gated by video_voice_cloning feature flag
- Voice cloning modal with mic recording or YouTube URL, admin-password gated
Planned
Future
- Additional language pairs beyond EN/RU/UK
- Speaker diarization (multi-speaker detection)
- Translation memory and glossary support
- Webhooks and API for third-party integrations
- Multi-tenant deployment with user accounts