Realtime voice (WebSocket)

Server: python standalone_voice_call_agent.py --mode web. Microphone audio is streamed as 16 kHz PCM to Azure STT; replies play back as PCM audio, or through the Azure real-time talking avatar (WebRTC) when STANDALONE_USE_AVATAR=1. Customer view (captions + wave or avatar, mobile-friendly).
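As a rough illustration of what "16 kHz PCM" means on the wire, here is a minimal Python sketch that converts float samples to PCM bytes and splits them into fixed-duration frames. Only the 16 kHz rate comes from the text above; the 16-bit signed little-endian mono layout, the 20 ms frame size, and the function names are assumptions for illustration, not the actual client or server code.

```python
import struct

SAMPLE_RATE_HZ = 16_000   # stated above
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed little-endian, mono
FRAME_MS = 20             # assumption: hypothetical frame duration


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clamp first so out-of-range input cannot overflow the 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def split_frames(pcm, frame_ms=FRAME_MS):
    """Split PCM bytes into fixed-duration frames; the last may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + frame_bytes] for i in range(0, len(pcm), frame_bytes)]
```

Under these assumptions a 20 ms frame is 320 samples, or 640 bytes, and each frame would be sent as one binary WebSocket message.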

Leave empty to let the server create a session when DEFAULT_CHATBOT_ID is set (the new UUID is returned in the WebSocket config message and filled in here). Otherwise, paste a UUID to use DB-backed prompts and chatbot_conversation logging. STANDALONE_ALLOW_ANONYMOUS_WS=1 allows connecting without an ID (local dev only).
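The three cases described above can be sketched as a small decision function. This is a hypothetical reconstruction, not the server's actual code: the function name, the return shape, and the precedence of DEFAULT_CHATBOT_ID over the anonymous flag are all assumptions.

```python
import uuid


def resolve_session(supplied_id, default_chatbot_id, allow_anonymous):
    """Return (mode, session_id) for a new WebSocket connection.

    Hypothetical helper: only the three outcomes come from the text above.
    """
    supplied_id = (supplied_id or "").strip()
    if supplied_id:
        # A pasted ID must parse as a UUID; uuid.UUID raises ValueError otherwise.
        return "existing", str(uuid.UUID(supplied_id))
    if default_chatbot_id:
        # Empty field + DEFAULT_CHATBOT_ID: the server creates a new session
        # and would report this UUID back in its config message.
        return "created", str(uuid.uuid4())
    if allow_anonymous:
        # STANDALONE_ALLOW_ANONYMOUS_WS=1: no ID at all (local dev only).
        return "anonymous", None
    raise ValueError("session ID required")
```

For example, `resolve_session("", "some-chatbot", False)` yields a `"created"` session with a fresh UUID, while an empty ID with neither setting enabled is rejected.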

Speech recognition and the assistant's replies are both locked to this language for the session.

Browsers hide microphone names until you allow access once. Click Refresh list and accept the prompt to load every connected microphone, then pick one before Connect.

Used only when the server runs with --tts elevenlabs. Ignored for Azure TTS.

Voice uses server PCM playback. Avatar requires server-side STANDALONE_USE_AVATAR=1.

ElevenLabs model selection is used only when TTS engine = ElevenLabs.

Loaded from the DB table assets_ttsvoice based on the selected provider.

Disconnected

Session log

The same lines are mirrored to the browser console (F12).

Live STT

Assistant