What the Voice Stack Does
The Voice Stack gives your agent a voice. It handles everything between the user’s audio and your agent’s text logic. Your agent receives plain text and returns plain text. The Voice Stack handles the rest: transcription, synthesis, voice activity detection (VAD), barge-in detection, endpointing, and transport.
Two Ways Users Connect
Phone / Telephony
Users dial a phone number. Unpod handles SIP, PSTN, number provisioning, and routing. No carrier account or SIP trunk needed.
Browser / App
Users connect from a web app or mobile app via the Unpod Web SDK. They use the same speech pipeline, and no phone number is required.
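Whichever entry point a user takes, the contract described above stays the same: text in, text out. The following is a minimal sketch of that contract, not the Unpod SDK API. The names `Turn`, `echo_agent`, and `handle_turn` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical type for illustration; not part of the Unpod SDK.
@dataclass(frozen=True)
class Turn:
    source: str  # where the user connected from, e.g. "phone" or "web"
    text: str    # transcript of the user's finished utterance


def echo_agent(user_text: str) -> str:
    """A stand-in agent: receives plain text, returns plain text."""
    cleaned = user_text.strip()
    if not cleaned:
        return "Sorry, I didn't catch that."
    return f"You said: {cleaned}"


def handle_turn(turn: Turn, agent: Callable[[str], str]) -> str:
    # The agent only ever sees the transcript. The transport
    # (turn.source) is deliberately not passed through.
    return agent(turn.text)


phone_reply = handle_turn(Turn(source="phone", text="hello"), echo_agent)
web_reply = handle_turn(Turn(source="web", text="hello"), echo_agent)
```

In this sketch the phone and web turns produce identical replies, because the agent logic never depends on how the audio arrived.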
How Audio Flows
Speech pipeline runs
Unpod transcribes audio (STT), detects turn end (VAD + endpointing), and handles barge-in interruptions. Your agent gets clean text.
Your agent responds
The Unpod orchestrator dispatches the session to your AgentRunner. Your entrypoint runs, and your dialog machine produces a text reply.
Core Building Blocks
Voice Profiles
Choose your STT and TTS providers. Use a pre-built profile or a custom combination, with automatic failover.
Speech Pipe
Bundle a voice profile, recording settings, and connection attachments, then point the pipe at your agent.
Phone Numbers
Provision numbers directly or bring your own. Attach to a Speech Pipe for inbound calls.
SDK Setup
Install unpod, run your AgentRunner, and accept sessions from any source.
Quickstart Paths
| I want to… | Start here |
|---|---|
| Handle phone calls | Telephony Quickstart - provision a number and attach it to your Speech Pipe |
| Add voice to a web app | Web SDK Quickstart - embed the Unpod Web SDK in your frontend |
| Connect both | Start with telephony, then add the Web SDK - same Speech Pipe, two entry points |
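To make the "same Speech Pipe, two entry points" idea concrete, here is a rough data-model sketch of how the building blocks might fit together. It is an assumption-laden illustration, not Unpod's configuration schema: the classes `VoiceProfile` and `SpeechPipe`, the helper `first_available`, the provider names, and the entry-point strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


# All names below are hypothetical; they are not the Unpod SDK API.
@dataclass(frozen=True)
class VoiceProfile:
    stt_providers: Sequence[str]  # ordered by preference, first is primary
    tts_providers: Sequence[str]


@dataclass
class SpeechPipe:
    profile: VoiceProfile
    record_calls: bool
    agent: Callable[[str], str]  # text in, text out
    entry_points: List[str] = field(default_factory=list)

    def attach(self, entry_point: str) -> None:
        """Attach a phone number or web connection to this pipe."""
        self.entry_points.append(entry_point)


def first_available(providers: Sequence[str],
                    is_up: Callable[[str], bool]) -> str:
    """Failover sketch: pick the first provider that reports healthy."""
    for name in providers:
        if is_up(name):
            return name
    raise RuntimeError("no provider available")


profile = VoiceProfile(
    stt_providers=("stt-primary", "stt-backup"),
    tts_providers=("tts-primary", "tts-backup"),
)
pipe = SpeechPipe(profile=profile, record_calls=True,
                  agent=lambda text: f"Echo: {text}")

# One pipe, two entry points (placeholder identifiers).
pipe.attach("phone:+15555550100")
pipe.attach("web:my-app")

# Simulate the primary STT provider being down.
active_stt = first_available(profile.stt_providers,
                             lambda name: name != "stt-primary")
```

The point of the design is that the voice profile, recording settings, and agent are defined once on the pipe, so adding a second entry point does not duplicate any speech configuration.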
Next Steps
Quickstart
End-to-end: agent running and accepting sessions in under 10 minutes.
SuperDialog Integration
Drive conversations with structured flows and tools.