
What the Voice Stack Does

The Voice Stack gives your agent a voice. It handles everything between the user’s audio and your agent’s text logic: your agent receives plain text and returns plain text, and the Voice Stack takes care of the rest. That includes transcription (STT), speech synthesis (TTS), voice activity detection (VAD), barge-in detection, endpointing, and transport.
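The following Python sketch illustrates the text-in, text-out contract. The `TextAgent` protocol, its `reply` method and the `OrderStatusAgent` class are hypothetical and are not part of the Unpod SDK. They show only that agent logic works on transcripts and never handles audio.

```python
from typing import Protocol


class TextAgent(Protocol):
    """Hypothetical contract: one user utterance in, one reply out, both plain text."""

    def reply(self, user_text: str) -> str: ...


class OrderStatusAgent:
    """Toy agent backed by a two-state dialog machine; it never sees audio."""

    def __init__(self) -> None:
        self.state = "ask_order_id"

    def reply(self, user_text: str) -> str:
        if self.state == "ask_order_id":
            # Pull any digits out of the transcript delivered by the speech pipeline.
            digits = "".join(ch for ch in user_text if ch.isdigit())
            if not digits:
                return "Could you tell me your order number?"
            self.state = "done"
            return f"Order {digits} is on its way."
        return "Is there anything else I can help with?"


agent: TextAgent = OrderStatusAgent()
print(agent.reply("hi there"))         # Could you tell me your order number?
print(agent.reply("it's order 4521"))  # Order 4521 is on its way.
```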

Two Ways Users Connect

Phone / Telephony

Users dial a phone number. Unpod handles SIP, PSTN, number provisioning, and routing, so no carrier account or SIP trunk is needed.

Browser / App

Users connect from a web or mobile app through the Unpod Web SDK. No phone number is required.

Both paths run through the same managed speech pipeline (STT, TTS, VAD, barge-in). Your agent code is identical regardless of how the user connects.

How Audio Flows

Figure: Unpod voice stack diagram showing phone, browser, and mobile entry points flowing through the managed speech layer into your AgentRunner and dialog machine.
1. User connects

The user calls a phone number (PSTN/SIP) or connects directly from a browser or app using the Web SDK.

2. Speech pipeline runs

Unpod transcribes the audio (STT), detects the end of each turn (VAD plus endpointing), and handles barge-in interruptions. Your agent receives clean text.

3. Your agent responds

The Unpod orchestrator dispatches the session to your AgentRunner. Your entrypoint runs, and your dialog machine produces a text reply.

4. Reply synthesised

Unpod converts the reply to speech (TTS) and streams it back to the user.

5. Session ends

The transcript, metrics, and recording are stored and can be queried through the management API.
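Step 2 involves turn detection and barge-in handling. One common and simple approach combines three parts: an energy threshold for VAD, a silence timeout for endpointing, and a barge-in check that triggers when the user speaks over agent audio. The sketch below is generic. The names and thresholds are hypothetical. Unpod manages these stages for you, and this page does not describe its actual models or parameters.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def energy_vad(samples: Sequence[float], threshold: float = 0.02) -> bool:
    """Classify one frame as speech if its RMS energy reaches the threshold."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= threshold


@dataclass
class Endpointer:
    """Declares end of turn after `silence_ms` of continuous non-speech following speech."""

    frame_ms: int = 20
    silence_ms: int = 500
    in_speech: bool = False
    silent_for_ms: int = 0

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True on the frame where the user's turn ends."""
        if is_speech:
            self.in_speech = True
            self.silent_for_ms = 0
            return False
        if not self.in_speech:
            return False  # leading silence: the user has not started talking yet
        self.silent_for_ms += self.frame_ms
        if self.silent_for_ms >= self.silence_ms:
            self.in_speech = False
            self.silent_for_ms = 0
            return True
        return False


def is_barge_in(agent_audio_playing: bool, user_is_speaking: bool) -> bool:
    """A barge-in is user speech detected while the agent's reply is still playing."""
    return agent_audio_playing and user_is_speaking
```

With 20 ms frames and a 500 ms timeout, this sketch ends a turn after 25 consecutive silent frames. When `is_barge_in` returns True, a pipeline would typically stop playback and start listening again.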

Core Building Blocks

Voice Profiles

Choose STT and TTS providers, either as a pre-built profile or as a custom combination with automatic failover.
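Automatic failover usually means trying providers in priority order and falling back when one fails. The sketch below assumes each TTS provider is a plain callable that turns text into audio bytes. The provider names and the `synthesize_with_failover` helper are hypothetical. This page does not document Unpod's own failover policy.

```python
from typing import Callable, List, Sequence, Tuple

TtsProvider = Callable[[str], bytes]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the profile has failed."""


def synthesize_with_failover(
    text: str, providers: Sequence[Tuple[str, TtsProvider]]
) -> Tuple[str, bytes]:
    """Try each (name, provider) in priority order; return the first success."""
    errors: List[str] = []
    for name, synthesize in providers:
        try:
            return name, synthesize(text)
        except Exception as exc:  # a real system would catch provider-specific errors
            errors.append(f"{name}: {exc!r}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(text: str) -> bytes:
    raise TimeoutError("primary TTS timed out")


def backup(text: str) -> bytes:
    return f"<audio:{text}>".encode("utf-8")


used, audio = synthesize_with_failover("Hello!", [("primary", flaky_primary), ("backup", backup)])
print(used)  # backup
```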

Speech Pipe

Bundle a voice profile, recording settings, and connection attachments, then point the pipe at your agent.
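As a rough mental model, picture a Speech Pipe as one configuration object. The field names below (`voice_profile`, `agent`, `record_audio`, `attachments`) and the provider placeholders `stt-a`, `tts-a`, `tts-b` are hypothetical. They do not reflect Unpod's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical: which STT and TTS providers to use, plus TTS fallbacks in order."""
    stt: str
    tts: str
    tts_fallbacks: Tuple[str, ...] = ()


@dataclass
class SpeechPipe:
    """Hypothetical bundle of everything a session needs besides your agent logic."""
    name: str
    voice_profile: VoiceProfile
    agent: str                 # the agent this pipe dispatches sessions to
    record_audio: bool = True  # recording setting
    attachments: List[str] = field(default_factory=list)  # e.g. phone numbers, web entry points

    def attach(self, entry_point: str) -> None:
        """Attach an entry point once; repeated attaches are ignored."""
        if entry_point not in self.attachments:
            self.attachments.append(entry_point)


pipe = SpeechPipe(
    name="support",
    voice_profile=VoiceProfile(stt="stt-a", tts="tts-a", tts_fallbacks=("tts-b",)),
    agent="support-agent",
)
pipe.attach("+14155550100")
pipe.attach("web")
pipe.attach("web")
print(pipe.attachments)  # ['+14155550100', 'web']
```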

Phone Numbers

Provision numbers directly or bring your own, then attach a number to a Speech Pipe to receive inbound calls.
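Conceptually, attaching a number to a Speech Pipe creates a lookup from the dialed number to a pipe. This hypothetical sketch normalizes the dialed number to a `+digits` form, which is a simplification of E.164 that assumes the country code is already present. It then looks the number up in a plain dictionary. `normalize_number` and `route_inbound` are illustrative names only.

```python
from typing import Dict


def normalize_number(dialed: str) -> str:
    """Reduce a dialed number to '+' followed by its digits (assumes the country code is present)."""
    return "+" + "".join(ch for ch in dialed if ch.isdigit())


def route_inbound(dialed: str, number_to_pipe: Dict[str, str]) -> str:
    """Return the name of the Speech Pipe attached to the dialed number."""
    key = normalize_number(dialed)
    if key not in number_to_pipe:
        raise LookupError(f"no Speech Pipe attached to {key}")
    return number_to_pipe[key]


routes = {"+14155550100": "support"}
print(route_inbound("+1 (415) 555-0100", routes))  # support
```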

SDK Setup

Install unpod, run your AgentRunner, and accept sessions from any source.
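The runner dispatches each incoming session to your entrypoint, regardless of where the session came from. The asyncio sketch below illustrates that job. `ToyAgentRunner`, `IncomingSession` and `serve` are hypothetical names for illustration only, so consult the unpod SDK reference for the real AgentRunner API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class IncomingSession:
    """Hypothetical session handle; `source` is informational only."""
    session_id: str
    source: str  # "phone" or "web"


Entrypoint = Callable[[IncomingSession], Awaitable[str]]


class ToyAgentRunner:
    """Hypothetical runner: runs one entrypoint per session, with bounded concurrency."""

    def __init__(self, entrypoint: Entrypoint, max_concurrent: int = 4) -> None:
        self.entrypoint = entrypoint
        self.max_concurrent = max_concurrent

    async def serve(self, sessions: Iterable[IncomingSession]) -> Dict[str, str]:
        # Create the semaphore inside the running event loop.
        limit = asyncio.Semaphore(self.max_concurrent)

        async def run_one(session: IncomingSession) -> Tuple[str, str]:
            async with limit:
                return session.session_id, await self.entrypoint(session)

        results = await asyncio.gather(*(run_one(s) for s in sessions))
        return dict(results)


async def entrypoint(session: IncomingSession) -> str:
    # Same code path whether the session came from a phone call or the Web SDK.
    await asyncio.sleep(0)  # stand-in for the actual conversation
    return f"handled {session.session_id}"


runner = ToyAgentRunner(entrypoint, max_concurrent=2)
results = asyncio.run(
    runner.serve([IncomingSession("call-1", "phone"), IncomingSession("web-1", "web")])
)
print(results)  # {'call-1': 'handled call-1', 'web-1': 'handled web-1'}
```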

Quickstart Paths

| I want to… | Start here |
|---|---|
| Handle phone calls | Telephony Quickstart: provision a number and attach it to your Speech Pipe |
| Add voice to a web app | Web SDK Quickstart: embed the Unpod Web SDK in your frontend |
| Connect both | Start with telephony, then add the Web SDK (same Speech Pipe, two entry points) |

Next Steps

Quickstart

End-to-end: get an agent running and accepting sessions in under 10 minutes.

SuperDialog Integration

Drive conversations with structured flows and tools.