How It Works - Unpod Dev

The Model: Audio In, Text Out

Unpod sits between the caller and your agent. Your agent never touches audio.

Diagram: the Unpod voice stack. Audio enters the managed layer, text turns reach the agent, and audio streams back to the user.

Your endpoint is a simple text-in / text-out webhook. Everything involving audio, carriers, and real-time streaming is handled by Unpod.
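To show how little such an endpoint has to do, here is a minimal sketch built only on the Python standard library. The JSON payload shape (a `text` field in, a `reply` field out) and the names `reply_to`, `TurnHandler`, and `serve` are assumptions for illustration. The actual request format Unpod sends is not described on this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def reply_to(text: str) -> str:
    """Agent logic: plain text in, plain text out. No audio is involved."""
    if "hours" in text.lower():
        return "We are open 9am to 5pm, Monday to Friday."
    return "Sorry, could you rephrase that?"


class TurnHandler(BaseHTTPRequestHandler):
    # Hypothetical payload shape: {"text": ...} in, {"reply": ...} out.
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        out = json.dumps({"reply": reply_to(body.get("text", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from your own entry point."""
    HTTPServer(("127.0.0.1", port), TurnHandler).serve_forever()
```

Keeping the agent logic in a plain function (`reply_to`) means it can be tested without a running server, and the HTTP layer stays a thin wrapper around it.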

The Communication Stack

Phone Numbers

A number provisioned through Unpod, or one you bring yourself (BYON, bring your own number), receives the inbound call. Unpod handles SIP, PSTN connectivity, and routing. You never configure a carrier or SIP trunk.

Speech Pipeline

Audio from the caller is transcribed by the configured STT provider (Deepgram). VAD (voice activity detection) determines when the caller has finished speaking, and barge-in detection interrupts the agent if the caller speaks mid-response. Agent replies are synthesised by the TTS provider (Cartesia or ElevenLabs). The voice profile on your agent determines which providers are used, with automatic failover if a provider goes down.
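Unpod performs this failover for you. The sketch below only illustrates the idea of a voice profile holding an ordered list of TTS providers and falling back down that list when one fails. `VoiceProfile`, `ProviderDown`, `synthesize`, and the stub providers are hypothetical stand-ins, not the Unpod SDK or the real Cartesia and ElevenLabs clients.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A TTS provider is modelled as a function from text to audio bytes.
Synth = Callable[[str], bytes]


class ProviderDown(Exception):
    """Raised by a stand-in provider when it is unavailable."""


@dataclass
class VoiceProfile:
    stt: str
    tts_order: List[str]      # failover order; first entry is preferred
    language: str = "en-US"


def synthesize(profile: VoiceProfile, providers: Dict[str, Synth],
               text: str) -> Tuple[str, bytes]:
    """Try each TTS provider in order; return (provider_name, audio)."""
    errors = []
    for name in profile.tts_order:
        try:
            return name, providers[name](text)
        except ProviderDown as exc:
            errors.append(f"{name}: {exc}")  # record the failure, try the next one
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


# Stand-in providers: the primary is down, the fallback works.
def cartesia_stub(text: str) -> bytes:
    raise ProviderDown("timeout")


def elevenlabs_stub(text: str) -> bytes:
    return text.encode()  # pretend these bytes are audio


profile = VoiceProfile(stt="deepgram", tts_order=["cartesia", "elevenlabs"])
used, audio = synthesize(profile, {"cartesia": cartesia_stub,
                                   "elevenlabs": elevenlabs_stub}, "Hello")
```

Here the primary provider raises, so the reply is produced by the second entry in `tts_order`.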

Orchestration

The Unpod orchestrator receives the session and dispatches it to the least-loaded AgentRunner worker registered for that agent. Worker heartbeats, capacity tracking, and load balancing are automatic.
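As an illustration of what "least-loaded" selection could mean, the sketch below assumes each worker's heartbeat reports a capacity and an active-call count. The `Worker` fields, the 10-second heartbeat timeout, and the utilisation-based ranking are assumptions. The orchestrator's real policy is not specified on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Worker:
    worker_id: str
    agent_id: str
    capacity: int          # max concurrent calls the worker advertised
    active: int            # calls currently running on it
    last_heartbeat: float  # seconds, from a monotonic clock


def pick_worker(workers: List[Worker], agent_id: str, now: float,
                heartbeat_timeout: float = 10.0) -> Optional[Worker]:
    """Least-loaded selection among live workers registered for agent_id."""
    candidates = [
        w for w in workers
        if w.agent_id == agent_id
        and now - w.last_heartbeat <= heartbeat_timeout  # heartbeat is recent
        and w.active < w.capacity                         # has a free slot
    ]
    if not candidates:
        return None
    # Lowest utilisation wins; more free slots breaks ties.
    return min(candidates,
               key=lambda w: (w.active / w.capacity, -(w.capacity - w.active)))


workers = [
    Worker("a", "agent_1", capacity=4, active=3, last_heartbeat=100.0),
    Worker("b", "agent_1", capacity=2, active=0, last_heartbeat=100.0),
    Worker("c", "agent_1", capacity=8, active=0, last_heartbeat=50.0),   # stale
    Worker("d", "agent_2", capacity=8, active=0, last_heartbeat=100.0),  # other agent
]
chosen = pick_worker(workers, "agent_1", now=105.0)
```

Worker `c` would be the least loaded, but its last heartbeat is too old, so `b` is chosen. Ranking by utilisation rather than raw active count is one reasonable reading of "least-loaded" when workers advertise different capacities.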

Your Agent

Your AgentRunner process accepts the dispatch and calls your entrypoint with a CallContext. The Session object gives you controls: say, transfer, end, pause recording. session.run() keeps the call alive and routes transcribed turns to your dialog machine.
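To make the flow concrete, here is a self-contained model of that loop. `FakeSession`, `FakeContext`, `EchoMachine`, and the list of pre-transcribed turns are hypothetical stand-ins so the example runs without Unpod. Only the names `handle_call`, `ctx.session`, `dialog_machine`, `say`, `end`, and `run` come from this page, and the real SDK's signatures (for example, whether they are async) may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class EchoMachine:
    """Stand-in dialog machine: one text turn in, one text reply out."""

    def turn(self, text: str) -> Optional[str]:
        if text.strip().lower() == "bye":
            return None  # signal that the conversation is over
        return f"You said: {text}"


@dataclass
class FakeSession:
    turns: List[str]                       # pretend these came from STT
    spoken: List[str] = field(default_factory=list)
    ended: bool = False
    dialog_machine: Optional[EchoMachine] = None

    def say(self, text: str) -> None:
        self.spoken.append(text)           # a real session would send this to TTS

    def end(self) -> None:
        self.ended = True

    def run(self) -> None:
        """Route each transcribed turn to the dialog machine, speak the reply."""
        if self.dialog_machine is None:
            raise RuntimeError("set dialog_machine before run()")
        for text in self.turns:
            reply = self.dialog_machine.turn(text)
            if reply is None:
                break
            self.say(reply)
        self.end()


@dataclass
class FakeContext:
    call_id: str
    session: FakeSession


def handle_call(ctx: FakeContext) -> None:
    ctx.session.say("Hi, how can I help?")
    ctx.session.dialog_machine = EchoMachine()
    ctx.session.run()


ctx = FakeContext("call_1", FakeSession(turns=["billing question", "bye"]))
handle_call(ctx)
```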

Dialog

Transcribed text is routed to whatever dialog handler you set. Use SuperDialog for structured flows with tools and branching, or use any HTTP endpoint, LangChain chain, or custom logic: whatever your agent already is.
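The handler only has to map text to text. One way to express that contract in Python is a `typing.Protocol`, with an adapter that wraps any plain callable (for example, a function that posts to your HTTP endpoint or invokes a chain). `DialogHandler`, `CallableDialog`, `RuleDialog`, and `respond` are hypothetical names used for illustration, not Unpod types.

```python
from typing import Callable, Dict, Protocol


class DialogHandler(Protocol):
    """Anything with turn(text) -> reply can act as the dialog layer."""

    def turn(self, text: str) -> str: ...


class CallableDialog:
    """Adapts any str -> str callable to the DialogHandler shape."""

    def __init__(self, fn: Callable[[str], str]) -> None:
        self._fn = fn

    def turn(self, text: str) -> str:
        return self._fn(text)


class RuleDialog:
    """Tiny custom-logic handler: keyword rules with a fallback reply."""

    def __init__(self, rules: Dict[str, str], fallback: str) -> None:
        self.rules = rules
        self.fallback = fallback

    def turn(self, text: str) -> str:
        lowered = text.lower()
        for keyword, reply in self.rules.items():
            if keyword in lowered:
                return reply
        return self.fallback


def respond(handler: DialogHandler, text: str) -> str:
    return handler.turn(text)
```

Because the contract is structural, an existing function can be dropped in via `CallableDialog(my_function)` without subclassing anything.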

What You Own vs What Unpod Owns

Diagram: the call lifecycle, showing the boundary between Unpod-managed infrastructure and the server-side AgentRunner, Session, and DialogMachine that you own.

The line is clear: you own everything from AgentRunner down. Everything above it is managed.

Core Concepts

Numbers

Phone numbers attached to your Speech Pipe. Inbound calls arrive on a number and are routed to the Speech Pipe it is attached to. Numbers come from Unpod directly or via BYON. See Voice Stack - Numbers.
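Conceptually, inbound routing is a lookup from the dialled number to the Speech Pipe it is attached to. The sketch below assumes numbers are stored in E.164 form (`+` followed by digits); the `NUMBER_TO_PIPE` table and the normalisation rule are illustrative only, and Unpod performs this routing for you.

```python
import re
from typing import Dict, Optional

# Hypothetical attachment table: E.164 number -> Speech Pipe id.
NUMBER_TO_PIPE: Dict[str, str] = {"+14155550100": "pipe_support"}


def to_e164(raw: str) -> str:
    """Strip spaces, dashes, dots and parentheses; keep the leading '+'.

    Does not add a missing country code.
    """
    return re.sub(r"[\s\-().]", "", raw)


def route_inbound(dialled: str) -> Optional[str]:
    """Return the Speech Pipe for a dialled number, or None if unattached."""
    return NUMBER_TO_PIPE.get(to_e164(dialled))
```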

Voice Profiles

A bundle of STT + TTS provider configuration. Defines which providers handle speech recognition and synthesis, which language, and the failover order. See Voice Stack - Voice Profiles.

Speech Pipe

A configuration entity: name, voice profile, recording settings, call duration limits. The Speech Pipe is the anchor point for numbers, voice profiles, and runner workers. See Voice Stack - Speech Pipe.
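A sketch of the Speech Pipe as a plain data structure, showing how it anchors a voice profile, recording settings, a duration limit, and attached numbers. Field names such as `voice_profile`, `record_calls`, and `max_duration_s`, and the profile id `vp_default`, are assumptions, not the documented schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechPipe:
    name: str
    voice_profile: str             # id of the attached voice profile
    record_calls: bool = True      # recording setting
    max_duration_s: int = 600      # call duration limit, in seconds
    numbers: List[str] = field(default_factory=list)

    def attach_number(self, e164: str) -> None:
        """Attach a number once; repeated attachments are ignored."""
        if e164 not in self.numbers:
            self.numbers.append(e164)

    def over_limit(self, elapsed_s: float) -> bool:
        """True once a call has reached the pipe's duration limit."""
        return elapsed_s >= self.max_duration_s


pipe = SpeechPipe(name="pipe_support", voice_profile="vp_default")
pipe.attach_number("+14155550100")
pipe.attach_number("+14155550100")
```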

AgentRunner

A long-lived Python process that registers with the Unpod orchestrator over WebSocket. It advertises capacity, receives dispatch frames, and calls your entrypoint for each call. See Voice Stack - SDK Setup.
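The orchestrator protocol itself is not documented here, so the following only sketches the worker-side bookkeeping an AgentRunner-like process needs: advertise free slots, accept a dispatch only when a slot is free, and release the slot when the call ends. `RunnerSlots` and the heartbeat fields (`type`, `worker_id`, `capacity`, `active`) are hypothetical; the real WebSocket frames may look quite different.

```python
import json
import threading


class RunnerSlots:
    """Tracks concurrent calls for one worker process."""

    def __init__(self, worker_id: str, capacity: int) -> None:
        self.worker_id = worker_id
        self.capacity = capacity
        self.active = 0
        self._lock = threading.Lock()

    def try_accept(self) -> bool:
        """Claim a slot for a dispatched call; False when the worker is full."""
        with self._lock:
            if self.active >= self.capacity:
                return False
            self.active += 1
            return True

    def release(self) -> None:
        """Free a slot when a call ends."""
        with self._lock:
            self.active = max(0, self.active - 1)

    def heartbeat_frame(self) -> str:
        """Hypothetical heartbeat payload advertising current capacity."""
        with self._lock:
            return json.dumps({"type": "heartbeat", "worker_id": self.worker_id,
                               "capacity": self.capacity, "active": self.active})
```

A lock is used because dispatches and call completions may be handled concurrently in a real runner.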

CallContext

The object passed to your entrypoint for every call. Contains call_id, session_id, direction, user_number, instructions, data, and the session object. Read-only metadata plus the live session handle.
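The field list above maps naturally onto a frozen dataclass, which enforces the read-only metadata while `session` still points at a live, mutable object. This is a hypothetical model: the field types (for example, `direction` as a string) are assumptions, and a frozen dataclass only blocks attribute reassignment, so the `data` dict itself can still be mutated.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class CallContext:
    call_id: str
    session_id: str
    direction: str                 # e.g. "inbound" or "outbound"
    user_number: str
    instructions: str
    data: Dict[str, Any] = field(default_factory=dict)
    session: Any = None            # live session handle (a mutable object)


ctx = CallContext("call_1", "sess_1", "inbound", "+14155550100", "Be brief.")
try:
    ctx.call_id = "other"          # reassigning metadata is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False
```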

Session

Your control interface for a live call. say(), transfer_to_human(), end(), recording.pause(), hooks, metrics, and the dialog_machine property. See Voice Stack - Session Controls.
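Hooks are described on this page only by name. A common way to build such a mechanism is an event registry; the sketch below shows that pattern with a hypothetical `on(event, fn)` / `fire(event, **payload)` API and a hypothetical `call_end` event carrying a transcript and metrics. It may not match the real Session interface.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Hooks:
    """Minimal event registry: handlers run in registration order."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., Any]]] = defaultdict(list)

    def on(self, event: str, fn: Callable[..., Any]) -> Callable[..., Any]:
        self._handlers[event].append(fn)
        return fn

    def fire(self, event: str, **payload: Any) -> int:
        """Call every handler for event; return how many ran."""
        handlers = self._handlers.get(event, [])
        for fn in handlers:
            fn(**payload)
        return len(handlers)


hooks = Hooks()
saved = []
hooks.on("call_end", lambda transcript, metrics:
         saved.append((len(transcript), metrics["duration_s"])))
count = hooks.fire("call_end", transcript=["hi", "hello"], metrics={"duration_s": 42})
```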

SuperDialog

The conversation engine. A DialogMachine executes a structured Flow turn-by-turn, calls tools, and manages state. Plugs into any Session via session.dialog_machine = machine. See SuperDialog.
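SuperDialog's actual Flow format is covered on its own page. As a rough mental model only, a turn-by-turn flow can be pictured as a small state machine whose steps may call a tool and write to shared state. `Step`, `ToyMachine`, and `lookup_order` are hypothetical, and there is no LLM here: each step simply advances to a fixed next step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Step:
    prompt: str                                        # reply when no tool runs
    next_step: Optional[str] = None                    # step to move to afterwards
    tool: Optional[Callable[[str, dict], str]] = None  # optional tool call


@dataclass
class ToyMachine:
    flow: Dict[str, Step]
    current: str
    state: dict = field(default_factory=dict)

    def turn(self, text: str) -> str:
        """Execute the current step for one user turn, then advance."""
        step = self.flow[self.current]
        reply = step.tool(text, self.state) if step.tool else step.prompt
        if step.next_step:
            self.current = step.next_step
        return reply


def lookup_order(text: str, state: dict) -> str:
    """Stand-in tool: pull digits from the utterance and store them."""
    digits = "".join(ch for ch in text if ch.isdigit())
    state["order_id"] = digits
    return f"Order {digits} has shipped." if digits else "I didn't catch a number."


flow = {
    "greet": Step(prompt="What is your order number?", next_step="lookup"),
    "lookup": Step(prompt="", next_step="done", tool=lookup_order),
    "done": Step(prompt="Anything else?"),
}
machine = ToyMachine(flow, current="greet")
```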

A Complete Call, End to End

1. Caller dials +1 415 555 0100
2. Unpod routes the call to pipe "pipe_support"
3. Orchestrator dispatches to your AgentRunner
4. AgentRunner calls handle_call(ctx)
5. You set ctx.session.dialog_machine = DialogMachine(flow, llm)
6. session.run() starts the loop
7. Caller speaks -> Deepgram transcribes -> DialogMachine.turn()
8. Reply text -> Cartesia synthesises -> Caller hears
9. Call ends -> transcript + metrics stored -> hooks fire

Your code runs from step 4 onward. Steps 1-3 are handled by Unpod.

Next Steps

Voice Stack

Numbers, voice profiles, agents, and the AgentRunner SDK.

SuperDialog

Build structured conversation flows with tools and branching.