The Model: Audio In, Text Out
Unpod sits between the caller and your agent. Your agent never touches audio: it receives transcribed text and returns text that Unpod speaks back to the caller.
The Communication Stack
Phone Numbers
A number provisioned through Unpod (or brought over via BYON, bring your own number) receives the inbound call. Unpod handles SIP, PSTN connectivity, and routing, so you never configure a carrier or SIP trunk.
Speech Pipeline
Audio from the caller is transcribed by the configured STT provider (Deepgram). Voice activity detection (VAD) detects when the caller has finished speaking, and barge-in detection interrupts the agent if the caller speaks mid-response. Agent replies are synthesised by the TTS provider (Cartesia or ElevenLabs). The voice profile on your agent sets which providers are used, with automatic failover if a provider goes down.
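Automatic failover amounts to trying providers in the configured order until one succeeds. The following is a minimal sketch that models each provider as a plain function that raises on failure. `call_with_failover`, `primary_down`, and `backup` are hypothetical names and are not the Deepgram, Cartesia, or ElevenLabs client APIs. Unpod's real failover may also consider timeouts, health checks, or mid-call switching, none of which this sketch models.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical provider signature: audio bytes in, transcript out.
# These stand-ins are NOT real STT/TTS SDK calls.
Provider = Callable[[bytes], str]


class AllProvidersFailed(Exception):
    """Raised when every provider in the failover order has failed."""


def call_with_failover(providers: Sequence[Tuple[str, Provider]], audio: bytes) -> Tuple[str, str]:
    """Try each (name, provider) in order; return (name, result) from the first that succeeds."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(audio)
        except Exception as exc:  # a production system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary_down(audio: bytes) -> str:
    # Simulates an outage at the first provider in the failover order.
    raise ConnectionError("provider unavailable")


def backup(audio: bytes) -> str:
    # Simulates a healthy fallback provider.
    return f"transcript of {len(audio)} bytes"


used, text = call_with_failover([("primary", primary_down), ("backup", backup)], b"\x00" * 320)
print(used, text)  # backup transcript of 320 bytes
```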
Orchestration
The Unpod orchestrator receives the session and dispatches it to the least-loaded AgentRunner worker registered for that agent. Worker heartbeats, capacity tracking, and load balancing are automatic.
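Least-loaded dispatch can be modeled in two steps. First, drop workers whose heartbeat has gone stale or that are already at capacity. Then pick the worker with the lowest active-to-capacity ratio. The sketch below is a toy model built on that assumption. `Dispatcher`, `Worker`, and the 10-second heartbeat timeout are illustrative only and do not describe Unpod's orchestrator internals.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Worker:
    worker_id: str
    capacity: int      # max concurrent calls the worker advertised
    active: int = 0    # calls currently assigned to it
    last_heartbeat: float = field(default_factory=time.monotonic)

    def load(self) -> float:
        return self.active / self.capacity


class Dispatcher:
    """Toy least-loaded dispatcher (hypothetical; not Unpod's orchestrator)."""

    def __init__(self, heartbeat_timeout: float = 10.0) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.workers: Dict[str, Worker] = {}

    def register(self, worker_id: str, capacity: int) -> None:
        self.workers[worker_id] = Worker(worker_id, capacity)

    def heartbeat(self, worker_id: str) -> None:
        self.workers[worker_id].last_heartbeat = time.monotonic()

    def dispatch(self, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        # Only live workers with spare capacity are eligible.
        eligible = [
            w for w in self.workers.values()
            if now - w.last_heartbeat <= self.heartbeat_timeout and w.active < w.capacity
        ]
        if not eligible:
            return None
        chosen = min(eligible, key=Worker.load)  # ties go to the earliest-registered worker
        chosen.active += 1
        return chosen.worker_id

    def release(self, worker_id: str) -> None:
        # Called when a call ends, freeing one slot.
        self.workers[worker_id].active -= 1


d = Dispatcher()
d.register("w1", capacity=2)
d.register("w2", capacity=4)
picks = [d.dispatch() for _ in range(4)]
print(picks)  # ['w1', 'w2', 'w2', 'w1']
```

Measuring load as a ratio instead of a raw call count spreads calls in proportion to each worker's advertised capacity.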
Your Agent
Your AgentRunner process accepts the dispatch and calls your entrypoint with a CallContext. The Session object gives you controls: say, transfer, end, pause recording. session.run() keeps the call alive and routes transcribed turns to your dialog machine.
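The shape of an entrypoint can be shown with stand-in objects. This is a sketch and not the SDK. `FakeSession` and `FakeCallContext` are hypothetical stubs that only borrow the names `say`, `end`, `run`, and `dialog_machine` from the description above, and the async signatures are an assumption. The stub `run()` feeds pre-transcribed caller turns into the dialog machine and "speaks" each reply, standing in for the STT and TTS steps that Unpod performs.

```python
import asyncio
from dataclasses import dataclass
from typing import Callable, List, Optional


class FakeSession:
    """Hypothetical stand-in for the SDK Session; records what the agent asked for."""

    def __init__(self, caller_turns: List[str]) -> None:
        self._turns = caller_turns   # pretend these arrived from STT
        self.spoken: List[str] = []  # text that would be sent to TTS
        self.ended = False
        self.dialog_machine: Optional[Callable[[str], str]] = None

    async def say(self, text: str) -> None:
        self.spoken.append(text)

    async def end(self) -> None:
        self.ended = True

    async def run(self) -> None:
        # Route each transcribed caller turn to the dialog machine and speak its reply,
        # stopping when the turns run out or end() has been called.
        for turn in self._turns:
            if self.ended or self.dialog_machine is None:
                break
            await self.say(self.dialog_machine(turn))
        self.ended = True


@dataclass
class FakeCallContext:
    call_id: str
    session: FakeSession


async def entrypoint(ctx: FakeCallContext) -> None:
    session = ctx.session
    # Placeholder dialog machine that echoes the caller; a real agent would plug in a DialogMachine.
    session.dialog_machine = lambda text: f"You said: {text}"
    await session.say("Hi, thanks for calling.")
    await session.run()  # returns once the call is over


ctx = FakeCallContext("call-1", FakeSession(["hello", "bye"]))
asyncio.run(entrypoint(ctx))
print(ctx.session.spoken)  # ['Hi, thanks for calling.', 'You said: hello', 'You said: bye']
```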
What You Own vs What Unpod Owns
Core Concepts
Numbers
Phone numbers attached to your Speech Pipe. Inbound calls arrive on a number and are routed to the Speech Pipe it is attached to. Numbers come from Unpod directly or via BYON. See Voice Stack - Numbers.
Voice Profiles
A bundle of STT and TTS provider configuration. Defines which providers handle speech recognition and synthesis, which language is used, and the failover order. See Voice Stack - Voice Profiles.
Speech Pipe
A configuration entity: name, voice profile, recording settings, and call duration limits. The Speech Pipe is the anchor point for numbers, voice profiles, and runner workers. See Voice Stack - Speech Pipe.
AgentRunner
A long-lived Python process that registers with the Unpod orchestrator over WebSocket. It advertises capacity, receives dispatch frames, and calls your entrypoint for each call. See Voice Stack - SDK Setup.
CallContext
The object passed to your entrypoint for every call. Contains call_id, session_id, direction, user_number, instructions, data, and the session object: read-only metadata plus the live session handle.
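A frozen dataclass is one way to model "read-only metadata plus the live session handle." The class name `CallContextSketch`, the field types, and the `"inbound"`/`"outbound"` direction values are assumptions made for illustration. Only the field names come from the description above.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Any, Dict

try:
    from typing import Literal
except ImportError:  # Python < 3.8
    from typing_extensions import Literal


@dataclass(frozen=True)
class CallContextSketch:
    """Hypothetical model of CallContext; field types and values are assumptions."""
    call_id: str
    session_id: str
    direction: Literal["inbound", "outbound"]  # assumed values
    user_number: str
    instructions: str
    data: Dict[str, Any] = field(default_factory=dict)
    session: Any = None  # live session handle; the object itself stays mutable


ctx = CallContextSketch("c-123", "s-456", "inbound", "+15550100", "Be brief.")
try:
    ctx.call_id = "other"  # frozen: reassigning a field raises
except FrozenInstanceError:
    print("metadata is read-only")
```

Freezing only blocks reassignment of fields. The `data` dict and the session object remain mutable, which fits a context whose metadata is fixed while the session it points to stays live.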
Session
Your control interface for a live call: say(), transfer_to_human(), end(), recording.pause(), hooks, metrics, and the dialog_machine property. See Voice Stack - Session Controls.
SuperDialog
The conversation engine. A DialogMachine executes a structured Flow turn-by-turn, calls tools, and manages state. It plugs into any Session via session.dialog_machine = machine. See SuperDialog.
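A structured Flow executed turn-by-turn can be sketched as a small state machine. Each step has a prompt and a handler, and the handler can call a tool, update state, and choose the next step. `Step`, `MiniDialogMachine`, and the `lookup_order` tool are hypothetical and are not the SuperDialog API. A real DialogMachine likely relies on an LLM to interpret what the caller says, rather than the digit matching used here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Step:
    prompt: Callable[[dict], str]        # text to say on entering the step, given state
    handle: Callable[[str, dict], str]   # caller text + state -> name of the next step


@dataclass
class MiniDialogMachine:
    """Hypothetical flow runner; not the SuperDialog DialogMachine."""
    flow: Dict[str, Step]
    current: str
    state: dict = field(default_factory=dict)

    def start(self) -> str:
        return self.flow[self.current].prompt(self.state)

    def on_turn(self, caller_text: str) -> str:
        # One turn: run the current step's handler, move to its chosen step, return that prompt.
        self.current = self.flow[self.current].handle(caller_text, self.state)
        return self.flow[self.current].prompt(self.state)


def lookup_order(order_id: str) -> str:
    # Hypothetical tool; a real one might call an order-status API.
    return {"1001": "shipped"}.get(order_id, "not found")


def handle_ask(text: str, state: dict) -> str:
    digits = "".join(ch for ch in text if ch.isdigit())
    if not digits:
        return "ask_order"                   # stay on this step and re-prompt
    state["order_id"] = digits
    state["status"] = lookup_order(digits)   # tool call
    return "report"


flow = {
    "ask_order": Step(lambda s: "What is your order number?", handle_ask),
    "report": Step(lambda s: f"Order {s['order_id']} is {s['status']}.", lambda t, s: "report"),
}

machine = MiniDialogMachine(flow, "ask_order")
print(machine.start())                     # What is your order number?
print(machine.on_turn("I'm not sure"))     # What is your order number?
print(machine.on_turn("it's 1001"))        # Order 1001 is shipped.
```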
A Complete Call, End to End
Next Steps
Voice Stack
Numbers, voice profiles, agents, and the AgentRunner SDK.
SuperDialog
Build structured conversation flows with tools and branching.