Vapi
Vapi is a San Francisco-based voice AI platform that lets developers build real-time, low-latency voice agents over phone, web, and SIP. It orchestrates three modular components (a transcriber for STT, an LLM, and a voice for TTS) into a sub-700ms voice-to-voice pipeline, with first-class support for…
25 channels across 1 AsyncAPI spec
Channels
-
Per-call WebSocket transport. Carries binary audio frames in both directions (PCM s16le or G.711 mu-law, format set when the call is created) and JSON text frames for client-to-server contro…
Spec: Vapi Realtime API (WebSocket Transport + Server URL Events)
-
Vapi requests dynamic assistant configuration for an inbound call. The customer endpoint must respond within 7.5 seconds end-to-end with `{ assistantId }`, `{ assistant }`, `{ destination }`, or `{ error…
-
Real-time word-level timing for assistant speech, opt-in via `serverMessages`. Variants: ElevenLabs word-alignment, Minimax word-progress, or text-only (other providers).
-
Smart endpointing decision delegated to a customer server configured via `assistant.startSpeakingPlan.smartEndpointingPlan.server.url`. The customer endpoint responds with `{ timeoutSeconds }`.
-
A new chat was created.
-
A chat was deleted.
-
Emitted when the conversation history is committed.
-
Final summary delivered after the call ends, with recording, transcript, and messages.
-
The assistant has failed to reply within the hang threshold.
-
Vapi asks a custom knowledge base provider for relevant documents. The customer endpoint replies with a `documents` array.
-
The transcriber switched its detected language.
-
Streamed LLM tokens or tool-call outputs, correlated by `turnId`.
-
Delegates hang-up or forwarding to the customer server. Opt-in via `serverMessages`. `request` is either `forward` (with `destination`) or `hang-up`.
-
A new session was created.
-
A session was deleted.
-
A session was updated.
-
Speech-status change (started/stopped) for assistant or user.
-
Call state changes: scheduled, queued, ringing, in-progress, forwarding, ended.
-
Function/tool invocation triggered by the assistant. The customer endpoint replies with a `results` array correlating each `toolCallId` to a result string.
-
Streaming partial and final transcripts from the configured transcriber. Filtered by `transcriptType` (`partial` or `final`).
-
Vapi asks for a transfer destination when the assistant did not pre-specify one. The customer endpoint replies with a `destination` and an optional spoken `message`.
-
Confirmation that a transfer executed, with the final destination.
-
The user interrupted the assistant. Use `turnId` to discard the interrupted turn's tokens.
-
The custom voice provider receives the user's transcribed input.
-
TTS audio request sent to a custom voice server configured via `assistant.voice.server.url`. The customer endpoint must respond with raw 1-channel 16-bit PCM audio at the requested `sampleRate` (binar…
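
Several of the channels above expect the customer endpoint to send something back: a `results` array keyed by `toolCallId`, a `destination` with an optional `message`, a `{ timeoutSeconds }` object, or raw mono 16-bit PCM. The following is a minimal Python sketch of those reply bodies, not Vapi's reference implementation. The function names are hypothetical, the per-entry `result` key and the little-endian byte order for the custom voice audio are assumptions (the listing only states "result string" and "16-bit PCM"), and `destination` is treated as an opaque object because its fields are not described here.

```python
import json
import struct
from typing import Dict, List, Optional


def tool_call_results(outputs: Dict[str, str]) -> dict:
    """Build a reply that correlates each toolCallId to a result string."""
    # Assumption: each entry uses the keys "toolCallId" and "result".
    return {
        "results": [
            {"toolCallId": call_id, "result": text}
            for call_id, text in outputs.items()
        ]
    }


def transfer_reply(destination: dict, message: Optional[str] = None) -> dict:
    """Build a transfer-destination reply; the spoken message is optional."""
    reply = {"destination": destination}
    if message is not None:
        reply["message"] = message
    return reply


def endpointing_reply(timeout_seconds: float) -> dict:
    """Build the smart-endpointing reply."""
    return {"timeoutSeconds": timeout_seconds}


def pcm16_mono(samples: List[float]) -> bytes:
    """Encode float samples in [-1.0, 1.0] as raw 1-channel 16-bit PCM.

    Assumption: little-endian byte order (s16le), matching the format
    named for the WebSocket transport.
    """
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    # Hypothetical tool call id and result, serialized as the JSON body.
    print(json.dumps(tool_call_results({"call_1": "72F and sunny"})))
```

A real handler would also need to meet the timing constraints stated above (for example, the 7.5-second end-to-end limit on the assistant configuration request), which this sketch does not enforce.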