Sage Voice
Real-time voice AI platform. Integrate voice capabilities into your applications with Twilio, WebSocket streaming, and a full STT/LLM/TTS pipeline. Base URL: https://sage-voice.devblocktechnologies.com
Overview
Sage Voice provides a complete voice AI solution that handles the entire real-time audio pipeline. From speech-to-text (STT) through large language model (LLM) processing to text-to-speech (TTS) output, Sage Voice manages that complexity so you can focus on building voice-enabled applications.
Twilio Integration
Connect phone numbers via Twilio for PSTN voice calls.
WebSocket Calls
Direct browser-to-server WebSocket connections for low-latency voice.
VAD
Voice Activity Detection for natural turn-taking and interruptions.
STT / LLM / TTS Pipeline
End-to-end audio processing with configurable models at each stage.
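This section does not say which VAD algorithm Sage Voice uses. To show how voice activity detection can drive turn-taking, here is a minimal sketch of a simple energy-based detector. This is a common baseline technique swapped in for illustration, not Sage Voice's VAD. The `EnergyVad` class, its `threshold` and `hangoverFrames` parameters, and the event names are hypothetical and not part of the Sage Voice API.

```javascript
// Minimal energy-based voice activity detector.
// HYPOTHETICAL illustration only; not Sage Voice's actual VAD.
class EnergyVad {
  constructor({ threshold = 0.02, hangoverFrames = 10 } = {}) {
    this.threshold = threshold; // RMS level at or above which a frame counts as speech
    this.hangoverFrames = hangoverFrames; // consecutive silent frames before a turn ends
    this.speaking = false;
    this.silentRun = 0;
  }

  // Root-mean-square energy of a frame of float samples in [-1, 1].
  static rms(frame) {
    let sum = 0;
    for (const s of frame) sum += s * s;
    return frame.length ? Math.sqrt(sum / frame.length) : 0;
  }

  // Returns "speech_start", "speech_end", or null for each incoming frame.
  process(frame) {
    const voiced = EnergyVad.rms(frame) >= this.threshold;
    if (voiced) {
      this.silentRun = 0;
      if (!this.speaking) {
        this.speaking = true;
        return "speech_start";
      }
      return null;
    }
    // Only end the turn after enough consecutive silent frames (the "hangover").
    if (this.speaking && ++this.silentRun >= this.hangoverFrames) {
      this.speaking = false;
      this.silentRun = 0;
      return "speech_end";
    }
    return null;
  }
}
```

A `speech_start` event while the assistant is still talking is the kind of signal a client could use to stop playback on barge-in, and the hangover period keeps short pauses from ending the user's turn too early. Fixed energy thresholds are sensitive to background noise, so production systems typically use model-based detectors instead.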
Quick Start
Connect a browser to Sage Voice via WebSocket:
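// "my-call-id" is a placeholder; substitute your own call identifier.
const ws = new WebSocket(
  "wss://sage-voice.devblocktechnologies.com/call/browser/my-call-id"
);
ws.onopen = () => {
  console.log("Connected to Sage Voice");
  // Start sending audio chunks with ws.send(...)
};
ws.onmessage = (event) => {
  // This example only handles text frames, which are parsed as JSON.
  if (typeof event.data !== "string") return;
  const data = JSON.parse(event.data);
  // Handle audio responses
};
ws.onerror = (event) => {
  console.error("WebSocket error:", event);
};
ws.onclose = () => {
  console.log("Call ended");
};
Architecture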
Sage Voice processes audio through a multi-stage pipeline:
1. Audio Input
Raw audio from a browser microphone or a Twilio telephony stream.
2. Voice Activity Detection (VAD)
Detects when the user is speaking and manages turn-taking and barge-in (the user interrupting the assistant mid-response).
3. Speech-to-Text (STT)
Transcribes the user's speech to text using automatic speech recognition (ASR) models.
4. Language Model (LLM)
Passes the transcribed text to a configurable LLM, which generates the response.
5. Text-to-Speech (TTS)
Converts LLM output to natural-sounding speech streamed back to the user.
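Stages 3 to 5 above can be sketched as a single conversational turn. This is a hypothetical outline, not Sage Voice's implementation: `runTurn` and the `stt`, `llm`, `tts`, and `play` stage functions are illustrative names, and the sketch assumes VAD (stage 2) has already segmented the user's utterance from the input audio (stage 1).

```javascript
// One conversational turn through the STT -> LLM -> TTS pipeline.
// HYPOTHETICAL stage functions; real ones would wrap an ASR model,
// an LLM, and a TTS engine.
async function runTurn(utteranceAudio, history, stages) {
  // Stage 3 (STT): transcribe the utterance that VAD segmented.
  const transcript = await stages.stt(utteranceAudio);
  history.push({ role: "user", content: transcript });

  // Stage 4 (LLM): generate a reply from the conversation so far.
  const reply = await stages.llm(history);
  history.push({ role: "assistant", content: reply });

  // Stage 5 (TTS): synthesize speech and stream each chunk back to the user.
  for await (const chunk of stages.tts(reply)) {
    stages.play(chunk);
  }
  return { transcript, reply };
}
```

This version runs the stages one after another for clarity. A real-time system would typically overlap them, for example by streaming partial transcripts into the LLM and starting TTS on the first sentence of the reply, to reduce response latency.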
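Stage 1 receives raw audio from the browser microphone, but this section does not specify the wire format Sage Voice expects. Assuming it accepts 16-bit little-endian PCM frames (an assumption to verify against the server), a browser client could convert Web Audio's float samples before sending them. `floatTo16BitPCM` and `sendAudioChunk` are hypothetical helper names, not Sage Voice APIs.

```javascript
// Convert Web Audio float samples (range [-1, 1]) to 16-bit signed
// little-endian PCM. ASSUMPTION: the server accepts raw PCM16 frames.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Scale negatives by 0x8000 and positives by 0x7fff to cover the int16 range.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}

// Send one chunk only if the socket is open; returns whether it was sent.
function sendAudioChunk(ws, samples) {
  if (ws.readyState !== 1 /* WebSocket.OPEN */) return false;
  ws.send(floatTo16BitPCM(samples));
  return true;
}
```

In a browser, `samples` would typically be the `Float32Array` frames produced by a Web Audio capture path such as an AudioWorklet processor; the sample rate and chunk size the server expects are not covered here.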