Sage Voice

Real-time voice AI platform. Integrate voice capabilities into your applications with Twilio, WebSocket streaming, and a full STT/LLM/TTS pipeline. Base URL: https://sage-voice.devblocktechnologies.com

Overview

Sage Voice handles the entire real-time audio pipeline: speech-to-text (STT), language model (LLM) processing, and text-to-speech (TTS) output. It manages each stage so you can focus on building voice-enabled applications.

Twilio Integration

Connect phone numbers via Twilio for PSTN voice calls.

WebSocket Calls

Direct browser-to-server WebSocket connections for low-latency voice.

VAD

Voice Activity Detection for natural turn-taking and interruptions.

STT / LLM / TTS Pipeline

End-to-end audio processing with configurable models at each stage.

Quick Start

Connect a browser to Sage Voice via WebSocket:

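// "my-call-id" is a placeholder; substitute your own call identifier.
const ws = new WebSocket(
  "wss://sage-voice.devblocktechnologies.com/call/browser/my-call-id"
);

ws.onopen = () => {
  console.log("Connected to Sage Voice");
  // Start sending audio chunks
};

ws.onmessage = (event) => {
  // Messages are parsed as JSON here
  const data = JSON.parse(event.data);
  // Handle audio responses
};

ws.onerror = (event) => {
  // Log connection failures instead of dropping them silently
  console.error("Sage Voice connection error", event);
};

ws.onclose = () => {
  console.log("Call ended");
};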
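
The Quick Start hard-codes the call ID into the URL. As a minimal sketch, the URL could instead be built from a variable ID. The helper names buildBrowserCallUrl and makeCallId are hypothetical and not part of the Sage Voice API, and the sketch assumes that the client is free to choose its own call ID; only the host and the /call/browser/ path come from the example above.

```javascript
// Host and path are taken from the Quick Start example.
const SAGE_VOICE_WS_BASE = "wss://sage-voice.devblocktechnologies.com";

// Hypothetical helper (not part of the Sage Voice API): builds the
// browser-call WebSocket URL, escaping characters that are unsafe in a path.
function buildBrowserCallUrl(callId) {
  if (typeof callId !== "string" || callId.length === 0) {
    throw new TypeError("callId must be a non-empty string");
  }
  return `${SAGE_VOICE_WS_BASE}/call/browser/${encodeURIComponent(callId)}`;
}

// Assumption: call IDs are chosen by the client. A timestamp plus a random
// suffix is one simple way to make collisions between calls unlikely.
function makeCallId(prefix = "call") {
  const randomPart = Math.random().toString(36).slice(2, 10);
  return `${prefix}-${Date.now()}-${randomPart}`;
}
```

With these in place, the connection line would read new WebSocket(buildBrowserCallUrl(makeCallId())). Escaping the ID keeps a stray slash or space from changing the request path.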

Architecture

Sage Voice processes audio through a multi-stage pipeline:

1. Audio Input

Raw audio from browser microphone or Twilio telephony stream.

2. Voice Activity Detection (VAD)

Detects when the user is speaking, manages barge-in and turn-taking.

3. Speech-to-Text (STT)

Transcribes audio to text using automatic speech recognition (ASR) models.

4. Language Model (LLM)

Passes the transcribed text to a configurable LLM, which generates the response.

5. Text-to-Speech (TTS)

Converts LLM output to natural-sounding speech streamed back to the user.
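
The five stages above can be pictured as a chain of functions. The following is an illustrative sketch only, not Sage Voice's implementation: every function name is hypothetical, the VAD is a simple energy threshold chosen for brevity (the source does not say which method Sage Voice uses), and the STT, LLM, and TTS stages are placeholders that return dummy values so the sketch runs on its own.

```javascript
// Illustrative sketch: all names are hypothetical stand-ins,
// not part of the Sage Voice API.

// Stage 2 (VAD): root-mean-square energy of a frame of samples in [-1, 1].
function rms(frame) {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// A frame counts as speech when its energy exceeds the threshold.
// The default of 0.02 is an assumed value for illustration.
function isSpeech(frame, threshold = 0.02) {
  return rms(frame) > threshold;
}

// Stages 3-5: placeholders standing in for real STT, LLM, and TTS calls.
async function transcribe(frames) {
  return `user speech (${frames.length} frames)`;
}
async function generateReply(text) {
  return `reply to: ${text}`;
}
async function synthesize(text) {
  return { audioFor: text };
}

// Runs one conversational turn: keep speech frames, then STT -> LLM -> TTS.
// Returns null when no frame contains speech, so silence never reaches the LLM.
async function runTurn(frames) {
  const speechFrames = frames.filter((frame) => isSpeech(frame));
  if (speechFrames.length === 0) return null;
  const transcript = await transcribe(speechFrames);
  const reply = await generateReply(transcript);
  return synthesize(reply);
}
```

A real-time system would stream each stage's output to the next rather than wait for a full turn, and would also watch for speech during playback to support barge-in. The sketch only shows the order of the stages and where VAD gates the rest.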