API Documentation

Streaming APIs

The system provides a streaming API for real-time audio-to-face conversion:

StreamingAudio2FaceV1: Main streaming interface for real-time audio processing
- WebSocket-based communication for low-latency streaming
- Chunk-based audio processing for continuous real-time response
- Configurable postprocessing pipelines for different emotional profiles
- Asynchronous processing with thread pool management
- Request expiration and cache management
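
The last two points can be illustrated with a minimal sketch. It is not the StreamingAudio2FaceV1 implementation: `RequestCache`, `add_bytes`, `evict_expired`, and the 30-second lifetime are hypothetical names and values chosen for illustration. The sketch tracks per-request state with a deadline that is refreshed on every chunk, and feeds chunks through a standard-library thread pool.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class RequestCache:
    """Hypothetical per-request store with time-based expiration."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiration can be tested
        self._lock = threading.Lock()
        # request_id -> (deadline, total audio bytes received so far)
        self._entries: Dict[str, Tuple[float, int]] = {}

    def add_bytes(self, request_id: str, n: int) -> int:
        """Record n more audio bytes and refresh the request's deadline."""
        with self._lock:
            _, total = self._entries.get(request_id, (0.0, 0))
            total += n
            self._entries[request_id] = (self._clock() + self._ttl, total)
            return total

    def evict_expired(self) -> List[str]:
        """Drop requests whose deadline has passed and return their ids."""
        with self._lock:
            now = self._clock()
            expired = [rid for rid, (deadline, _) in self._entries.items()
                       if deadline <= now]
            for rid in expired:
                del self._entries[rid]
            return expired


if __name__ == "__main__":
    cache = RequestCache(ttl_seconds=30.0)  # assumed lifetime
    # Each worker handles one chunk; the lock keeps the shared state consistent.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(cache.add_bytes, "req-1", 3200) for _ in range(5)]
        totals = [f.result() for f in futures]
    print(max(totals), cache.evict_expired())
```

Refreshing the deadline on every chunk means an active stream never expires mid-session, while an abandoned one is removed on the next `evict_expired` call.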

Request/Response Format

Messages are serialized with Protocol Buffers. The format supports:

Chunk-based Processing: Audio input is processed in configurable chunks for real-time response
Blendshape Output: Facial animation data represented as blendshape values
Frame-based Timeline: Output is sequenced on a frame-based timeline for animation playback
Streaming Protocol: WebSocket-based streaming with start/body/end message types
Error Handling: Error responses include an error code and a descriptive message
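
The start/body/end sequence can be sketched as a small state machine. The actual Protocol Buffers schema is not shown here, so the `MessageType` enum and `StreamMessage` dataclass below are hypothetical stand-ins for the real message definitions, and `frame_stream` and `validate_stream` are illustrative helpers rather than part of the API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator


class MessageType(Enum):
    START = "start"
    BODY = "body"
    END = "end"


@dataclass
class StreamMessage:
    """Hypothetical stand-in for the real Protocol Buffers message."""
    type: MessageType
    payload: bytes = b""


def frame_stream(chunks: Iterable[bytes]) -> Iterator[StreamMessage]:
    """Wrap audio chunks in a start / body... / end message sequence."""
    yield StreamMessage(MessageType.START)
    for chunk in chunks:
        yield StreamMessage(MessageType.BODY, chunk)
    yield StreamMessage(MessageType.END)


def validate_stream(messages: Iterable[StreamMessage]) -> int:
    """Check the start -> body* -> end ordering; return total body bytes."""
    state = "idle"
    total = 0
    for msg in messages:
        if state == "idle" and msg.type is MessageType.START:
            state = "open"
        elif state == "open" and msg.type is MessageType.BODY:
            total += len(msg.payload)
        elif state == "open" and msg.type is MessageType.END:
            state = "closed"
        else:
            raise ValueError(
                f"unexpected {msg.type.value} message in state {state}")
    if state != "closed":
        raise ValueError("stream ended without an end message")
    return total


if __name__ == "__main__":
    messages = list(frame_stream([b"\x00" * 640, b"\x00" * 640]))
    print([m.type.value for m in messages], validate_stream(messages))
```

A receiver that rejects out-of-order messages in this way can report a protocol error early instead of running inference on a malformed stream.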

Data Flow

Audio Input: PCM audio data in configurable chunk sizes
Feature Extraction: Wav2Vec2-based audio feature extraction
Inference: ONNX Unitalker model generates blendshape predictions
Postprocessing: Configurable pipeline applies emotional profiles and effects
Output: Structured blendshape data with frame timing information
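
The first and last stages above can be sketched without the model itself. The example below splits PCM audio into chunks and assigns each output frame an index and timestamp. The sample rate, sample width, and frame rate are assumptions for illustration (16 kHz, 16-bit mono, 30 fps), not values documented by the API, and the helper names are hypothetical. Feature extraction, inference, and postprocessing are left out.

```python
from typing import Iterator, List, Tuple

SAMPLE_RATE = 16_000    # assumed input sample rate in Hz
BYTES_PER_SAMPLE = 2    # assumed 16-bit mono PCM
FPS = 30                # assumed animation frame rate


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def iter_chunks(pcm: bytes, chunk_samples: int) -> Iterator[bytes]:
    """Split raw PCM bytes into chunks of chunk_samples samples each."""
    step = chunk_samples * BYTES_PER_SAMPLE
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]


def frames_for_chunk(start_sample: int,
                     num_samples: int) -> List[Tuple[int, float]]:
    """Return (frame_index, timestamp_s) for frames starting in the chunk."""
    end_sample = start_sample + num_samples
    first = _ceil_div(start_sample * FPS, SAMPLE_RATE)
    last = _ceil_div(end_sample * FPS, SAMPLE_RATE)  # exclusive bound
    return [(i, i / FPS) for i in range(first, last)]


def timeline(pcm: bytes, chunk_samples: int) -> List[Tuple[int, float]]:
    """Build the frame timeline for a whole stream, chunk by chunk."""
    frames: List[Tuple[int, float]] = []
    start = 0
    for chunk in iter_chunks(pcm, chunk_samples):
        n = len(chunk) // BYTES_PER_SAMPLE
        frames.extend(frames_for_chunk(start, n))
        start += n
    return frames


if __name__ == "__main__":
    one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # silent audio
    print(len(timeline(one_second, chunk_samples=8000)))
```

Computing frame indices from the absolute sample offset, rather than per chunk, keeps the timeline contiguous even when the chunk size does not divide evenly into frames.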