API Documentation
Streaming APIs
The system provides a streaming API for real-time audio-to-face conversion:
StreamingAudio2FaceV1: Main streaming interface for real-time audio processing
WebSocket-based communication for low-latency streaming
Chunk-based audio processing for continuous real-time response
Configurable postprocessing pipelines for different emotional profiles
Asynchronous processing with thread pool management
Request expiration and cache management
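Request expiration can be pictured as a small time-to-live cache keyed by request ID. The sketch below is only an illustration under stated assumptions: `RequestCache`, its methods, and the TTL policy are hypothetical names chosen here, not the interface of StreamingAudio2FaceV1.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class _Entry:
    value: Any
    expires_at: float


class RequestCache:
    """Hypothetical TTL cache for per-request state (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, _Entry] = {}

    def put(self, request_id: str, value: Any) -> None:
        # Each write sets a fresh expiration deadline for that request.
        self._entries[request_id] = _Entry(value, self._clock() + self._ttl)

    def get(self, request_id: str) -> Optional[Any]:
        entry = self._entries.get(request_id)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # Expired: drop the state so it cannot be reused.
            del self._entries[request_id]
            return None
        return entry.value

    def purge_expired(self) -> int:
        # Remove every expired entry and report how many were dropped.
        now = self._clock()
        expired = [rid for rid, e in self._entries.items() if now >= e.expires_at]
        for rid in expired:
            del self._entries[rid]
        return len(expired)
```

A periodic call to `purge_expired()` would release state for streams whose clients disconnected without sending an end message; whether the real service purges periodically or lazily is not stated here.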
Request/Response Format
The API uses Protocol Buffers for efficient serialization and supports:
Chunk-based Processing: Audio input is processed in configurable chunks for real-time response
Blendshape Output: Facial animation data represented as blendshape values
Frame-based Timeline: Precise frame-based timeline management for animation sequencing
Streaming Protocol: WebSocket-based streaming with start/body/end message types
Error Handling: Error responses that carry an error code and a descriptive message
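The start/body/end sequence can be sketched as a small state machine. The actual API serializes messages with Protocol Buffers, and its schema is not shown here, so the dataclasses below are stand-ins: every class, field, and error-code name (`Start`, `Body`, `End`, `Response`, `ErrorCode`, `StreamSession`) is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class ErrorCode(Enum):
    OK = 0
    OUT_OF_ORDER = 1  # hypothetical code for a message sent in the wrong phase


@dataclass
class Start:
    request_id: str
    sample_rate: int


@dataclass
class Body:
    request_id: str
    pcm: bytes


@dataclass
class End:
    request_id: str


Message = Union[Start, Body, End]


@dataclass
class Response:
    code: ErrorCode
    message: str = ""


class StreamSession:
    """Tracks one stream through the idle -> streaming -> closed phases."""

    def __init__(self) -> None:
        self.state = "idle"
        self.received_bytes = 0

    def handle(self, msg: Message) -> Response:
        if isinstance(msg, Start):
            if self.state != "idle":
                return Response(ErrorCode.OUT_OF_ORDER,
                                "start is only valid on an idle session")
            self.state = "streaming"
            return Response(ErrorCode.OK)
        if self.state != "streaming":
            # Body or end arrived before start, or after the stream closed.
            return Response(ErrorCode.OUT_OF_ORDER,
                            "body/end is only valid between start and end")
        if isinstance(msg, Body):
            self.received_bytes += len(msg.pcm)
            return Response(ErrorCode.OK)
        # Remaining case: End closes the stream.
        self.state = "closed"
        return Response(ErrorCode.OK)
```

For example, sending `Body` before `Start` yields a `Response` with `ErrorCode.OUT_OF_ORDER`, while the sequence `Start`, `Body`, `End` returns `ErrorCode.OK` at every step.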
Data Flow
Audio Input: PCM audio data in configurable chunk sizes
Feature Extraction: Wav2Vec2-based audio feature extraction
Inference: ONNX Unitalker model generates blendshape predictions
Postprocessing: Configurable pipeline applies emotional profiles and effects
Output: Structured blendshape data with frame timing information
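The input and output ends of this flow can be illustrated with a minimal sketch of chunking and frame timing. It assumes 16-bit mono PCM and a fixed animation frame rate, neither of which is specified above, and it leaves out the feature extraction, inference, and postprocessing stages. `iter_chunks` and `FrameClock` are hypothetical helpers, not part of the API.

```python
from typing import Iterator, List, Tuple

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def iter_chunks(pcm: bytes, chunk_samples: int) -> Iterator[bytes]:
    """Yield consecutive PCM chunks of at most chunk_samples samples."""
    step = chunk_samples * BYTES_PER_SAMPLE
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]


class FrameClock:
    """Maps a running sample count to animation frame indices and times."""

    def __init__(self, sample_rate: int, fps: int) -> None:
        self.sample_rate = sample_rate
        self.fps = fps
        self.total_samples = 0
        self.emitted_frames = 0

    def advance(self, num_samples: int) -> List[Tuple[int, float]]:
        # Frames are derived from the cumulative sample count, so fractional
        # frames left over from one chunk carry into the next without drift.
        self.total_samples += num_samples
        due = self.total_samples * self.fps // self.sample_rate
        frames = [(i, i / self.fps) for i in range(self.emitted_frames, due)]
        self.emitted_frames = due
        return frames
```

With one second of 16 kHz audio split into 4000-sample chunks at 30 frames per second, `advance` returns 7, 8, 7, and 8 frames for the four chunks, 30 in total. Each returned `(frame_index, time_in_seconds)` pair is where a blendshape vector from the model would be attached.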