🚀 Quick Start#
In this section, we first introduce the DLP3d’s technical framework. Then, we present a step-by-step tutorial on how to setup DLP3d services and setup chat.
Technical Framework#
The DLP3d system consists of 3 core components, namely Web App, Orchestrator and Backend & Cloud Services:
Web App: provides an intuitive graphical interface for customizing and interacting with 3D avatars. Each character is fully configurable — including 3D models, LLMs, prompts, and voices. Chatting with an embodied AI is as simple as holding the microphone button and speaking. Rather than relying on local assets, the system streams character audio, facial expressions, and body motions in real-time. A robust runtime animation pipeline is implemented to receive, organize, and replay streamed data, featuring mechanisms such as adaptive motion blending, connection-loss recovery, and network health estimation to ensure responsiveness and interactivity.
Orchestrator: serves as a real-time intelligent conversation engine that coordinates personalized multi-modal AI interaction workflows. It integrates modules for Automatic Speech Recognition (ASR), Large Language Models (LLMs), Text-to-Speech (TTS), emotion analysis, memory management, and 3D animation generation (Audio2Face & Speech2Motion). Designed with modularity and streaming in mind, it supports multiple AI service providers and manages the entire conversation lifecycle. Acting as the conductor, the orchestrator seamlessly synchronizes diverse AI services and algorithmic components into a unified, coherent experience.
Backend & Cloud Services: encompasses both AI services and the web backend. Users can choose preferred providers for LLM, TTS, and ASR services. The backend is responsible for asset and database management at runtime, delivering unified APIs for user authentication, avatar resource access across the DLP3d ecosystem.
Technical Framework
System Requirements#
Backend Services#
Make sure that your hardware is of AMD64 architecture and have at least 2GB RAM.
Web App#
Since the web app runs entirely on your browser, it works smoothly across different platforms and devices. We’ve verified successful runs on the following devices:
Desktop Devices:
Windows/Ubuntu: Intel i7-8550U with built in UHD Graphics 620 graphics card, 8GB RAM, 1920x1080 screen resolution
MacOS: MacBook Pro (16-inch, 2019) or later
Tip
For better gaming experience, we recommend using screen resolution >= 1920x1080.
Mobile Devices:
iOS: iPhone 12 Pro Max or later
iPadOS: iPad Pro 11-inch (2nd generation) or later
Data Preparation#
To start DLP3d web service, you need to download the ONNX model file and motion data file and set up the required directory structure.
Download ONNX Model#
Download the ONNX model file:
GitHub Download: unitalker_v0.4.0_base.onnx
Google Drive Download: unitalker_v0.4.0_base.onnx
Baidu Cloud: unitalker_v0.4.0_base.onnx
Download the motion data file:
Google Drive Download: motion_data.zip
Baidu Cloud: motion_data.zip
Organize the data:
Create a
weightsdirectory in your project root if it doesn’t existCreate a
datadirectory in your project root if it doesn’t existPlace the downloaded
unitalker_v0.4.0_base.onnxfile in theweightsdirectoryExtract the downloaded
motion_data.zipfile to thedatadirectoryEnsure the following directory structure is created:
├─docker-compose.yml
├─data
│ ├─motion_database.db
│ ├─blendshapes_meta
│ ├─joints_meta
│ ├─mesh_glb
│ ├─motion_files
│ ├─restpose_npz
│ └─rigids_meta
└─weights
└─unitalker_v0.4.0_base.onnx
Directory Structure Explanation#
docker-compose.yml: Docker Compose configuration file for running all services.data/: Directory containing motion-related data files.motion_database.db: SQLite database containing motion metadata.blendshapes_meta/: Directory for blendshapes metadata files.joints_meta/: Directory for joint metadata files.mesh_glb/: Directory for 3D mesh files in GLB format.motion_files/: Directory containing motion animation files.restpose_npz/: Directory for rest pose data in NPZ format.rigids_meta/: Directory for rigid body metadata files.
weights/: Directory for storing ONNX model files.unitalker_v0.4.0_base.onnx: The main ONNX model file for audio-to-face generation.
Deploy Local TTS Service (Optional)#
Note
This section is optional. If you prefer to use cloud-based TTS services, you can skip this section and proceed directly to Start Service.
If you want to deploy a local TTS service instead of using cloud-based TTS providers, you need to ensure that your Docker environment supports CUDA. Additionally, you need to prepare additional data and models according to the documentation provided at a fork of the official chatterbox repository.
Prerequisites#
Docker environment with CUDA support
NVIDIA GPU with appropriate drivers installed
Follow the data and model preparation steps from the chatterbox README
Adding TTS Service to Docker Compose#
After completing the additional TTS file preparation, you can add a new container to your docker-compose.yml file. The container configuration block is provided below for you to fill in:
chatterbox:
image: dockersenseyang/service_chatterbox:latest
container_name: chatterbox
restart: unless-stopped
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
volumes:
- ./weights:/workspace/chatterbox/weights
- ./data:/workspace/chatterbox/data
networks:
- dlp3d_network
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:18085/health"]
interval: 10s
timeout: 5s
retries: 60
start_period: 60s
Configure Orchestrator to Use Local TTS Service#
After adding the chatterbox service, you need to modify the orchestrator container in your docker-compose.yml file to add a new environment variable and depend on the chatterbox service. Add the following content to the orchestrator service:
orchestrator:
# ... existing configuration ...
environment:
# ... existing environment variables ...
CHATTERBOX_TTS_HTTP_URL: http://chatterbox:18085
depends_on:
# ... existing dependencies ...
chatterbox:
condition: service_healthy
This environment variable tells the orchestrator where is the local chatterbox TTS service deployed by docker compose. Depend on the chatterbox service to ensure proper startup order.
Start Service#
The easiest way to get started with DLP3d web service is using the pre-built Docker image. First, make sure that you have Docker and Docker Compose correctly installed.
Prerequisites#
For Windows Users#
Install Docker Desktop which includes both Docker and Docker Compose.
For Linux Users#
Follow the official documentation to install Docker Engine. Docker Compose is included with Docker Engine installation.
After installation, Docker Compose will be available for use.
Starting the Services#
In the provided docker-compose.yml, we demonstrate how to seamlessly orchestrate all DLP3d services — including the backend, frontend, and supporting components. The DLP3D services can be started with a single command:
# cd to the project root
docker compose up
Handling SSL Certificate Warnings#
Wait until all the services are started. The application uses self-signed certificates, so you’ll need to trust them in your browser for all three services:
Step 1: Trust the Frontend Service (Port 18000)#
Open your browser and visit https://127.0.0.1:18000. The browser will show a security warning because we’re using a self-signed certificate. Click the Advanced button and then Proceed to 127.0.0.1 (unsafe):
SSL Certificate Warning
Step 2: Trust the Backend Service (Port 18001)#
Next, visit https://127.0.0.1:18001 and repeat the same process to trust this certificate.
Step 3: Trust the Orchestrator Service (Port 18002)#
Finally, visit https://127.0.0.1:18002 and trust this certificate as well.
Step 4: Start Using the Application#
Once all three certificates are trusted, you can return to https://127.0.0.1:18000 to start using the application. You should see the web interface:
Note
For LLM provider keys, SenseNova uses a single sensenova_api_key, MiniMax uses a single minimax_api_key, and SenseChat still uses sensechat_ak plus sensechat_sk. SenseNova Omni LLM adapters are deprecated and should not be selected for new characters.
Fronted Window Example
First Chat#
Setup Account#
You need to create an account and login first. Click the Account icon on the top-right corner of the window to open the authentication window.
Step 1: Register a New User#
Select the REGISTER tab in the authentication window. Enter your email address and password, then click the register button to create a new account:
User Registration
For community edition deployed on your own host, email and password are stored in the local MongoDB database.
Step 2: Login to Your Account#
After successful registration, switch to the LOGIN tab. Enter the email address and password you used during registration, then click Sign In:
User Authentication
Once you’ve successfully logged in, the UI panel will display:
Left: simulation settings, e.g., whether enable cloth simulation, debug mode.
Right: chat settings, where the character’s prompt, LLMs, TTS are configured.
Top: chat list settings, you can select your desired avatar or create new avatars here.
Bottom: entrypoint for starting a chat.
Home Page
Select Avatar#
Expand the Chat List panel by clicking it. Then, all available avatars will be listed below:
Chat List
We provide 4 avatars by default, namely Keqing-default, Furina-default, HuTao-default and Ani-default. Notice that the default characters cannot be edited directly. Instead, you should create a new avatar by duplicating existing default avatars. You can find the duplicate button in the top-right corner of each avatar card. The newly created avatar, comparatively, can be further customized. The name at the bottom of the avatar card will be used during the chat, here we rename it as Keqing-test as an example:
Duplicate Character
Prompt Settings#
To customize an avatar’s intrinsic behaviors—such as personality, abilities, and preferences—open the Prompt panel on the right side of the window. Each avatar comes with a default character prompt that you can freely edit to create your own unique personality.
Prompt Settings
Configure LLMs#
We employ LLMs to handle conversations, user behavior and relationship with avatar:
Conversation: handles text conversations, supports multiple LLMs.
Reaction & Classification: for emotional analysis, which analyzes user intent, character emotional changes, relationship changes and triggered actions.
Memory: for multi-level conversation memory.
To enable these features, you need to configure api keys for LLMs. Open the LLM panel on the right side of window, then click the key icon on the left side of the provider card and save your api key on the popped window:
API Key
The provider card and text will be highlighted once the api key is successfully saved. Notice that the backend service will not validate the user uploaded keys. An incorrect api key will break down the chat.
LLM Settings
Configure TTS#
The ASR/TTS panel is where you can setup the speech recognition model and change the voice of the avatar:
ASR(Automatic Speech Recognition): user speech recognition, supports multiple languages and real-time processing
TTS(Text To Speech): converts text to natural speech, supports multiple voices and emotional expressions
The ASR and TTS also need api keys to get working. Similarly, click the key icon to configure api keys.
Note
API keys are stored securely in your local MongoDB database. DLP3D.AI does not collect, transmit, or upload this data.
ASR/TTS Settings
Start Chat#
On the Scene panel, you can setup the background scene during the chat. Once you have completed the chat settings, click the Chat button to chat with the avatar:
Start Chat
The browser will jump to a new tab and will connect to the algorithm services to fetch necessary configurations and download runtime resources.
Note
If error occurs during the loading process, an error toast will appear on the screen. Please follow the error message to fix the issue.
Once the loading progress is finished, click the Start button to enter the chat. Notice that for the first chat, your browser may request the microphone permission. Please click the Allow button so that your speech can be normally processed.
Loading Progress
Talk with Avatar#
When entered the chat, the avatar will play opening remark and then enter the idle state. To talk with the avatar, click and hold the Microphone button, then start uploading your voice:
Avatar in Idle State
When you have finished your sentence, release the Microphone button, the web app will send your voice data to the backend services:
Tip
If the screen is lagging during chat, please consider turning off the cloth simulation from the homepage(SIMULATION -> Cloth Simulation).
Avatar in Listening State