
Speech simulation

Last updated: 2022-12-23 13:03


The ZegoAvatar SDK provides a speech simulation feature that uses the sound waves of a voice to drive a virtual avatar's mouth shapes in real time, so that the avatar can express emotions like a real person. The feature is well suited to social interaction and voice chat live streaming scenarios.
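As a rough intuition for how sound waves can drive a mouth shape, the following sketch maps the loudness of one 16-bit PCM frame to a value between 0 and 1. It is purely illustrative: the source does not describe the algorithm the ZegoAvatar SDK uses, and the class name MouthOpennessSketch, its methods, and the gain parameter are all hypothetical.

```java
// Illustrative only: NOT the ZegoAvatar SDK's algorithm, which the source does not describe.
// Maps the loudness of a 16-bit PCM frame to a 0..1 "mouth open" value.
final class MouthOpennessSketch {
    // Root-mean-square amplitude of the frame, with samples normalized to [-1, 1).
    static double rms(short[] samples) {
        if (samples.length == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (short s : samples) {
            double v = s / 32768.0;
            sumSquares += v * v;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    // Scales the RMS by a hypothetical gain and clamps the result to [0, 1].
    static double mouthOpenness(short[] samples, double gain) {
        return Math.max(0.0, Math.min(1.0, rms(samples) * gain));
    }
}
```

Silence yields 0 and louder frames approach 1. A real lip-sync engine would typically also analyze which sounds are spoken, not just how loud they are.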


Before you implement the speech simulation feature, ensure that the following conditions are met:

Implementation steps

Refer to the following procedures to implement the speech simulation feature.

1. Start voice detection

  • Before you start voice detection, ensure that the microphone permission has been granted.
  • If you use the ZegoCharacterHelper class, you do not need to call any IZegoCharacter APIs directly.

After the basic virtual avatar is created, call the startDetectExpression API with the drive mode set to Audio. The SDK then uses the microphone to detect sound waves. In the detection callback, call the setExpression API of ZegoCharacterHelper to apply each detected expression and drive the mouth shape changes of the virtual avatar.

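// Start voice detection in Audio mode.
ZegoAvatarService.getInteractEngine().startDetectExpression(ZegoExpressionDetectMode.Audio, expression -> {
    // Drive mouth shape changes of the virtual avatar.
    // mCharacterHelper is assumed to be the ZegoCharacterHelper instance created with the avatar.
    mCharacterHelper.setExpression(expression);
});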

2. Customize audio collection

To supply audio from your own source instead of the SDK's default collection, call the setCustomAudioDelegate API to set a custom audio data delegate. The delegate must inherit from AudioDataDelegate and implement the onStart and onStop methods. After audio data is collected, call the sendAudioData API to send it to the ZegoAvatar SDK.

import java.nio.ByteBuffer;
// Imports for AudioDataDelegate and ZegoAudioFrameParam come from the ZegoAvatar and Express SDKs.

public class CustomAudioDataDelegate extends AudioDataDelegate {
    private boolean isRunning = false;

    @Override
    public void onStart() {
        // Start audio collection.
        isRunning = true;
    }

    @Override
    public void onStop() {
        // Stop audio collection.
        isRunning = false;
    }

    // This is the audio pre-processing callback of Express. Send data collected by Express to the ZegoAvatar SDK.
    public void onProcessCapturedAudioData(ByteBuffer data, int dataLength, ZegoAudioFrameParam param) {
        // Forward audio only while collection is running (this guard is assumed from the isRunning flag).
        if (isRunning) {
            // Arguments of sendAudioData:
            // data: The source PCM data.
            // length: The data length.
            // dataType: The number of data bits collected. 0 indicates 16 bits, and 1 indicates 8 bits.
            // timeStamp: The timestamp from the collection start time to the current time, in milliseconds.
            super.sendAudioData(data, dataLength, 1 /* Pass 1 if the data provided by RTC is 8 bits. */, super.getDurationMs() /* Parent-class method; can be called directly. */);
        }
    }
}
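The delegate flow described above (start, forward frames while running, stop) can be tried out without the SDK. The sketch below is a minimal, self-contained approximation: StubAudioDataDelegate is a hypothetical stand-in for the SDK's AudioDataDelegate that only records what it is sent, and GatedAudioForwarder, onFrame, and dataTypeForBitDepth are illustrative names, not SDK APIs. The dataType codes follow the comments in the snippet above (0 for 16 bits, 1 for 8 bits).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SDK's AudioDataDelegate so the sketch compiles on its own.
// It records what would be sent instead of driving an avatar.
abstract class StubAudioDataDelegate {
    final List<Integer> sentLengths = new ArrayList<>();
    final List<Integer> sentDataTypes = new ArrayList<>();

    abstract void onStart();

    abstract void onStop();

    void sendAudioData(ByteBuffer data, int length, int dataType, long timeStampMs) {
        sentLengths.add(length);
        sentDataTypes.add(dataType);
    }
}

// Forwards captured PCM frames only between onStart and onStop.
class GatedAudioForwarder extends StubAudioDataDelegate {
    private volatile boolean running = false;
    private long startNanos;

    @Override
    void onStart() {
        running = true;
        startNanos = System.nanoTime();
    }

    @Override
    void onStop() {
        running = false;
    }

    // Maps a PCM bit depth to the dataType code (0 = 16 bits, 1 = 8 bits).
    static int dataTypeForBitDepth(int bits) {
        switch (bits) {
            case 16:
                return 0;
            case 8:
                return 1;
            default:
                throw new IllegalArgumentException("Unsupported bit depth: " + bits);
        }
    }

    // Called by whatever captures audio; returns true if the frame was forwarded.
    boolean onFrame(ByteBuffer pcm, int length, int bitDepth) {
        if (!running) {
            return false;
        }
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000L;
        sendAudioData(pcm, length, dataTypeForBitDepth(bitDepth), elapsedMs);
        return true;
    }
}
```

Deriving the dataType from the actual bit depth, rather than hard-coding it, avoids a mismatch if the capture format changes.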

3. Stop voice detection

When the application moves to the background or the user exits the current page, call the stopDetectExpression API to stop voice detection.

// Stop voice detection.
ZegoAvatarService.getInteractEngine().stopDetectExpression();
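One way to make sure detection is stopped exactly once per start is to route the calls through a small guard that tracks the current state. The sketch below assumes a hypothetical VoiceDetection interface wrapping the start and stop calls. Neither it nor VoiceDetectionGuard is part of the ZegoAvatar SDK, and restarting detection when the page is shown again is an assumption, not something the source prescribes.

```java
// Hypothetical abstraction over the start/stop detection calls; not an SDK type.
interface VoiceDetection {
    void start();

    void stop();
}

// Tracks whether detection is active so that stop is issued once per start.
final class VoiceDetectionGuard {
    private final VoiceDetection detection;
    private boolean active = false;

    VoiceDetectionGuard(VoiceDetection detection) {
        this.detection = detection;
    }

    // Call when the avatar page becomes visible.
    void onPageShown() {
        if (!active) {
            detection.start();
            active = true;
        }
    }

    // Call when the app moves to the background or the page is closed.
    void onPageHidden() {
        if (active) {
            detection.stop();
            active = false;
        }
    }

    boolean isActive() {
        return active;
    }
}
```

In an Android activity, onPageShown and onPageHidden could be called from the resume and pause lifecycle callbacks, with the VoiceDetection implementation delegating to the SDK calls shown in this document.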