
Speech simulation

Last updated:2022-06-29 12:12


The ZegoAvatar SDK provides a speech simulation feature. Based on the sound waves of a voice, this feature drives a virtual avatar to change its mouth shapes in real time, so that the avatar can express itself like a real person. The feature is well suited to social interaction and voice chat live streaming scenarios.


Before you implement the speech simulation feature, ensure that the following conditions are met:

Implementation steps

Follow these steps to implement the speech simulation feature.

1. Start voice detection

  • Before you start the voice detection, ensure that the microphone permission is enabled.
  • If the ZegoCharacterHelper class is used, no APIs related to IZegoCharacter need to be called.
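
The microphone permission check mentioned above can be sketched as follows. This is a minimal sketch, not part of the ZegoAvatar SDK: RequestMicrophoneAccess is a hypothetical helper name, and it assumes that the app's Info.plist already contains an NSMicrophoneUsageDescription entry. It uses the AVAudioSession requestRecordPermission: API, which is deprecated on iOS 17 and later, so newer projects may prefer a different API.

```objective-c
#import <AVFoundation/AVFoundation.h>

// Hypothetical helper (not a ZegoAvatar API): requests microphone access and
// runs `onGranted` on the main queue only if the user allows recording.
static void RequestMicrophoneAccess(void (^onGranted)(void)) {
    [[AVAudioSession sharedInstance] requestRecordPermission:^(BOOL granted) {
        // The permission callback may arrive on an arbitrary queue.
        dispatch_async(dispatch_get_main_queue(), ^{
            if (granted) {
                onGranted();
            } else {
                NSLog(@"Microphone permission denied; voice detection was not started.");
            }
        });
    }];
}
```

One way to use the helper is to place the startDetectExpression call inside the block passed to RequestMicrophoneAccess, so that voice detection starts only after permission is granted.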

After the basic virtual avatar is created, call the startDetectExpression API with the drive mode set to ZegoExpressionDetectModeAudio to detect sound waves through the microphone. In the callback, pass the detected expression to the setExpression API of ZegoCharacterHelper to drive the mouth shape changes of the virtual avatar.

// Start voice detection.
__weak typeof(self) weakSelf = self;
BOOL ret = [[[ZegoAvatarService sharedInstance] getInteractEngine] startDetectExpression:ZegoExpressionDetectModeAudio callback:^(ZegoExpression *expression) {
    // Drive mouth shape changes of the virtual avatar.
    __strong typeof(weakSelf) strongSelf = weakSelf;
    [strongSelf.characterHelper setExpression:expression];
}];

2. Customize audio collection

To collect audio yourself, call the setCustomAudioDelegate API to set a custom audio data collection delegate. The delegate must inherit from AudioDataDelegate and implement the onStart and onStop methods. After audio data is collected, call the sendAudioData API to send it to the ZegoAvatar SDK.

@interface ExpressAudioCaptureDelegate()<ZegoEventHandler, ZegoCustomAudioProcessHandler>
{
    BOOL _isRunning;
}
@end

@implementation ExpressAudioCaptureDelegate

- (void)onStart {
    // Start audio collection.
    _isRunning = YES;
}

- (void)onStop {
    // Stop audio collection.
    _isRunning = NO;
}

// This is the audio pre-processing callback of Express. Send data collected by Express to the ZegoAvatar SDK.
- (void)onProcessCapturedAudioData:(unsigned char * _Nonnull)data dataLength:(unsigned int)dataLength param:(ZegoAudioFrameParam *)param timestamp:(double)timestamp {
    // data: The source PCM data.
    // size: The data length.
    // dataType: The number of data bits collected. 0 indicates 16 bits, and 1 indicates 8 bits.
    // timeStamp: The timestamp from the collection start time to the current time.
    // sendAudioData is a parent method. The data is transparently transmitted to the ZegoAvatar SDK. If the data provided by RTC is 8 bits, dataType is 1.
    [super sendAudioData:(void *)data size:dataLength dataType:1 /* 8-bit data from RTC */ timeStamp:[super getDurationMs] /* This is a parent method and can be called directly. */];
}

@end

3. Stop voice detection

When the application switches to the background or the user exits the current page, call the stopDetectExpression API to stop voice detection.

// Stop voice detection.
[[[ZegoAvatarService sharedInstance] getInteractEngine] stopDetectExpression];
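
The two stop conditions described in this step could be wired up as in the sketch below. This is an illustration under stated assumptions, not the SDK's prescribed approach: AvatarViewController, the detecting property, and stopVoiceDetectionIfNeeded are hypothetical names, and the ZegoAvatar SDK headers are assumed to be imported the same way as elsewhere in your project. Only the UIKit notification and the stopDetectExpression call come from known APIs.

```objective-c
#import <UIKit/UIKit.h>
// Assumption: also import the ZegoAvatar SDK headers as your project already does.

// Hypothetical view controller that hosts the avatar page.
@interface AvatarViewController : UIViewController
// Hypothetical flag: set it to YES where startDetectExpression is called.
@property (nonatomic, assign) BOOL detecting;
@end

@implementation AvatarViewController

- (void)viewDidLoad {
    [super viewDidLoad];
    // Observe the transition of the application to the background.
    [[NSNotificationCenter defaultCenter] addObserver:self
                                             selector:@selector(stopVoiceDetectionIfNeeded)
                                                 name:UIApplicationDidEnterBackgroundNotification
                                               object:nil];
}

- (void)viewWillDisappear:(BOOL)animated {
    [super viewWillDisappear:animated];
    // The user is leaving the current page.
    [self stopVoiceDetectionIfNeeded];
}

// Hypothetical helper: stops voice detection at most once per start.
- (void)stopVoiceDetectionIfNeeded {
    if (!self.detecting) {
        return;
    }
    [[[ZegoAvatarService sharedInstance] getInteractEngine] stopDetectExpression];
    self.detecting = NO;
}

- (void)dealloc {
    [[NSNotificationCenter defaultCenter] removeObserver:self];
}

@end
```

The flag guards against calling stopDetectExpression twice when both events fire in a row. Whether a repeated call is harmless is not stated here, so the sketch avoids relying on it.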