
AI-based noise reduction in real-world scenarios

Last updated: 2024-09-09 18:34

Scene-based AI noise reduction recognizes the current audio scene in real time and automatically adjusts the AI noise reduction strategy to give the best balance of noise reduction and audio quality. Two common scenes are currently supported:

In the call scene, every sound other than the human voice is treated as noise and removed. On top of steady-state noise elimination (see Audio 3A processing for details), non-steady-state noise is also removed effectively, which keeps the human voice at high fidelity. This covers noise such as mouse clicks, keyboard typing, tapping, air conditioning, kitchen utensils, noisy restaurants, environmental noise, coughing, blowing, and other non-voice sounds, as well as voice reverberation in small rooms.
In the music scene, the noise reduction level is adjusted automatically to preserve the music sound quality. Real-time music detection runs on the microphone input, and in sound card, singing, or near-field music scenes the noise reduction level is adjusted automatically to keep the music at high fidelity.

Before using the AI noise reduction feature, contact ZEGOCLOUD technical support to obtain a special SDK package.
Starting from version 3.0.0, ZEGO Express SDK supports intelligent recognition of music scenes. In a music scene, AI noise reduction automatically lowers the noise reduction level to improve audio quality. To use this feature, contact ZEGOCLOUD technical support for special packaging and configuration.

Advantages

80% of the noise can be eliminated.
Low latency.
Low memory usage, similar to traditional noise reduction.
Low CPU usage.
Music scene recognition accuracy reaches 99%.

Use cases

This feature suits one-on-one or multi-person audio and video calls such as voice chat rooms, meetings, and voice gaming, as well as live streaming or online KTV scenarios that involve sound cards, singing, or near-field music.

To enable music scene recognition, please turn on the music detection switch and contact ZEGOCLOUD technical support to configure the music detection function.

Noise that can be eliminated

Developers can use this feature to eliminate the following noises:

Scene	Typical Noises
Meeting Room	Keyboard sound; table tapping sound
Office	Keyboard sound; colleague talking sound
Transportation	Car horn sound; whistling sound of cars passing by; car music sound; rain sound and windshield wiper sound
Internet Cafe	Keyboard sound; surrounding people talking sound
Coffee Shop	Chair dragging sound; surrounding people talking sound; sharp collision sound

Prerequisites

Before implementing the AI denoising feature, please make sure:

You have created a project in ZEGOCLOUD Console and obtained a valid AppID and AppSign. For details, see Console - How to view project information.
You have integrated ZEGO Express SDK into the project and implemented basic real-time audio and video functions. For details, see Integrate the SDK and Implement a basic video call.

Steps to use

Follow these steps to configure AI noise reduction:

Contact ZEGOCLOUD technical support to enable the music detection feature. If it is already enabled, skip this step.
For how to initialize the engine and log in to a room, see "Create engine" and "Join room" in the implementation guide of the video call documentation.
Call the enableANS interface to enable noise suppression, which makes the human voice clearer.

After enabling noise suppression, call the setANSMode interface to set the ANS mode and enable the AI denoising feature. Some of the AI denoising modes are listed below; for more modes, see ZegoANSMode.

AI Denoising Mode	Applicable Scenarios
ZegoANSMode.AI	Lightweight mode with low power consumption and a small package size that still provides good denoising. Suitable for indoor noise environments and relatively comfortable regions in China.
ZegoANSMode.AI_BALANCED	Balanced mode that completely eliminates noise while preserving the human voice without loss, at slightly higher power consumption. Suitable for complex communication environments such as outdoor markets, transportation, and regions with severe noise interference.
ZegoANSMode.AI_LOW_LATENCY	Low-latency mode that maintains clean denoising and high-fidelity voice quality even with a 10 ms delay. Suitable for latency-sensitive scenarios such as game voice chat, game team communication, and real-time singing.
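
As a rough illustration of how the table above might translate into a selection rule, the sketch below maps a few scenario categories to the three AI modes. The AiMode enum is only a local stand-in that mirrors the names ZegoANSMode.AI, ZegoANSMode.AI_BALANCED, and ZegoANSMode.AI_LOW_LATENCY so that the example runs without the SDK. The Scenario enum and the chooseMode helper are hypothetical and are not part of ZEGO Express SDK.

```java
// Illustrative sketch only: AnsModeChooser, Scenario, AiMode, and chooseMode
// are hypothetical names and are not part of ZEGO Express SDK.
final class AnsModeChooser {

    /** Local stand-in mirroring the AI values of ZegoANSMode listed in the table. */
    enum AiMode { AI, AI_BALANCED, AI_LOW_LATENCY }

    /** Hypothetical scenario categories taken from the "Applicable Scenarios" column. */
    enum Scenario { INDOOR, OUTDOOR_MARKET, TRANSPORTATION, GAME_VOICE, REAL_TIME_SINGING }

    /** Picks the mode whose table row names the given scenario. */
    static AiMode chooseMode(Scenario scenario) {
        switch (scenario) {
            case GAME_VOICE:
            case REAL_TIME_SINGING:
                // Latency-sensitive scenarios.
                return AiMode.AI_LOW_LATENCY;
            case OUTDOOR_MARKET:
            case TRANSPORTATION:
                // Complex environments with severe noise interference.
                return AiMode.AI_BALANCED;
            case INDOOR:
            default:
                // Lightweight mode for indoor noise environments.
                return AiMode.AI;
        }
    }
}
```

In a real application, the chosen value would be translated to the matching ZegoANSMode constant and passed to setANSMode.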

// Enable noise suppression (ANS).
engine.enableANS(true);
// Set the AI noise suppression mode that fits your scenario.
// Note: once an AI mode from ZegoANSMode is set, ZEGO Express SDK forcibly
// disables transient noise suppression (enableTransientANS).
engine.setANSMode(ZegoANSMode.AI);
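
The call order described in the steps (enable noise suppression first, then set the mode) can be wrapped in a small helper. The following is a minimal sketch under stated assumptions: the AnsControls interface, the configure helper, and the RecordingControls test double are hypothetical and exist only so the example runs without the SDK. Only the method names enableANS and setANSMode come from this page, and the mode is passed as a plain string for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every type in this block is hypothetical and is
// not part of ZEGO Express SDK.
final class AiDenoiseSetup {

    /** Hypothetical abstraction over the two engine calls used in this guide. */
    interface AnsControls {
        void enableANS(boolean enable);
        void setANSMode(String mode);
    }

    /** Enables ANS first, then selects the AI mode, in the order the steps describe. */
    static void configure(AnsControls engine, String aiMode) {
        engine.enableANS(true);
        engine.setANSMode(aiMode);
    }

    /** Test double that records calls instead of touching a real engine. */
    static final class RecordingControls implements AnsControls {
        final List<String> calls = new ArrayList<>();

        @Override
        public void enableANS(boolean enable) {
            calls.add("enableANS(" + enable + ")");
        }

        @Override
        public void setANSMode(String mode) {
            calls.add("setANSMode(" + mode + ")");
        }
    }
}
```

Keeping both calls behind one helper makes it harder to set a mode without first enabling noise suppression, and the recording double allows the order to be checked in a unit test.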