Overview

Last updated: 2024-02-19 14:54

Product Introduction

Digital Human is an AI-based Platform as a Service (PaaS) offering. It provides cloud APIs that developers can call to generate video files or video streams for scenarios such as short video production and live streaming.

To create and publish a live stream, use this service in combination with Video Call.

Benefits

Fast integration

Developers can call server APIs to implement features quickly, with no servers to deploy, operate, or maintain. This reduces development costs and simplifies product launch.

Diverse configurations

The APIs expose a variety of configurable parameters, including format, resolution, timbre, and image, so that output can be adapted to the requirements of different scenarios.

Flexible content creation

Two output modes are supported: video files generated asynchronously, for short video scenarios, and real-time audio and video streams, for live streaming scenarios.
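
Asynchronous generation usually means submitting a task and then checking on it until the file is ready. The sketch below shows that polling pattern only. It is not the service's API: the function name `wait_for_video`, the status strings `"succeeded"` and `"failed"`, and the `get_status` callback are all assumptions made for illustration, and a real integration would use whatever task-query call and status values the server API reference defines.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], str],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a task until it reaches a terminal status or the timeout expires.

    get_status is a caller-supplied function (hypothetical) that asks the
    service for the current task status and returns it as a string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        # Assumed terminal statuses; replace with the real values.
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("video task did not finish in time")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting, for example by supplying a function that only records the requested delays.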

Application Scenarios

Scenario	Description
Short video production	The server APIs expose the settings needed to generate short videos, such as background, image, and timbre. Multiple video formats and resolutions are available, and video files can be generated asynchronously.
Live streaming	Developers can call server APIs to create a video stream task, use text or audio to drive the Digital Human model, and then publish the real-time video stream to the Real-time Audio and Video service provided by ZEGOCLOUD. The stream can be played on a client to display the streaming content.
Interaction	Developers can call server APIs to create a video stream task and, after obtaining end-users’ questions, use text or audio to drive the Digital Human model to generate replies. Then, publish the real-time video stream to Real-time Audio and Video so that the stream can be played on a client to display the replies of the Digital Human model.
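
The live streaming and interaction scenarios in the table share one flow: create a video stream task, then drive the Digital Human model with text (or audio) as content becomes available. The sketch below models that flow with an in-memory stand-in so the order of steps is visible. Every name in it (`FakeDigitalHumanClient`, `StreamTask`, `create_stream_task`, `drive_by_text`, `answer_question`) is hypothetical and is not taken from the service's API; a real integration would replace the stand-in with calls to the actual server APIs and publish the stream for playback on a client.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StreamTask:
    """Hypothetical record of a video stream task."""
    task_id: str
    model_id: str
    stream_id: str
    driven_segments: List[Dict[str, str]] = field(default_factory=list)


class FakeDigitalHumanClient:
    """In-memory placeholder; it performs no network calls."""

    def __init__(self) -> None:
        self._counter = 0

    def create_stream_task(self, model_id: str, stream_id: str) -> StreamTask:
        # Step 1: create the task that will produce the real-time stream.
        self._counter += 1
        return StreamTask(f"task-{self._counter}", model_id, stream_id)

    def drive_by_text(self, task: StreamTask, text: str) -> None:
        # Step 2: hand text to the model so it is spoken on the stream.
        task.driven_segments.append({"type": "text", "content": text})


def answer_question(
    client: FakeDigitalHumanClient,
    task: StreamTask,
    question: str,
    generate_reply: Callable[[str], str],
) -> str:
    """Interaction scenario: turn an end user's question into a spoken reply."""
    reply = generate_reply(question)
    client.drive_by_text(task, reply)
    return reply
```

In this sketch `generate_reply` is any function that maps a question to reply text, which is where a large language model would plug in.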

Features

Feature	Description
Digital Human asset query	Developers can call server APIs to query the information about available public and customized Digital Human models, such as their image and timbre.
Video production in asynchronous mode or in real time	Both asynchronous generation of video files and generation of real-time audio and video streams are supported.
Different streaming media formats and resolutions	Supported container formats are MP4 and WebM (WebM supports alpha channels). Supported video resolutions are 1080P and 2K.
Speech synthesis	Text-to-speech and SSML are supported.
Large language model	The large language model can generate replies based on the questions asked.
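
Since speech synthesis accepts SSML as well as plain text, it can help to build the markup programmatically so that special characters are escaped correctly. The sketch below uses Python's standard `xml.etree.ElementTree` to produce a small SSML document with pauses between sentences. The `speak`, `s`, and `break` elements come from the W3C SSML specification; which elements and attributes this service actually accepts is not stated here, so treat the tag choice as an assumption. The helper name `build_ssml` is also illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_ssml(sentences: Iterable[str], pause_ms: int = 300) -> str:
    """Return an SSML string with a pause between consecutive sentences."""
    speak = ET.Element("speak", {"version": "1.0"})
    first = True
    for sentence in sentences:
        if not first:
            # Insert a pause before every sentence except the first.
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
        s = ET.SubElement(speak, "s")
        s.text = sentence  # ElementTree escapes &, <, and > on output.
        first = False
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml(["Hello.", "Welcome to the stream."])
print(ssml)
```

The resulting string would then be passed wherever the API expects SSML input in place of plain text.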