Digital Human is a Platform as a Service (PaaS) offering built on AI capabilities. It provides cloud APIs that developers use to generate video files or streams for scenarios such as short video production and live streaming.
Developers call server APIs to implement features without deploying or operating their own servers, which reduces development costs and shortens the time to product launch.
The APIs expose a range of configurable parameters, including format, resolution, timbre, and image, to meet the requirements of different scenarios.
Both asynchronously generated video files and real-time audio and video streams are supported, covering short video and live streaming scenarios.
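
As a rough illustration of how these parameters might be grouped and checked before a request is sent, the following is a minimal sketch. The class name `VideoTaskConfig`, every field name, and the payload shape are hypothetical and are not taken from the actual server API. The allowed formats and resolutions come from the feature table on this page. The rule that a transparent background requires WebM is an assumption based on alpha channels being mentioned only for WebM.

```python
from dataclasses import asdict, dataclass

# Values listed in the feature table on this page.
SUPPORTED_FORMATS = {"mp4", "webm"}
SUPPORTED_RESOLUTIONS = {"1080P", "2K"}


@dataclass(frozen=True)
class VideoTaskConfig:
    """Hypothetical parameter bundle; field names are illustrative only."""

    model_id: str    # which Digital Human image to use (assumed identifier)
    timbre_id: str   # which voice to use (assumed identifier)
    text: str        # text that drives the Digital Human model
    video_format: str = "mp4"
    resolution: str = "1080P"
    transparent_background: bool = False

    def __post_init__(self) -> None:
        # Reject values the feature table does not list.
        if self.video_format not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.video_format}")
        if self.resolution not in SUPPORTED_RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        # Assumption: alpha channels are only available with WebM.
        if self.transparent_background and self.video_format != "webm":
            raise ValueError("a transparent background requires the webm format")

    def to_payload(self) -> dict:
        """Return a plain dict that could be serialized as a JSON request body."""
        return asdict(self)


config = VideoTaskConfig(
    model_id="demo-model",
    timbre_id="demo-voice",
    text="Welcome to the product tour.",
    video_format="webm",
    transparent_background=True,
)
print(config.to_payload())
```

Validating on the caller's side keeps obviously invalid combinations from reaching the service. The real parameter names and constraints should be taken from the server API reference.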
| Scenario | Description |
|---|---|
| Short video production | The server APIs provided by this service contain configurations such as background, image, and timbre, which are required for the generation of short videos. Different video formats and resolutions are available. Asynchronous generation of video files is also supported. |
| Live streaming | Developers can call server APIs to create a video stream task, use text or audio to drive the Digital Human model, and then publish the real-time video stream to the Real-time Audio and Video service provided by ZEGOCLOUD. The stream can be played on a client to display the streaming content. |
| Interaction | Developers can call server APIs to create a video stream task and, after obtaining end users' questions, use text or audio to drive the Digital Human model to generate replies. The real-time video stream is then published to Real-time Audio and Video so that it can be played on a client to display the replies of the Digital Human model. |
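
The interaction scenario can be sketched as a small loop: create a stream task once, then for each end-user question produce a reply and use that text to drive the model. The sketch below uses an in-memory stand-in instead of the real service. `InMemoryDigitalHumanClient`, `create_stream_task`, `drive_by_text`, and all parameter names are hypothetical and only mirror the steps described in the table. A real integration would replace them with the documented server API calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamTask:
    """Hypothetical handle for a video stream task."""

    task_id: str
    model_id: str
    room_id: str
    stream_id: str
    spoken: List[str] = field(default_factory=list)  # texts sent to the model


class InMemoryDigitalHumanClient:
    """Stand-in for the server APIs; it records calls and does no network I/O."""

    def __init__(self) -> None:
        self._next_id = 1

    def create_stream_task(self, model_id: str, room_id: str, stream_id: str) -> StreamTask:
        # In the real service, this step would start publishing a real-time stream.
        task = StreamTask(f"task-{self._next_id}", model_id, room_id, stream_id)
        self._next_id += 1
        return task

    def drive_by_text(self, task: StreamTask, text: str) -> None:
        # In the real service, this step would make the Digital Human speak the text.
        task.spoken.append(text)


def answer_question(
    client: InMemoryDigitalHumanClient,
    task: StreamTask,
    question: str,
    generate_reply: Callable[[str], str],
) -> str:
    """Turn one end-user question into a reply and drive the model with it."""
    reply = generate_reply(question)
    client.drive_by_text(task, reply)
    return reply


client = InMemoryDigitalHumanClient()
task = client.create_stream_task("demo-model", room_id="room-1", stream_id="stream-1")
answer_question(client, task, "What are your opening hours?", lambda q: f"You asked: {q}")
print(task.spoken)
```

Passing `generate_reply` as a callable keeps the reply source swappable: it could be a scripted answer, a retrieval step, or a large language model.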
| Feature | Description |
|---|---|
| Digital Human asset query | Developers can call server APIs to query the information about available public and customized Digital Human models, such as their image and timbre. |
| Video production in asynchronous mode or in real time | Both the generation of video files and the generation of real-time audio or video streams are supported. |
| Different streaming media formats and resolutions | Supported encapsulation formats are MP4 and WebM (with alpha channels supported). Supported video resolutions are 1080P and 2K. |
| Speech synthesis | Text-to-speech and SSML are supported. |
| Large language model | The large language model can generate replies based on the questions asked. |
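
Because SSML input is supported for speech synthesis, a caller may want to build the markup programmatically rather than by string concatenation. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The helper name `build_ssml` is hypothetical. `<speak>`, `<s>`, and `<break>` are elements from the W3C SSML specification, but which elements and attributes this service accepts is not stated on this page, so treat the output as an assumption to verify against the API reference.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_ssml(sentences: List[str], pause_ms: int = 300, lang: str = "en-US") -> str:
    """Wrap sentences in SSML, inserting a pause between consecutive sentences."""
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": lang})
    for index, sentence in enumerate(sentences):
        if index > 0:
            # <break time="..."/> asks the synthesizer to pause before the next sentence.
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
        s = ET.SubElement(speak, "s")
        s.text = sentence  # ElementTree escapes &, <, and > on serialization
    return ET.tostring(speak, encoding="unicode")


print(build_ssml(["Hello and welcome.", "Today's topic is video & audio."], pause_ms=500))
```

Building the tree with an XML library means special characters in the spoken text are escaped automatically, which avoids malformed markup when the text comes from end users or a language model.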