
Video transcription

Last updated: Dec-11-2024

Video transcription enables you to automatically generate a transcript of the audio in a video file. You can use the resulting file to display a full transcript alongside your video, add it as a text track for standard subtitles, or use it for paced subtitles with the Cloudinary Video Player. Transcript generation detects the language spoken in the audio and generates the transcript in that language.

Use the Cloudinary Video Transcription service to generate transcripts during upload or to trigger generation from the Video Player Studio, and use the transcript editor to edit them.

Alternatively, you can use a transcription add-on: either Google AI Video Transcription or Microsoft Azure Video Indexer.

Requesting transcription

To request transcription, set the auto_transcription boolean parameter to true in your upload request.
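As a minimal sketch, assuming you call the Upload API directly rather than through an SDK, the request might carry the form fields below. The helper name `transcription_upload_fields`, the placeholder cloud name `demo`, and the example file URL are hypothetical. Authentication (a signature with api_key and timestamp, or an unsigned upload preset) is deliberately left out.

```python
from urllib.parse import urlencode

# Video upload endpoint of the Upload API; audio files also use resource type "video".
UPLOAD_URL_TEMPLATE = "https://api.cloudinary.com/v1_1/{cloud_name}/video/upload"


def transcription_upload_fields(file_url: str, public_id: str) -> dict:
    """Hypothetical helper: form fields for an upload that requests transcription.

    Authentication fields are intentionally omitted from this sketch.
    """
    return {
        "file": file_url,
        "public_id": public_id,
        # Boolean parameter, sent as the string "true" in a form-encoded body.
        "auto_transcription": "true",
    }


fields = transcription_upload_fields("https://example.com/my_video.mp4", "my_video")
print(UPLOAD_URL_TEMPLATE.format(cloud_name="demo"))
print(urlencode(fields))
```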

Auto transcription runs asynchronously after your original method call completes, so the response to that call shows a pending transcription status.
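To confirm that the request was accepted, you might read the status from the upload response. This sketch assumes the status is reported under `info.auto_transcription.status`. The sample response is abbreviated and illustrative, so confirm the exact shape against a real response. `transcription_status` is a hypothetical helper name.

```python
from typing import Optional

# Abbreviated, illustrative response; only the fields used below are shown.
# The info.auto_transcription.status location is an assumption.
sample_response = {
    "public_id": "my_video",
    "resource_type": "video",
    "info": {"auto_transcription": {"status": "pending"}},
}


def transcription_status(upload_response: dict) -> Optional[str]:
    """Return the auto transcription status, or None if the response has none."""
    info = upload_response.get("info") or {}
    return (info.get("auto_transcription") or {}).get("status")


print(transcription_status(sample_response))
```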

When transcription completes (which can take from several seconds to several minutes, depending on the length of the video), a new raw file is created in your product environment with the same public ID as your video or audio file and the .transcript file extension.
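Because the transcript is an ordinary raw asset, you can fetch it by URL once it exists. The sketch below assumes the standard raw delivery URL pattern, in which a raw file's extension is part of its public ID. `transcript_url` and `fetch_transcript` are hypothetical helper names. `fetch_transcript` also assumes that the file contains JSON, that the asset is publicly deliverable, and that network access is available.

```python
import json
from urllib.request import urlopen


def transcript_url(cloud_name: str, public_id: str) -> str:
    """Build the delivery URL of the .transcript file for a video public ID."""
    # Raw files keep their extension in the public ID, so ".transcript" is appended.
    return f"https://res.cloudinary.com/{cloud_name}/raw/upload/{public_id}.transcript"


def fetch_transcript(cloud_name: str, public_id: str):
    """Download and decode the transcript; assumes the file contains JSON."""
    with urlopen(transcript_url(cloud_name, public_id)) as response:
        return json.loads(response.read().decode("utf-8"))


print(transcript_url("demo", "my_video"))
```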

If you also provided a notification_url in your method call, that URL receives a notification when the process completes.
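On the receiving side, a webhook handler could check whether a notification reports a finished transcript. The field names `info_kind`, `info_status`, and `public_id`, and the values compared below, are assumptions modeled on other Cloudinary asynchronous-processing notifications. Inspect a real notification before relying on them. `completed_transcript_id` is a hypothetical helper.

```python
import json
from typing import Optional


def completed_transcript_id(raw_body: bytes) -> Optional[str]:
    """Return the public ID whose transcript is ready, or None otherwise.

    The payload field names and values checked here are assumptions.
    """
    payload = json.loads(raw_body)
    if (
        payload.get("info_kind") == "auto_transcription"
        and payload.get("info_status") == "complete"
    ):
        return payload.get("public_id")
    return None


example_body = (
    b'{"info_kind": "auto_transcription", "info_status": "complete",'
    b' "public_id": "my_video"}'
)
print(completed_transcript_id(example_body))
```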

Requesting translation

In addition to generating a transcript in the original language of the audio, you can request translated transcripts. Each translated transcript is generated alongside the main transcript file, with the country and language code appended to its name.
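To know which files to expect, you can derive the translated transcript names from the video's public ID. This sketch assumes two things: that each code is appended with a dot separator before the .transcript extension (for example, my_video.fr-FR.transcript), and that codes look like fr-FR. Check both against the files created in your product environment. `expected_transcript_ids` is a hypothetical helper.

```python
from typing import List, Sequence


def expected_transcript_ids(public_id: str, codes: Sequence[str]) -> List[str]:
    """List the main transcript plus one translated transcript per code.

    Assumes the naming pattern <public_id>.<code>.transcript for translations.
    """
    names = [f"{public_id}.transcript"]
    names.extend(f"{public_id}.{code}.transcript" for code in codes)
    return names


print(expected_transcript_ids("my_video", ["fr-FR", "es-ES", "de-DE"]))
```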

Transcript translation uses the Google Translation add-on, so you must enable that add-on for your account.

To trigger translation, set the auto_transcription parameter to an object containing a translate parameter, which holds an array of the country and language codes to translate into. For example, to generate translated transcripts in French, Spanish, and German, list the codes for those three languages.
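In a form-encoded Upload API request, the object value would typically be sent as a JSON string. The sketch below assumes that encoding and uses fr-FR, es-ES, and de-DE as the codes for French, Spanish, and German. `translation_upload_fields` is a hypothetical helper, and authentication fields are again omitted.

```python
import json
from typing import Sequence


def translation_upload_fields(file_url: str, public_id: str, codes: Sequence[str]) -> dict:
    """Hypothetical helper: upload fields requesting transcription plus translations."""
    auto_transcription = {"translate": list(codes)}
    return {
        "file": file_url,
        "public_id": public_id,
        # Object-valued parameter, serialized as JSON for a form-encoded body (assumption).
        "auto_transcription": json.dumps(auto_transcription),
    }


fields = translation_upload_fields(
    "https://example.com/my_video.mp4", "my_video", ["fr-FR", "es-ES", "de-DE"]
)
print(fields["auto_transcription"])
```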

Use your translated transcripts with the Cloudinary Video Player to provide subtitles in multiple languages for your videos.

Cloudinary transcript files

The generated .transcript file contains the details of the audio transcription.

Each excerpt of text has a confidence value, followed by a breakdown of individual words and their specific start and end times.
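A short parser makes this structure concrete. The field names (`transcript`, `confidence`, `words`, `word`, `start_time`, `end_time`) and the sample values below are illustrative assumptions about the file layout, not a copy of real output. Adjust them to match your own .transcript files.

```python
import json
from dataclasses import dataclass
from typing import List

# Illustrative .transcript content; field names and values are assumptions.
SAMPLE = """[
  {"transcript": "hello and welcome",
   "confidence": 0.94,
   "words": [
     {"word": "hello", "start_time": 0.5, "end_time": 0.9},
     {"word": "and", "start_time": 0.9, "end_time": 1.1},
     {"word": "welcome", "start_time": 1.1, "end_time": 1.7}
   ]}
]"""


@dataclass
class Word:
    text: str
    start: float
    end: float


@dataclass
class Excerpt:
    text: str
    confidence: float
    words: List[Word]

    @property
    def start(self) -> float:
        # The excerpt starts when its first word starts.
        return self.words[0].start if self.words else 0.0

    @property
    def end(self) -> float:
        # The excerpt ends when its last word ends.
        return self.words[-1].end if self.words else 0.0


def parse_transcript(raw: str) -> List[Excerpt]:
    """Turn the JSON array of excerpts into Excerpt objects with timed words."""
    excerpts = []
    for item in json.loads(raw):
        words = [
            Word(w["word"], float(w["start_time"]), float(w["end_time"]))
            for w in item.get("words", [])
        ]
        excerpts.append(Excerpt(item["transcript"], float(item["confidence"]), words))
    return excerpts


for ex in parse_transcript(SAMPLE):
    print(f"[{ex.start:.1f}-{ex.end:.1f}s] ({ex.confidence:.2f}) {ex.text}")
```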

Displaying transcripts with the Cloudinary Video Player

You can display your generated transcripts as a text track for subtitles or captions using the Cloudinary Video Player. You can also use the word-level timing information in the transcript to add paced subtitles or word highlighting. To add your transcript, set the textTracks parameter with the relevant configuration.
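Word highlighting comes down to finding the word whose time range contains the current playback position. The function below is a hypothetical illustration of that lookup, using binary search over word start times. It is not the Video Player's actual implementation, and it represents words as (text, start, end) tuples for brevity.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)


def active_word(words: List[Word], t: float) -> Optional[str]:
    """Return the word spoken at time t, or None during silence.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [start for _, start, _ in words]
    # Index of the last word that starts at or before t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < words[i][2]:
        return words[i][0]
    return None


words = [("hello", 0.5, 0.9), ("and", 0.9, 1.1), ("welcome", 1.1, 1.7)]
print(active_word(words, 1.0))
```

Rebuilding the list of start times on every call keeps the example short. A playback loop would compute that list once per transcript.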

For transcripts, you don't need to provide a URL, because the player assumes the transcript has the same public ID as the video. If you set a language, the player looks for the corresponding file with that language code appended to the public ID; otherwise, it uses the original transcript. To control the number of words shown in each line of the transcript, use the maxWords parameter.
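To see what a word limit does to line breaks, the sketch below groups consecutive words into lines of at most `max_words` and prints them as WebVTT cues. It is a hypothetical illustration of the idea behind maxWords, not how the player implements it. The grouping is purely by count and ignores pauses and punctuation.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start_seconds, end_seconds)
Line = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def chunk_words(words: List[Word], max_words: int) -> List[Line]:
    """Group consecutive words into lines of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    lines = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        # A line spans from its first word's start to its last word's end.
        lines.append((group[0][1], group[-1][2], " ".join(w[0] for w in group)))
    return lines


def vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def to_webvtt(lines: List[Line]) -> str:
    """Render timed lines as a minimal WebVTT document."""
    cues = [f"{vtt_time(a)} --> {vtt_time(b)}\n{text}" for a, b, text in lines]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


words = [("hello", 0.5, 0.9), ("and", 0.9, 1.1), ("welcome", 1.1, 1.7),
         ("to", 1.7, 1.8), ("the", 1.8, 1.9)]
print(to_webvtt(chunk_words(words, max_words=2)))
```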

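If you render the player page from a Python backend, you might build the textTracks configuration server-side and embed it as JSON. In this sketch, textTracks and maxWords come from this page. The captions key and the label and default fields are assumptions about the player's text-track options, so check them against the Video Player reference. No URL is set, because the player locates the transcript by public ID.

```python
import json


def captions_config(label: str, max_words: int) -> dict:
    """Build a textTracks configuration for the original-language transcript."""
    return {
        "textTracks": {
            "captions": {
                "label": label,
                "default": True,
                # No URL: the player looks for the transcript by the video's public ID.
                "maxWords": max_words,
            }
        }
    }


config = captions_config("English (Original)", max_words=4)
# The JSON could then be passed as the source options for the video in the page script.
print(json.dumps(config, indent=2))
```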


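For translated transcripts, the same approach could list one subtitle track per language code, so that the player looks up the matching translated file for each track. The subtitles key, its list form, and the language field are assumptions about the player's configuration. The codes must match the ones you requested at upload. `translated_tracks_config` is a hypothetical helper.

```python
import json
from typing import Dict, Optional


def translated_tracks_config(
    original_label: str, translations: Dict[str, str], max_words: Optional[int] = None
) -> dict:
    """Build textTracks with the original captions plus one track per translation.

    translations maps a language code (as requested at upload) to a display label.
    """
    captions = {"label": original_label, "default": True}
    if max_words is not None:
        captions["maxWords"] = max_words
    subtitles = [{"label": label, "language": code} for code, label in translations.items()]
    return {"textTracks": {"captions": captions, "subtitles": subtitles}}


config = translated_tracks_config(
    "English (Original)",
    {"fr-FR": "French", "es-ES": "Spanish", "de-DE": "German"},
    max_words=4,
)
print(json.dumps(config, indent=2))
```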

Transcript editor

The transcript editor enables you to trigger transcript generation with the transcription service for videos in your Media Library. From there, you can edit the generated transcript so that it matches the audio exactly.

The editor supports adding and editing lines, as well as the individual words within each line.

To open the editor, navigate to the Video Player Studio. Ensure you add your public ID in the Video Details section before selecting the Transcript Editor.

