Image & Video APIs

Accessible media

Last updated: Jun-30-2025

Digital accessibility ensures that everyone, regardless of ability, can access, understand, and engage with online content. Disabilities can be permanent, like colorblindness, temporary, like light sensitivity from a concussion, or situational, such as trying to watch a video in a noisy café. Prioritizing accessible media isn't just about legal compliance; it's about creating inclusive, user-friendly experiences for all.

Different types of media present unique accessibility challenges and opportunities:

  • Images and graphics require text alternatives for users with visual impairments, color adjustments for users with color vision deficiencies, and consideration of motion sensitivity for animated content. Complex images like charts and infographics may need extended descriptions to convey their full meaning.
  • Videos need captions for users with hearing impairments, audio descriptions for users with visual impairments, and accessible player controls that work with keyboard navigation and screen readers. Motion in videos can also trigger vestibular disorders or seizures in some users.
  • Audio content requires transcripts for users with hearing impairments, volume controls for users with different hearing abilities, and the ability to distinguish foreground speech from background sounds. Live audio needs real-time captioning solutions.
  • Interactive media like product galleries and media players must be operable through keyboard navigation, provide clear focus indicators, and include appropriate ARIA (Accessible Rich Internet Applications) labels for screen readers. All interactive elements need to be accessible regardless of input method.
  • Document and file formats need to be screen reader compatible, have proper heading structures, and include alternative formats when necessary.

Around the world, standards like the Web Content Accessibility Guidelines (WCAG), the European Accessibility Act (EAA), the Americans with Disabilities Act (ADA), and others set expectations for digital accessibility. This guide explores how you can use Cloudinary's tools and best practices to make each of these media types more accessible, empowering your team to build inclusive content from the start.

Media accessibility considerations

When designing media experiences, it's helpful to think about the diverse ways people consume content. The following tables present key accessibility considerations for images, videos, and audio, along with Cloudinary tools and techniques that can help address these needs. Each consideration also includes links to relevant Web Content Accessibility Guidelines (WCAG) for those who want to dive deeper into the technical standards.

Important
This guide is intended to help teams implement accessibility best practices using Cloudinary's capabilities. It is not a certification of compliance with any specific legal standard (such as WCAG, ADA, or EAA). For formal compliance or legal guidance, please consult your legal or accessibility teams.

Perceivability considerations

When creating accessible media, consider how users with different abilities will perceive your content. Visual content needs text alternatives for screen readers, videos require captions and audio descriptions, and color-based information needs additional visual cues to be accessible to users with color vision differences.

Image accessibility considerations

Consideration: Consider how users with visual impairments will understand your images - they may rely on screen readers that need descriptive text alternatives to convey the same information.
Cloudinary Image Techniques: 🔧 Managing text alternatives, 🔧 AI-based image captioning, 🔧 Cloudinary AI Vision
WCAG Reference: 1.1.1 Non-text content

Video and audio accessibility considerations

Consideration: Think about users who can't hear audio-only content - they'll need text transcripts. For video-only content, consider whether users who can't see the visuals would understand what's happening through text descriptions or audio narration.
Cloudinary Video Techniques: 🔧 Alternatives for video-only content, 🔧 Audio and video transcriptions
WCAG Reference: 1.2.1 Audio-only and video-only (prerecorded)

Consideration: Consider users who can't hear your video content - they'll need captions that show not just dialogue, but also sound effects and other important audio cues that contribute to understanding.
Cloudinary Video Techniques: 🔧 Video captions
WCAG Reference: 1.2.2 Captions (prerecorded)

Consideration: Think about users who can't see your video content - they may need audio descriptions that explain what's happening visually, including actions, scene changes, and other important visual information that isn't conveyed through dialogue alone.
Cloudinary Video Techniques: 🔧 Audio descriptions
WCAG Reference: 1.2.3 Audio description or media alternative (prerecorded); 1.2.5 Audio description (prerecorded); 1.2.7 Extended audio description (prerecorded)

Consideration: For live content, consider how users who are deaf or hard of hearing will follow along with real-time events - they'll need live captions or transcripts that keep pace with the broadcast.
Cloudinary Video Techniques: 🔧 Live streaming closed captions
WCAG Reference: 1.2.4 Captions (live); 1.2.9 Audio-only (live)

Consideration: Consider users who communicate primarily through sign language - they may prefer sign language interpretation over captions for understanding spoken content.
Cloudinary Video Techniques: 🔧 Sign language in video overlays
WCAG Reference: 1.2.6 Sign language (prerecorded)

Consideration: Think about providing comprehensive synchronized text alternatives that give users the same information as your video or audio content, so they can choose the format that works best for them.
Cloudinary Video Techniques: 🔧 Alternatives for video-only content, 🔧 Audio and video transcriptions
WCAG Reference: 1.2.8 Media alternative (prerecorded)

Visual and audio clarity considerations

Consideration: Consider that some users can't distinguish colors - if you're using color to convey important information, think about adding patterns, shapes, or text labels so everyone can understand the message.
Cloudinary Techniques: 🔧 Assist people with color blind conditions
WCAG Reference: 1.4.1 Use of color

Consideration: Think about users who may be startled or distracted by unexpected audio - if your content plays sound automatically, consider giving users controls to pause, stop, or adjust the volume.
Cloudinary Techniques: 🔧 Adjust audio volume, 🔧 Cloudinary Video Player
WCAG Reference: 1.4.2 Audio control

Consideration: Consider users with visual impairments who may have difficulty reading text with poor contrast - they'll need sufficient color contrast between text and backgrounds to read your content comfortably.
Cloudinary Techniques: 🔧 Customizable caption styling, 🔧 Text overlays on images and videos, 🔧 Adjust contrast on images and videos, 🔧 Replacing colors for light/dark themes
WCAG Reference: 1.4.3 Contrast (minimum); 1.4.6 Contrast (enhanced)

Consideration: Consider whether users can resize, customize, or access your text content - actual text is generally more flexible and accessible than text embedded in images.
Cloudinary Techniques: 🔧 Customize text overlays in images, 🔧 OCR text detection and extraction
WCAG Reference: 1.4.5 Images of text

Consideration: Think about users who have difficulty separating speech from background noise - they may need clear audio where the main content stands out from any background sounds.
Cloudinary Techniques: 🔧 Mixing audio tracks
WCAG Reference: 1.4.7 Low or no background audio

Operability considerations

Consider how users will interact with and control your media content. Some users rely on keyboards instead of mice, others may be sensitive to motion and flashing content, and many need the ability to pause or adjust audio that plays automatically. Design your media experiences to accommodate different interaction methods and user preferences.

Consideration: Consider users who may be sensitive to motion or prefer reduced animations - they may want the option to disable or reduce motion effects that aren't essential to understanding your content.
Cloudinary Image Techniques: 🔧 Convert animations to still images
WCAG Reference: 2.3.3 Animation from interactions

Cloudinary's UI widgets have been built with operability considerations in mind, including keyboard navigation, focus management, and user control features. For detailed information about the accessibility features available in these components, see the Cloudinary Product Gallery widget and Cloudinary Video Player sections.


Image accessibility

Text alternatives are crucial for making visual content accessible to everyone, particularly users with visual impairments, cognitive disabilities, or those using assistive technologies like screen readers. These alternatives provide equivalent information about images, charts, diagrams, and other visual content in a format that can be understood through text-to-speech software, braille displays, or simply read by users who prefer textual descriptions.

A woman in a stunning white mermaid-style wedding gown stands in an ornate, vintage-inspired room with intricate architectural details and decorative elements.


Cloudinary provides tools and approaches for managing text alternatives at scale, including:

  • Centralized metadata management: Store alt text and descriptions with your assets as the single source of truth
  • AI-powered generation: Automatically generate descriptive alt text using machine learning

Managing text alternatives

Cloudinary Assets, Cloudinary's Digital Asset Management (DAM) product, serves as the single source of truth for managing text alternatives across your entire Media Library. Rather than maintaining alt text in multiple locations throughout your application, you can centralize all accessibility metadata within Cloudinary Assets, enabling consistent management, review, and approval workflows.

Using contextual metadata for text alternatives

The simplest approach is to use Cloudinary's built-in contextual metadata field, called alt, to store text alternatives. You can manage this field through the Media Library interface or programmatically via the APIs.

Sample contextual metadata in Asset Management page

Here's an example of setting the alt contextual metadata field for an image during upload programmatically:
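For example, a minimal Node.js sketch (the file name and alt text are illustrative; the other SDKs accept an equivalent context option):

```javascript
// Store alt text in the built-in 'alt' contextual metadata field at upload time.
const cloudinary = require('cloudinary').v2;

cloudinary.uploader.upload('wedding-dress.jpg', {
  public_id: 'wedding-dress',
  context: {
    alt: 'A woman in a white mermaid-style wedding gown stands in an ornate, vintage-inspired room.'
  }
})
.then(result => console.log(result.context))
.catch(error => console.error(error));
```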

Alternatively, you can use any contextual metadata field name to store the text.

Using structured metadata for text alternatives

For better standardization across an organization, you can use structured metadata to create custom fields that support validation, approval workflows, and advanced search capabilities.

You can manage structured metadata fields through the Media Library interface or programmatically via the APIs.

Sample structured metadata in Asset Management page

Here's an example of setting a structured metadata field, with external ID, asset_description, on upload:
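A minimal Node.js sketch (this assumes a structured metadata field with the external ID asset_description already exists in your product environment):

```javascript
// Populate the 'asset_description' structured metadata field at upload time.
const cloudinary = require('cloudinary').v2;

cloudinary.uploader.upload('wedding-dress.jpg', {
  public_id: 'wedding-dress',
  metadata: {
    asset_description: 'A woman in a white mermaid-style wedding gown stands in an ornate, vintage-inspired room.'
  }
})
.then(result => console.log(result.metadata))
.catch(error => console.error(error));
```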

Centralized asset management benefits

By storing accessibility metadata directly with your assets in Cloudinary, you gain several key advantages, including:

  • Single source of truth:
    • All text alternatives are stored with the asset, ensuring consistency across all implementations
    • No need to maintain separate databases or files for accessibility content
    • Changes to descriptions automatically propagate to all applications using the asset
  • Searchable and discoverable:
    • Use the Search API method to find assets by their accessibility descriptions
    • Identify assets missing alt text for remediation
    • Analyze description quality across your Media Library
  • Review and approval workflows:
    • Content teams can review and approve accessibility descriptions in the Media Library
    • Implement approval workflows before publishing content
  • Bulk operations:
    • Programmatically update multiple assets simultaneously
    • Import accessibility descriptions from external sources
    • Export descriptions for review by accessibility experts

Integrating with delivery

Once text alternatives are stored as asset metadata, they're automatically available in your delivery implementations:

JavaScript integration:
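A server-side Node.js sketch that reads the stored alt text and uses it when rendering an img tag (the exact nesting of the context object in the Admin API response can vary, so inspect your own response):

```javascript
const cloudinary = require('cloudinary').v2;

// Look up the asset's contextual metadata and build an accessible <img> tag.
async function imgTagWithAlt(publicId) {
  const asset = await cloudinary.api.resource(publicId);
  const alt = asset.context?.custom?.alt || '';   // fall back to an empty alt if none is stored
  const src = cloudinary.url(publicId, { secure: true, fetch_format: 'auto', quality: 'auto' });
  return `<img src="${src}" alt="${alt}">`;
}
```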

Product Gallery Widget integration:
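A browser-side sketch of initializing the widget so that it reads alt text from asset metadata. The accessibilityProps field names shown here are illustrative placeholders - check the Product Gallery reference for the exact option names:

```javascript
const gallery = cloudinary.galleryWidget({
  container: '#product-gallery',
  cloudName: 'demo',
  mediaAssets: [{ tag: 'wedding-collection' }],
  accessibilityProps: {
    mediaAltSource: 'contextual',  // hypothetical value: read alt text from contextual metadata
    mediaAltId: 'alt'              // hypothetical: the metadata field to read it from
  }
});
gallery.render();
```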

This centralized approach ensures that accessibility improvements benefit all your applications simultaneously, while providing the tools needed for professional content management and review processes.

AI-based image captioning

While storing text alternatives as metadata provides centralized management, creating meaningful descriptions for large image libraries can be time-consuming. Using AI-based image captioning, you can programmatically provide captions for images, saving time and resources.

  1. Subscribe to the Cloudinary AI Content Analysis add-on.
  2. Upload an image to your Media Library, invoking AI-based image captioning:

  3. Use the AI-generated caption from the response for the alt text:

    Example code:
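A Node.js sketch covering steps 2 and 3 (the nesting used to read the caption from the response follows the add-on's response format - confirm it against your own response):

```javascript
const cloudinary = require('cloudinary').v2;

async function uploadWithAltText(file) {
  // Step 2: upload and request an AI-generated caption.
  const result = await cloudinary.uploader.upload(file, { detection: 'captioning' });

  // Step 3: copy the caption into the 'alt' contextual metadata field.
  const caption = result.info?.detection?.captioning?.data?.caption;
  if (caption) {
    await cloudinary.api.update(result.public_id, { context: { alt: caption } });
  }
  return caption;
}
```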

A woman in a stunning white mermaid-style wedding gown stands in an ornate, vintage-inspired room with intricate architectural details and decorative elements.


Alternative ways to invoke AI-based image captioning:

  • Define an upload preset in the Cloudinary Console settings, which you can use either programmatically, or in your Media Library, when uploading images.

    Setting AI captioning in an upload preset
  • Use the Cloudinary Image Captioning block in MediaFlows to generate a caption as part of a workflow, for example, to update the alt contextual metadata field on upload.

    Invoking AI captioning using MediaFlows
  • For images already in your Media Library, use the update method of the Admin API, instead of the upload method of the Upload API.

Cloudinary AI Vision

An alternative to AI-based image captioning is to use the Cloudinary AI Vision add-on. This has the benefit of analyzing images that are external to Cloudinary, or stored in Cloudinary product environments that you don't own. You just need a valid URL to the image.

  1. Subscribe to the Cloudinary AI Vision add-on.
  2. Send a request to the Analyze API asking for a brief description of the image.

  3. Use the AI-generated response for the alt text:

    Example code:
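A Node.js sketch of calling the Analyze API and using the answer as alt text. The endpoint path, request body, and response structure below are illustrative assumptions, not the documented API shape - check the Analyze API reference before using them:

```javascript
// Illustrative only: endpoint, payload, and response fields are assumptions.
const CLOUD_NAME = '<cloud_name>';
const AUTH = Buffer.from('<api_key>:<api_secret>').toString('base64');

async function describeExternalImage(imageUrl) {
  const response = await fetch(`https://api.cloudinary.com/v2/analysis/${CLOUD_NAME}/analyze`, {
    method: 'POST',
    headers: { Authorization: `Basic ${AUTH}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      analysis_type: 'ai_vision_general',
      uri: imageUrl,
      prompts: ['Describe this image in one sentence for use as alt text.']
    })
  });
  const result = await response.json();
  return result?.data?.analysis?.responses?.[0]?.value;  // assumed response path
}
```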

A mermaid-style wedding dress with a fitted bodice and dramatic tulle skirt photographed in an elegant room with ornate wall moldings and decorative stone urns.

Related topics


Video and audio accessibility

Audio and video content presents unique accessibility challenges because the information is presented over time and may not be perceivable by all users. People with hearing impairments need text alternatives like captions and transcripts, while those with visual impairments benefit from audio descriptions that explain visual content. Additionally, users may need control over audio levels or the ability to pause content that plays automatically.

Autoplay best practices

The most accessible approach is often to let users control when media plays. Autoplaying content can interfere with screen readers, startle users, cause motion sensitivity issues, and consume bandwidth without consent.

Best practice: Provide clear controls and let users initiate playback. If autoplay is necessary, mute videos by default and include prominent play/pause controls.

This section covers Cloudinary's tools and techniques for making time-based media accessible, including generating transcriptions, adding captions, creating audio descriptions, and providing sign language overlays.

Alternatives for video-only content

Video-only content (videos without audio) can be made accessible in two main ways: by providing a written text description that conveys the visual information, or by adding an audio track that narrates what's happening on screen.

Video with written description

Videos containing no audio aren't accessible to people with visual impairments. You can provide a text alternative in the form of a description presented alongside a video, which a screen reader can read:


Video description:

The video takes place in a picturesque, hilly landscape during dusk, featuring rocky formations and a clear sky transitioning in colors from blue to orange hues. Initially, a person is seen standing next to a parked SUV. As the video progresses, another individual joins, and they are seen together near the vehicle. The couple, equipped with backpacks, appears to be enjoying the serene environment. Midway through, they share a hug near the SUV, emphasizing a moment of closeness in the tranquil setting. Towards the end of the video, the couple is observed walking away from the SUV, exploring the scenic, rugged terrain around them.


Video with audio description

As an alternative to providing a video with a written description, you can provide the description as an audio track, using the audio layer parameter (l_audio):
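As a sketch, the delivery URL follows the same audio-layer pattern used later on this page; the overlay public ID docs:description-track is a placeholder for your own narration file:

https://res.cloudinary.com/demo/video/upload/l_audio:docs:description-track/fl_layer_apply/docs/grocery-store-no-audio.mp4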

Use the button below to toggle the audio description on and off. Notice how the transformation URL changes when you toggle the audio description:

🎧 Audio Description Toggle Demo

Audio Description: Off - Click the button below to toggle the audio description track.

Current transformation URL:
https://res.cloudinary.com/demo/video/upload/docs/grocery-store-no-audio.mp4

Audio and video transcriptions

For people with hearing impairments, you can generate transcripts of audio and video files that contain speech using Cloudinary's transcription functionality.

For audio-only files, you can present the text alongside the audio. For example, for this podcast, you can generate the transcript and display the wording below the file:

Transcript:

Tonight on Quest, nanotechnology. It's a term that's become synonymous with California's high-tech future. But what are these mysterious nanomaterials and machines? And why are they so special? Come along as we take an incredible journey into the land of the unimaginably small.


Tip
You can also display transcriptions alongside videos, or you can use the transcript as video captions.

Types of transcription services

Cloudinary offers multiple transcription options to suit different needs and workflows:

  • Cloudinary Video Transcription (built-in service) - formats: .transcript
    • Automatic language detection and transcription
    • Supports translation to multiple languages
    • Integrates seamlessly with the Cloudinary Video Player
    • Includes confidence scores for quality assessment
  • Google AI Video Transcription add-on - formats: .transcript, VTT, SRT
    • Leverages Google's Speech-to-Text API
    • Supports over 125 languages and variants
    • Provides speaker diarization (identifying different speakers)
  • Microsoft Azure Video Indexer add-on - formats: .transcript, VTT, SRT
    • Advanced video analysis including speech transcription
    • Supports multiple languages with auto-detection
    • Provides additional insights like emotions and topics
    • Generates comprehensive metadata about video content

Transcript formats and usage

Different services generate transcripts in different formats, each suited for specific use cases:

  • Cloudinary .transcript: JSON format with word-level timing and confidence scores. Each excerpt includes the full transcript text, a confidence value, and individual word breakdowns with precise start/end times. Best for advanced Video Player features like word highlighting, paced subtitles, and confidence-based filtering.
  • Google .transcript: JSON format with word-level timing and confidence scores. Similar structure to Cloudinary's, but generated via Google's Speech-to-Text API with varying excerpt lengths. Best for Google-specific integrations and applications requiring Google's speech recognition accuracy.
  • Azure .transcript: JSON format with confidence scores and start/end times. Provides transcript excerpts with timing, but in a different structure than the Cloudinary/Google formats. Best for Microsoft Azure integrations and applications requiring Azure's video analysis capabilities.
  • VTT files (WebVTT): Industry-standard web-based caption format with timing cues. Supported by most modern video players and browsers. Best for web video players, HTML5 video elements, and broad compatibility needs.
  • SRT files (SubRip): Simple text format with numbered sequences and timing codes. Widely supported across video editing software and players. Best for video editing workflows, legacy player support, and simple subtitle implementations.

Generating transcripts using the Cloudinary Video Transcription

You can generate transcripts using several methods:

During upload:
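A Node.js sketch (assuming the built-in transcription service is requested with the raw_convert parameter; the file name and public ID are illustrative):

```javascript
// Request a transcript (<public_id>.transcript) while uploading the video.
const cloudinary = require('cloudinary').v2;

cloudinary.uploader.upload('lecture.mp4', {
  resource_type: 'video',
  public_id: 'lecture',
  raw_convert: 'transcript'
})
.then(result => console.log(result.public_id))
.catch(error => console.error(error));
```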

Tip
See also Requesting translation for international accessibility.

For existing videos using the explicit method:
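A Node.js sketch for a video that's already in your Media Library (the public ID is a placeholder):

```javascript
const cloudinary = require('cloudinary').v2;

cloudinary.uploader.explicit('lecture', {
  resource_type: 'video',
  type: 'upload',
  raw_convert: 'transcript'
})
.then(result => console.log(result))
.catch(error => console.error(error));
```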

From the Video Player Studio:

Navigate to the Video Player Studio, add your video's public ID, and select the Transcript Editor to generate and edit transcripts directly in the interface.

The transcript editor provides a user-friendly interface for:

  • Generating transcripts for existing videos
  • Editing transcript content to ensure accuracy
  • Adjusting individual word timings
  • Adding or removing transcript lines
  • Reviewing confidence scores for quality assessment

Transcript editor interface

Related topics

Video captions

For prerecorded audio content in synchronized media you can supply captions. Using the Cloudinary Video Player you can display your generated transcriptions in sync with the video.


Having generated your transcription, to add it as captions to your video, set the textTracks parameter at the source level.

If you use the Cloudinary video transcription service to generate your transcription, as in this example, you don't need to specify the transcript file in the Video Player configuration - it's added automatically.

Here's the configuration for the video above:
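A sketch of that configuration (the public ID is a placeholder; because the transcript was generated by the Cloudinary service, no url field is needed):

```javascript
const player = cloudinary.videoPlayer('player', { cloudName: 'demo' });

player.source('<video-public-id>', {
  textTracks: {
    captions: {
      label: 'English captions',
      language: 'en',
      default: true
      // no url: the auto-generated .transcript file is picked up automatically
    }
  }
});
```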

Adding VTT and SRT files as captions

To set a VTT or SRT file as the captions (if you use a different transcription service), specify the file in the url field of the textTracks captions object.

VTT:
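For example (the public ID and file URL are placeholders):

```javascript
const player = cloudinary.videoPlayer('player', { cloudName: 'demo' });

player.source('<video-public-id>', {
  textTracks: {
    captions: {
      label: 'English captions',
      language: 'en',
      default: true,
      url: 'https://res.cloudinary.com/demo/raw/upload/docs/captions.vtt'  // your VTT file
    }
  }
});
```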

SRT:
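The same configuration, pointing at an SRT file instead:

```javascript
player.source('<video-public-id>', {
  textTracks: {
    captions: {
      label: 'English captions',
      language: 'en',
      default: true,
      url: 'https://res.cloudinary.com/demo/raw/upload/docs/captions.srt'  // your SRT file
    }
  }
});
```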

Related topics

Audio descriptions

You can use the Cloudinary Video Player to display audio descriptions as captions on a video, which a screen reader can read. Additionally, you can set up alternative audio tracks that can provide audio descriptions of a video instead of any audio that's built into the video.

Audio descriptions as captions

When a video has no audio, or no dialogue, it can be helpful to provide a description of the video. You can present this alongside the video, or as captions in the video. Both can be read by screen readers.

In this video, the description appears as captions. Notice the audio descriptions menu at the bottom right of the player, when you play the video:

Audio descriptions menu

To set audio descriptions as captions, which can be read by screen readers, use the descriptions kind of text track:
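A sketch of that configuration. The kind value follows the standard HTML text-track kinds; the exact option names the player expects for a descriptions track should be checked against the Video Player reference (public ID and URL are placeholders):

```javascript
const player = cloudinary.videoPlayer('player', { cloudName: 'demo' });

player.source('<video-public-id>', {
  textTracks: {
    captions: {
      label: 'Audio descriptions',
      kind: 'descriptions',   // assumption: exposes the track with the 'descriptions' kind
      language: 'en',
      default: true,
      url: 'https://res.cloudinary.com/demo/raw/upload/docs/descriptions.vtt'
    }
  }
});
```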

Audio descriptions as alternative audio tracks

A different way of providing a description of a video is to provide an alternative audio track. In this video the description is available as an audio track, which you can select from the audio selection menu at the bottom right of the player when you play the video:

Audio descriptions menu


To define additional audio tracks, use the audio layer transformation (l_audio) with the alternate flag (fl_alternate). This functionality is supported only for videos using adaptive bitrate streaming with automatic streaming profile selection (sp_auto).

Here's the transformation URL that's supplied to the Video Player:
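A sketch of the URL pattern (the public IDs are placeholders; the alternate flag is combined with layer_apply on the closing component, and adaptive streaming is requested with sp_auto):

https://res.cloudinary.com/demo/video/upload/sp_auto/l_audio:docs:description-track/fl_alternate.layer_apply/docs/adventure-video.m3u8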

Tip
For progressive videos, consider the solution shown in Video with audio description.

Related topics

Live streaming closed captions

Add closed captions to your live streams using your preferred streaming software. Cloudinary's live streaming capabilities include support for captions embedded in the H.264 video stream using the CEA-608 standard.

  1. To begin streaming, navigate to the Live Streams page in the Console and create a new live stream by providing a name. Once created, you'll receive the necessary streaming details:

    • Input: RTMP URL and Stream Key – Use these credentials to configure your streaming software and initiate the live stream.
    • Output: HLS URL – This is your stream's output URL, which can be used with your video player. The Cloudinary Video Player natively supports live streams for seamless playback.

    Live streaming UI in the Cloudinary Console

    Live streaming in the Cloudinary Console
  2. Add closed captions to the stream using your streaming software.

Live streaming software showing live captions being added

Related topics

Sign language in video overlays

To help deaf people to understand the dialogue in a video, you could apply a sign-language overlay.

  1. Upload the video(s) of the sign language and the main video to your product environment.
  2. Use a video overlay (l_video in URLs) with placement (e.g. g_south_east) and start and end offsets (e.g. so_2.0 and eo_4.0).
  3. Optionally speed up or slow down the overlay to fit with the dialogue (e.g. e_accelerate:100).

In this example, there are two overlays applied, one between 2 and 4 seconds, and the other at 6 seconds (this one is present until the end of the video so no end offset is required):
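A sketch of the URL pattern for those two overlays (the sign-language clip public IDs, overlay width, and base video are placeholders):

https://res.cloudinary.com/demo/video/upload/l_video:docs:sign-clip-1/c_scale,w_280/fl_layer_apply,g_south_east,so_2.0,eo_4.0/l_video:docs:sign-clip-2/c_scale,w_280/fl_layer_apply,g_south_east,so_6.0/docs/main-video.mp4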


Other options to try:

  • Consider making the background of the signer transparent to see more of the main video (e.g. co_rgb:aca79d,e_make_transparent:0 as a transformation in the video layer). This works best if the background is a different color to anything else in the video.
  • Fade the overlay videos in and out for a smoother effect (e.g. e_fade:500/e_fade:-500).

Related topics


Visual and audio clarity

Making content distinguishable means ensuring that users can perceive and differentiate important information regardless of their visual or auditory abilities. This includes providing sufficient color contrast, not relying solely on color to convey information, controlling audio levels, and allowing customization of visual and audio elements.

Users with color blindness, low vision, hearing impairments, or various sensitivities need content that can be perceived clearly in different ways. This section covers Cloudinary's tools for creating high-contrast visuals, assisting color blind users, managing audio levels, customizing text presentation, and adapting content for different viewing modes and environments.

Assist people with color blind conditions

People with color blindness may have difficulty distinguishing between certain colors, particularly red and green. Cloudinary provides tools to help make your media more accessible by using both color and pattern to convey information.

Image comparison: original, X-ray mode, and striped overlays.

Simulate color blind conditions

You can experience how your images look to people with different color blind conditions. Apply the e_simulate_colorblind effect with parameters like deuteranopia, protanopia, tritanopia, or cone_monochromacy to preview your content (see all the options).
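For example, to preview an image as someone with deuteranopia would see it (using the pie chart from the demo later on this page):

https://res.cloudinary.com/demo/image/upload/e_simulate_colorblind:deuteranopia/docs/piechart.png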

Color palette with different simulated colorblind conditions

Analyze color accessibility

For a more objective approach to assessing the accessibility of your images, use Accessibility analysis (currently available to paid accounts only).

  1. Upload your images with the accessibility_analysis parameter set to true:
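A minimal Node.js sketch; the response includes an accessibility_analysis object (an overall colorblind accessibility score plus related metrics) that you can log or store:

```javascript
const cloudinary = require('cloudinary').v2;

cloudinary.uploader.upload('piechart.png', {
  public_id: 'piechart',
  accessibility_analysis: true
})
.then(result => console.log(result.accessibility_analysis))
.catch(error => console.error(error));
```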

  2. See the accessibility results in the response:

For more information see Accessibility analysis, and for an example of using the results, watch this video tutorial.

Apply stripes

Consider a chart that uses red and green colors to convey information. For someone with red-green color blindness, this information would be inaccessible.

Image comparison: original pie chart vs. the same chart as seen by someone with deuteranopia (simulated).


By adding patterns or symbols alongside colors, you can ensure the information is conveyed regardless of color perception.

Image comparison: pie chart with e_assist_colorblind:20 applied vs. the same transformation viewed with simulated deuteranopia (e_assist_colorblind:20/e_simulate_colorblind).


To add the stripes, apply the assist_colorblind effect with a stripe strength from 1 to 100, e.g. e_assist_colorblind:20:
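For example, applied to the pie chart shown above:

https://res.cloudinary.com/demo/image/upload/e_assist_colorblind:20/docs/piechart.png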

Apply color shifts

For an image where the problematic colors aren't isolated, it can be even harder to distinguish the content of the image.

Image comparison: original flower and grasshopper photo vs. the same image as seen by someone with deuteranopia (simulated).


By shifting the colors, you can ensure the image is clear regardless of color perception.

Image comparison: flower and grasshopper with e_assist_colorblind:xray applied vs. the same image with simulated deuteranopia after using e_assist_colorblind:xray.


To shift the colors, apply the e_assist_colorblind effect with the xray effect, e.g. e_assist_colorblind:xray:
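For example (the public ID here is a placeholder for the flower image above):

https://res.cloudinary.com/demo/image/upload/e_assist_colorblind:xray/docs/flower-grasshopper.jpg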

Obviously, you wouldn't want everyone to experience your images with the assist colorblind effects applied, but you could consider implementing a toggle that adds these effects to your images on demand.

Interactive color blind accessibility demo

Use the controls below to test different color blind assistance techniques and simulate various color blind conditions. This helps you understand which techniques work best for different types of color vision deficiency.

Demo image showing color blind accessibility techniques

Current Transformation URL:

https://res.cloudinary.com/demo/image/upload/w_400/bo_1px_solid_black/docs/piechart.png

Tips for Testing:

  • Pie Chart: Notice how stripes help distinguish sections that may look similar with color blindness
  • Red Flower: X-ray mode shifts colors to make the flower more visible against the green background
  • Compare: Try different combinations to see which techniques work best for each condition

Adjust audio volume

For people with hearing impairments or those in different listening environments, providing volume control options ensures your audio and video content is accessible. The WCAG guidelines specify that if audio plays automatically for more than 3 seconds, users must have a mechanism to pause, stop, or control the volume independently.

With Cloudinary, you can implement this mechanism both programmatically and using the Cloudinary Video Player.

Programmatic volume adjustment

Programmatically adjust the volume directly in your media transformations using the volume effect (e_volume). This allows you to give control to your users via external controls (as shown in the demo).

For example, to reduce the volume by 50% (e_volume:-50):


To increase the volume by 150% (e_volume:150):


You can also mute audio completely by setting the volume to mute:
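URL sketches for the three examples above, using the grocery store video from the demo below:

Reduce by 50%: https://res.cloudinary.com/demo/video/upload/e_volume:-50/docs/grocery-store.mp4
Increase by 150%: https://res.cloudinary.com/demo/video/upload/e_volume:150/docs/grocery-store.mp4
Mute: https://res.cloudinary.com/demo/video/upload/e_volume:mute/docs/grocery-store.mp4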

Note
You can also adjust volume programmatically using the HTMLMediaElement volume property:
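For example:

```javascript
// Plain DOM approach: volume is a value between 0.0 (muted) and 1.0 (full volume).
const video = document.querySelector('video');
video.volume = 0.5;
```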

Demo: External volume controls using transformations

For users with restricted movement or motor disabilities, you can create larger, more accessible volume controls outside the video player. These external controls use Cloudinary's volume transformations to deliver videos at different volume levels, making them easier to interact with than the built-in player controls. You can see the delivery URL change when you choose a different volume.

Volume: Normal (100%)
Current transformation URL:
https://res.cloudinary.com/demo/video/upload/docs/grocery-store.mp4

Video Player volume controls

The Cloudinary Video Player provides built-in volume controls that users can adjust according to their needs. The player includes both a volume button and a volume slider for precise control.

You can customize the volume controls and set default volume levels in your JavaScript:
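A sketch (the volume() method and muted option follow the Video Player API; the public ID is a placeholder):

```javascript
const player = cloudinary.videoPlayer('player', {
  cloudName: 'demo',
  controls: true,   // show the control bar, including the volume button and slider
  muted: true       // start muted so audio never plays unexpectedly
});

player.source('<video-public-id>');
player.volume(0.6); // default volume once the user unmutes (0.0 - 1.0)
```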

Customizable caption styling

Captions and subtitles must meet specific contrast requirements to be accessible to people with visual impairments. The WCAG guidelines specify minimum contrast ratios between text and background colors to ensure readability.

Understanding contrast ratios

A contrast ratio measures the difference in brightness between text and its background, expressed as a ratio like 4.5:1 or 7:1. The higher the number, the more contrast there is.

WCAG Requirements:

  • Level AA (minimum): 4.5:1 contrast ratio for normal text
  • Level AAA (enhanced): 7:1 contrast ratio for normal text
  • Large text (18pt+ or 14pt+ bold): Lower ratios of 3:1 (AA) or 4.5:1 (AAA)

How to measure contrast ratios

You can measure contrast ratios using online tools such as WebAIM Contrast Checker.

How it works:

  1. Pick your colors: Select the text color and background color
  2. Get the ratio: The tool calculates the mathematical contrast ratio
  3. Check compliance: See if it meets WCAG AA (4.5:1) or AAA (7:1) standards

Example measurements:

  • Black text (#000000) on white background (#FFFFFF) = 21:1 (excellent)
  • White text (#FFFFFF) on blue background (#0066CC) = 5.74:1 (passes AA)
  • Light gray text (#CCCCCC) on white background (#FFFFFF) = 1.61:1 (fails - too low)

Implementing accessible caption styling

The Cloudinary Video Player allows you to customize caption appearance to meet contrast requirements. The recommended approach is to use the built-in theme options which provide predefined backgrounds and styling.

The following built-in themes are available:

  • default: none. Best for high contrast videos only.
  • videojs-default: high contrast theme with a dark background and white text. Best for general accessibility.
  • yellow-outlined: yellow text with a dark outline for visibility. Best for videos with varied backgrounds.
  • player-colors: uses the video player's custom color scheme for the text and background. Best for brand consistency plus accessibility.
  • 3d: text with a 3D shadow effect. Best for stylistic preference.

The example at the top of this section uses the videojs-default theme. Note that you can also override elements of the theme, for example, by setting the font size. Here's the Video Player configuration:
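A sketch of that configuration (the names of the text-track styling options, such as theme and fontSize, should be checked against the Video Player reference; the public ID is a placeholder):

```javascript
const player = cloudinary.videoPlayer('player', { cloudName: 'demo' });

player.source('<video-public-id>', {
  textTracks: {
    options: {
      theme: 'videojs-default',  // built-in high contrast theme
      fontSize: '150%'           // override a single element of the theme
    },
    captions: {
      label: 'English captions',
      language: 'en',
      default: true
    }
  }
});
```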


To set custom colors for the font and background you can use the player-colors theme. This theme uses the colors that you configure when customizing your Video Player.

Related topics

Text overlays on images and videos

Before creating text overlays embedded in images or videos, consider whether the text could instead be placed in your HTML and visually positioned over the media using CSS. HTML text is inherently more accessible because it can be announced by screen readers, restyled by users, translated automatically, and scales with user preferences—all without requiring additional accessibility techniques.

When you do need embedded text overlays in images and videos, it's crucial to ensure sufficient contrast between the text and background for readability. People with visual impairments or those viewing content in bright environments need clear, high-contrast text. Adding background colors or effects to text overlays helps meet WCAG contrast requirements and improves accessibility for everyone.

Text overlays on images with background

Without proper contrast, text overlays can be difficult or impossible to read. Here's how to add accessible text overlays with background colors:

Image comparison: white text overlay with poor contrast (hard to read) vs. white text on a black background (high contrast, accessible).


The accessible version uses a semi-transparent black background (b_rgb:00000080) behind white text (co_white) to achieve maximum contrast:
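A sketch of the delivery URL (the overlay text and placement are illustrative; the base image is the grocery image used elsewhere on this page):

https://res.cloudinary.com/demo/image/upload/l_text:Arial_60_bold:Fresh%20produce,co_white,b_rgb:00000080/fl_layer_apply,g_south/docs/groceryshop.jpg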

Text overlays on videos with background

Video text overlays face additional challenges as backgrounds change throughout the video. Consistent background colors ensure text remains readable regardless of the video content. This video uses white text (co_white) on a semi-transparent blue background (b_rgb:0000cc90) to create an overlay that remains visible throughout the video.

Adjust contrast on images and videos

Proper contrast, brightness, and saturation adjustments are essential for making images and videos accessible to people with visual impairments, low vision, or those viewing content in challenging lighting conditions. These adjustments can help ensure content remains visible and legible across different viewing environments and for users with varying visual needs.

Contrast adjustments for images

Contrast adjustments can dramatically improve the readability and accessibility of images. Here are examples showing how different contrast levels affect image visibility:

Image comparison: low contrast (-80), original (0), and high contrast (+80).


Use the contrast effect with a value between -100 and 100:
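For example, to raise the contrast by 80 (using the demo image below):

https://res.cloudinary.com/demo/image/upload/e_contrast:80/docs/groceryshop.jpg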

Interactive contrast, brightness, and saturation demo

In addition to contrast, you can also alter brightness and saturation to help improve image visibility.

Use the controls below to see how contrast, brightness, and saturation adjustments affect image accessibility in real-time. Notice how the transformation URL changes as you adjust the settings:

Demo image for contrast adjustments
Current transformation URL:
https://res.cloudinary.com/demo/image/upload/c_scale,w_500/f_auto/q_auto/docs/groceryshop.jpg

Video visual adjustments

Video content can also benefit from contrast, brightness, and saturation adjustments. These are especially important for users with visual impairments who may struggle with low-contrast video content.

This video uses enhanced contrast (e_contrast:50), increased brightness (e_brightness:10) and saturation (e_saturation:20) to improve visibility and accessibility.

Replacing colors for light/dark themes

For users who navigate websites with light and dark themes, consistency in visual presentation is crucial for both usability and accessibility. Light and dark themes can significantly impact users with visual sensitivities, light sensitivity conditions, or those who simply prefer one theme over another for better readability. Cloudinary provides powerful tools to automatically adapt image colors to match your application's theme, ensuring a cohesive visual experience.

Understanding the accessibility need

Different users have varying preferences and needs when it comes to visual themes:

  • Light sensitivity: Users with photophobia, migraines, or certain medical conditions may find dark themes more comfortable
  • Visual impairments: Some users with low vision find better contrast in a specific theme
  • Environmental factors: Dark themes can be easier on the eyes in low-light environments
  • Battery conservation: On OLED displays, dark themes can help conserve battery life
  • Personal preference: Users may simply prefer one theme for better readability

Dynamic color replacement with replace_color

The replace_color effect allows you to dynamically swap colors in images based on the user's theme preference. This is particularly useful for logos, icons, and graphics that need to maintain brand consistency while adapting to different backgrounds. Try changing the theme at the top right of this page, and you'll see how the different icons look with light and dark themes.

Image comparison: original logo for the light theme vs. the logo adapted for a dark theme.


This example replaces the predominant color with light gray (e_replace_color:e6e6e6:50) with a tolerance of 50 to ensure similar shades are also replaced:
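For example, applied to the Cloudinary icon used in the demo below:

https://res.cloudinary.com/demo/image/upload/e_replace_color:e6e6e6:50/cloudinary_icon.png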

Using the theme effect for comprehensive adaptation

For more sophisticated theme adaptation, use the theme effect which applies comprehensive color adjustments to the image based on a specific background color.

For example, change the screen capture to a dark theme with increased sensitivity to photographic elements (e_theme:color_black:photosensitivity_110):

Image comparison: original Cloudinary website screenshot (light theme) vs. the dark theme adaptation using e_theme:color_black:photosensitivity_110.


The effect applies an algorithm that intelligently adjusts the color of illustrations, such as backgrounds, designs, texts, and logos, while keeping photographic elements in their original colors.

Interactive theme adaptation demo

Experience how Cloudinary can automatically adapt images for different themes. This demo shows how the same image can be dynamically modified to suit both light and dark themes using the replace_color transformation, in addition to smart color replacement using the theme effect:

Demo image for theme adaptation
Current transformation URL:
https://res.cloudinary.com/demo/image/upload/c_scale,w_400/f_auto/q_auto/cloudinary_icon.png

Related topics

Customize text overlays in images

Customizable text overlays are essential for accessibility because they allow you to adapt text presentation to meet diverse user needs. Users with visual impairments, dyslexia, or reading difficulties often benefit from specific font styles, sizes, and spacing adjustments. By providing flexibility in text overlay styling, you ensure your content remains accessible across different abilities and preferences.

The WCAG guidelines emphasize that text should be customizable to support users who need larger fonts, different font families, or modified spacing for better readability. Cloudinary's text overlay system provides extensive customization options that help you meet these accessibility requirements while maintaining visual appeal.

Standard text

Large bold (low vision)

Letter spacing (dyslexia)


Understanding text overlay parameters

Cloudinary's text overlay transformation (l_text) supports numerous styling parameters that can be combined to create accessible and visually appealing text:

Core Parameters (Required):

  • Font: Any universally available font or custom font (e.g., Arial, Helvetica, Times)
  • Size: Text size in pixels (e.g., 50, 100)

Styling Parameters (Optional):

  • Weight: Font thickness (normal, bold, thin, light)
  • Style: Font appearance (normal, italic)
  • Decoration: Text decoration (normal, underline, strikethrough)
  • Alignment: Text positioning (left, center, right, justify)
  • Stroke: Text outline (none, stroke)
  • Letter spacing: Space between characters (letter_spacing_<value>)
  • Line spacing: Space between lines (line_spacing_<value>)

Visual Enhancement Parameters:

  • Color: Text color (co_<color>)
  • Background: Background color (b_<color>)
  • Border: Outline styling (bo_<border>)

Interactive text overlay customization demo

Use the controls below to experiment with different text styling parameters and see how they affect accessibility and readability. Notice how the transformation URL updates as you adjust the settings:

Demo image with customizable text overlay
Current transformation URL:
https://res.cloudinary.com/demo/image/upload/c_fit,l_text:Arial_50:Sample%20Text,co_black,w_1800/fl_layer_apply,g_center/c_scale,w_600/f_auto/q_auto/docs/white-texture.jpg

Accessibility considerations for text overlays

When creating text overlays, consider these accessibility best practices:

  1. Font Size: Use sizes of at least 16px for body text, larger for headers. Users with low vision may need even larger text.

  2. Font Choice: Sans-serif fonts like Arial and Helvetica are often easier to read, especially for users with dyslexia.

  3. Letter Spacing: Additional spacing between letters can improve readability for users with dyslexia or visual processing difficulties.

  4. Color Contrast: Ensure sufficient contrast between text and background colors (minimum 4.5:1 ratio for normal text).

  5. Background: Use solid background colors behind text when overlaying on complex images to ensure readability.

  6. Font Weight: Bold text can improve readability, but avoid fonts that are too thin (like light or thin weights) for important content.

Video text overlays

The same customization principles apply to video text overlays. Here's an example of accessible text styling on video content:


This example uses large, bold white text (Arial_60_bold) with a semi-transparent black background (b_rgb:000000cc) to ensure high contrast and readability across the entire video.

OCR text detection and extraction

For images containing text content, Optical Character Recognition (OCR) technology can extract that text and make it accessible to screen readers and other assistive technologies. This is particularly important for images of documents, signs, menus, handwritten notes, or any visual content where text is embedded within the image rather than provided as separate HTML text.

Cloudinary's OCR Text Detection and Extraction add-on can automatically extract text from images during upload, making the content available for accessibility purposes.

Here's an example showing an Italian restaurant menu and the text that Cloudinary's OCR add-on automatically extracted from it:

Italian restaurant menu showing three menu options with prices.

Extracted Text Content (Available to Screen Readers):

MENU 1
INSALATA VERDE
PIZZA CAPRESE
18.50

MENU 2
BRUSCHETTA DELLA CASA
INSALATA DI POLLO
19.50

MENU 3
BRUSCHETTA DELLA CASA
CANNELLONI DI CARNE
AL FORNO 21.50

This text content was automatically extracted using OCR and can be read by screen readers, making the Italian menu accessible to users with visual impairments. Note that the OCR detected the language as Italian (locale: "it") and extracted all menu items with their prices.

To extract text from an image:

  1. Subscribe to the OCR add-on: Enable the OCR Text Detection and Extraction add-on in your Cloudinary account.

  2. Extract text during upload: When uploading images that contain text, use the ocr parameter to extract the text content:
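A Node.js sketch (the file name is illustrative):

```javascript
const cloudinary = require('cloudinary').v2;

cloudinary.uploader.upload('italian-menu.jpg', {
  public_id: 'italian-menu',
  ocr: 'adv_ocr'
})
.then(result => {
  // The extracted text is returned under info.ocr; inspect your response for the exact nesting.
  console.log(JSON.stringify(result.info.ocr, null, 2));
})
.catch(error => console.error(error));
```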

  3. Use extracted text for accessibility: The OCR results are returned in the upload response and can be used to provide accessible alternatives:

    Here's an example in React using the Italian restaurant menu response:
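A React sketch (the response path suggested for ocrText is an assumption based on the add-on's Google-Vision-style response; adjust it to match your own response):

```jsx
// ocrText is assumed to come from the upload response, e.g.
// result.info.ocr.adv_ocr.data[0].textAnnotations[0].description
function MenuImage({ imageUrl, ocrText }) {
  return (
    <figure>
      <img
        src={imageUrl}
        alt="Italian restaurant menu showing three menu options with prices."
      />
      <figcaption>
        {/* The OCR output is real text, so screen readers can read it and users can copy or resize it */}
        <details>
          <summary>Read the menu text</summary>
          <pre>{ocrText}</pre>
        </details>
      </figcaption>
    </figure>
  );
}
```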

Notes
  • You can invoke the OCR Text Detection and Extraction add-on for images already in your product environment using the Admin API update method.
  • You can retrieve the response at a later date using the Admin API resource method.
  • Consider using contextual or structured metadata to store the text.

Mixing audio tracks

For users with hearing difficulties or auditory processing disorders, the ability to control the balance between foreground speech and background audio is crucial for accessibility. The WCAG guidelines specify that background sounds should be at least 20 decibels lower than foreground speech content, or users should have the ability to turn off background sounds entirely.

Cloudinary's audio mixing capabilities allow you to layer multiple audio tracks and control their relative volumes, ensuring your content meets accessibility requirements while maintaining audio richness.

To control the volume of different audio tracks, use the volume effect in each of the audio layers. In this example, the narration is set to a volume of 3dB higher than the original asset (e_volume:3dB), and the background wind noise is set to a volume of 18dB lower than the original asset (e_volume:-18dB):
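This is the same transformation used in the demo below:

https://res.cloudinary.com/demo/video/upload/e_volume:3dB/l_audio:docs:wind_norm/e_volume:-18dB/fl_layer_apply/docs/nanotech_norm.mp3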

Audio normalization for consistent levels

Before mixing audio tracks, it helps to normalize them to consistent baseline levels. Different audio recordings often have varying baseline volumes, which can make it difficult to achieve predictable dB differences for accessibility compliance.

To normalize your audio files before uploading them to Cloudinary, you can use audio processing tools, such as FFmpeg.

For example, normalize the audio file nantech.mp3 to -16 LUFS:
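One possible invocation (the true-peak and loudness-range values here are just reasonable defaults):

```
ffmpeg -i nantech.mp3 -af loudnorm=I=-16:TP=-1.5:LRA=11 nantech_norm.mp3
```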

This ensures that when you apply -20dB or -25dB adjustments in Cloudinary, you get the exact dB separation needed for WCAG compliance.

Interactive audio mixing demo

This demo shows how Cloudinary can mix a primary audio track (nanotechnology narration) with a background audio layer (wind sounds). Use the controls to adjust the volume levels and observe how the dB difference affects accessibility:

🎙️ Narration (Foreground)

Range: -20 dB to +20 dB

🌬️ Wind (Background)

Range: -50 dB to 0 dB
dB Difference: 21 dB
WCAG Compliant: Background is 21 dB lower than foreground (exceeds 20 dB requirement)
Current transformation URL:
https://res.cloudinary.com/demo/video/upload/e_volume:3dB/l_audio:docs:wind_norm/e_volume:-18dB/fl_layer_apply/docs/nanotech_norm.mp3

User-controlled audio track levels

Similar to the above demo, you could provide controls in your application to let the user decide on the levels of each track to meet their needs. Here's some example React code that you could use:
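A sketch in which the sliders map directly onto the e_volume values in the delivery URL (the ranges and public IDs match the demo above; the component itself is illustrative):

```jsx
import { useState } from 'react';

const BASE = 'https://res.cloudinary.com/demo/video/upload';

export default function AudioMixControls() {
  const [narrationDb, setNarrationDb] = useState(3);   // foreground narration level
  const [windDb, setWindDb] = useState(-18);           // background wind level

  const url =
    `${BASE}/e_volume:${narrationDb}dB` +
    `/l_audio:docs:wind_norm/e_volume:${windDb}dB/fl_layer_apply` +
    `/docs/nanotech_norm.mp3`;

  return (
    <div>
      <label>
        Narration: {narrationDb} dB
        <input type="range" min="-20" max="20" value={narrationDb}
          onChange={e => setNarrationDb(Number(e.target.value))} />
      </label>
      <label>
        Background wind: {windDb} dB
        <input type="range" min="-50" max="0" value={windDb}
          onChange={e => setWindDb(Number(e.target.value))} />
      </label>
      <p>Difference: {narrationDb - windDb} dB (aim for 20 dB or more)</p>
      <audio controls src={url} />
    </div>
  );
}
```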

Best practices for accessible audio mixing
  • Always provide a no-background option: Some users need complete silence behind speech
  • Maintain 20+ dB separation: When background audio is present, ensure it's at least 20 dB lower
  • Test with real users: Audio perception varies greatly between individuals
  • Consider frequency content: Low-frequency background sounds are less distracting than mid-range frequencies
  • Provide visual indicators: Show users the current dB levels and compliance status
  • Use consistent levels: Maintain the same audio balance throughout your content

Related topics


Interactive content and controls

User interface components and navigation must be operable by all users, regardless of their physical abilities or the input methods they use. This means ensuring that users can interact with and navigate your content using various methods including keyboards, screen readers, voice commands, or other assistive technologies.

For media content, operability encompasses several key areas: providing alternatives to motion-based content for users with vestibular disorders, ensuring all interactive elements are keyboard accessible, and designing interfaces that work seamlessly with assistive technologies. Users with motor impairments, visual disabilities, or other conditions need content that responds predictably to their preferred interaction methods.

This section covers Cloudinary's tools and widgets that support operable interfaces, including techniques for managing animated content, implementing keyboard-accessible galleries, and creating video players that work with assistive technologies.

Convert animations to still images

Many users have vestibular disorders, seizure conditions, or other sensitivities that make animated content problematic or even dangerous. WCAG Success Criterion 2.3.3 requires that motion animation triggered by interaction can be disabled unless the animation is essential to the functionality. Additionally, some users simply prefer reduced motion for better focus and less distraction.

Cloudinary provides several approaches to make animated content accessible by converting animations to still images or providing user control over motion.

Understanding motion sensitivity

Users may need reduced motion for various reasons:

  • Vestibular disorders: Inner ear conditions that cause dizziness and nausea from motion
  • Seizure disorders: Flashing or rapid motion can trigger seizures
  • Attention disorders: Animation can be distracting and make it difficult to focus
  • Migraine triggers: Motion can trigger or worsen migraines
  • Battery conservation: Reducing animation saves device battery life
  • Bandwidth limitations: Still images use less data than animated content

Extracting still frames from animations

You can extract a single frame from an animated GIF or video to create a still image alternative. This is useful for providing a static version of animated content.

Image comparison: original kitten animation vs. a still frame extracted from page 3.


Use the page parameter (pg_) to extract a specific frame from an animated GIF:
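For example, to deliver frame 3 of the kitten animation as a JPEG still:

https://res.cloudinary.com/demo/image/upload/pg_3/kitten_fighting.jpg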

You can also extract the first frame by converting the format to a static image format like JPG or PNG:
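For example:

https://res.cloudinary.com/demo/image/upload/kitten_fighting.jpg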

Implementing user-controlled motion preferences

The most accessible approach is to respect user preferences for reduced motion. Modern browsers support the prefers-reduced-motion CSS media query, which you can combine with Cloudinary transformations to serve appropriate content.

Here's an interactive demo showing how to implement motion preferences:

Kittens playing - animation respects motion preferences
Current setting: Motion enabled
Current image URL:
https://res.cloudinary.com/demo/image/upload/c_scale,w_400/kitten_fighting.gif

Implementation examples

Here are some examples of respecting the prefers-reduced-motion setting:

React:
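A sketch using a small hook around window.matchMedia (the asset and transformations match the demo above):

```jsx
import { useEffect, useState } from 'react';

function usePrefersReducedMotion() {
  const [reduced, setReduced] = useState(false);
  useEffect(() => {
    const query = window.matchMedia('(prefers-reduced-motion: reduce)');
    const update = () => setReduced(query.matches);
    update();
    query.addEventListener('change', update);
    return () => query.removeEventListener('change', update);
  }, []);
  return reduced;
}

export function KittensImage() {
  const reduced = usePrefersReducedMotion();
  const base = 'https://res.cloudinary.com/demo/image/upload/c_scale,w_400';
  const src = reduced
    ? `${base}/pg_3/kitten_fighting.jpg`  // still frame for reduced motion
    : `${base}/kitten_fighting.gif`;      // full animation otherwise
  return <img src={src} alt="Kittens playing" />;
}
```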

HTML:

CSS:

Video posters

For video content, you can extract a poster frame to show when motion is reduced:


The poster frame uses the start offset (so_) parameter to extract a frame from 10.2 seconds into the video.

You can also use so_auto to let Cloudinary automatically choose the best frame to use as the poster.

Best practices for motion accessibility
  • Respect system preferences: Always check for prefers-reduced-motion: reduce
  • Provide user controls: Allow users to toggle motion on/off regardless of system settings
  • Choose meaningful still frames: Select frames that best represent the animated content
  • Maintain functionality: Ensure that stopping animation doesn't break essential features
  • Test with users: Verify that reduced motion versions are still informative and useful
  • Consider alternatives: Sometimes a different approach (like a slideshow) works better than a single still frame

Cloudinary Product Gallery widget

The Cloudinary Product Gallery widget provides comprehensive accessibility features that ensure users with disabilities can effectively navigate and interact with product galleries. The widget includes keyboard navigation, screen reader support, and customizable display options that meet WCAG operability requirements.

Keyboard accessibility

The Product Gallery enables full keyboard accessibility for users who cannot use a mouse or rely on assistive technologies. All interactive elements are accessible using standard keyboard navigation:

Keyboard Navigation Controls

  • Tab: Navigate forward through interactive elements
  • Shift + Tab: Navigate backward through interactive elements
  • Enter: View an asset or activate zoom
  • Escape: Exit zoom mode or close overlays
  • Spacebar: Play/pause videos
  • Arrow Keys: Navigate between gallery items

Accessible configuration options

For the most accessible experience, Cloudinary recommends these configuration settings:

Screen reader support

The Product Gallery provides semantic markup for screen readers and uses alt text from your configured metadata sources. You can specify where the gallery should look for alt text using the accessibilityProps parameter:

Using structured metadata for alt text:

Using contextual metadata for alt text:

If no alt text source is configured or the specified metadata field is empty, the gallery defaults to descriptive text in the format "Gallery asset n of m".

Focus management and visual indicators

The Product Gallery provides clear visual focus indicators that help users understand their current position within the gallery:

  • High contrast focus rings: Clearly visible borders around focused elements
  • Logical tab order: Sequential navigation through thumbnails, main viewer, and controls
  • Focus trapping: When zoom is activated, focus remains within the zoom interface
  • Expanded mode benefits: In expanded mode, focus areas are more visually prominent

Video accessibility features

When displaying videos in the Product Gallery, accessibility features include:

  • Keyboard controls: Spacebar to play/pause, Enter to activate full controls
  • Screen reader announcements: Video state changes are announced to assistive technology
  • Simplified controls: The controls: "play" option reduces interface complexity
  • Caption support: When using the Cloudinary Video Player, captions and subtitles are fully supported

Responsive accessibility

The Product Gallery maintains accessibility across different viewport sizes:

  • Mobile optimization: Touch-friendly controls and appropriate sizing
  • Viewport breakpoints: Maintains usability as layout adapts to screen size
  • Consistent navigation: Keyboard accessibility preserved across all breakpoints

Best practices for accessible Product Galleries
  • Provide meaningful alt text: Use structured or contextual metadata to supply descriptive alt text for all images
  • Use expanded mode for better focus visibility: The expanded display mode provides more prominent focus indicators
  • Simplify video controls: Use controls: "play" to reduce cognitive load
  • Test with keyboard only: Ensure all functionality is accessible without a mouse
  • Provide captions for videos: When using videos, include captions or transcripts
  • Consider loading states: Ensure loading indicators are announced to screen readers
  • Test with screen readers: Verify that the gallery provides a logical and informative experience for screen reader users

Cloudinary Video Player

The Cloudinary Video Player is designed to provide an inclusive video experience that meets WCAG 2.1 AA compliance standards. The player includes comprehensive accessibility features that ensure users with visual, auditory, motor, and cognitive impairments can fully engage with video content through assistive technologies, keyboard navigation, and other accessibility-friendly enhancements.

Accessibility features overview

The Cloudinary Video Player provides extensive accessibility support including:

  • Full keyboard navigation: All controls accessible via Tab key with clear focus indicators
  • Screen reader compatibility: ARIA attributes and semantic markup for assistive technologies
  • Closed captions and subtitles: Multi-language support with customizable styling (refer to Video captions)
  • Audio descriptions: Support for descriptive audio tracks and caption-based descriptions (refer to Audio descriptions)
  • Video chapters: Easy navigation to key sections for improved usability
  • Adjustable playback: Variable speed controls for better comprehension
  • High-contrast UI: Customizable themes for improved visibility (refer to Customizable caption styling)

Tip
For full details, see the Video Player accessibility guide.

Live accessibility demo

Here's a working Cloudinary Video Player demonstrating accessibility features. Try navigating the controls using only your keyboard (Tab, Space, Arrow keys) and notice the clear focus indicators:

🎬 Accessible Video Player Demo

Keyboard Controls: Tab to navigate controls, Space to play/pause, Arrow keys to seek. Use Tab to reach mute, fullscreen, and caption buttons.

Accessibility Features Demonstrated:
  • Keyboard navigation with visible focus indicators
  • Closed captions with high-contrast styling
  • Screen reader compatible controls
  • Customizable playback speed
  • ARIA labels and semantic markup

Keyboard navigation controls

Video Player Keyboard Controls

  • Tab: Navigate forward through video player controls
  • Shift + Tab: Navigate backward through video player controls
  • Spacebar: Play/pause video
  • Enter: Activate focused button (play, fullscreen, etc.)
  • Left Arrow: Seek backward 10 seconds
  • Right Arrow: Seek forward 10 seconds
  • Up Arrow: Increase volume
  • Down Arrow: Decrease volume
  • Escape: Exit fullscreen mode

Implementation example

Here's how to configure the Cloudinary Video Player with optimal accessibility settings:
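A sketch (option names such as playbackRates follow the Video Player configuration options; the public ID and caption details are placeholders):

```javascript
const player = cloudinary.videoPlayer('accessible-player', {
  cloudName: 'demo',
  controls: true,                                // keyboard-operable control bar
  muted: true,                                   // avoid unexpected audio on load
  playbackRates: [0.5, 0.75, 1, 1.25, 1.5, 2]    // adjustable playback speed
});

player.source('<video-public-id>', {
  textTracks: {
    options: { theme: 'videojs-default' },       // high contrast caption theme
    captions: { label: 'English captions', language: 'en', default: true }
  }
});
```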
