Last updated: Jan-17-2025
Cloudinary is a cloud-based service that provides solutions for image and video management. These include server or client-side upload, on-the-fly image and video transformations, fast CDN delivery, and a variety of asset management options.
The Cloudinary AI Content Analysis add-on uses AI-based object detection and content-aware algorithms to provide the following functionality:
- Object-aware cropping: Ensures that your image crops keep the specific objects that matter to you, even when you significantly modify the aspect ratio.
  Important: By default, delivery URLs that use this add-on either need to be signed or eagerly generated. You can optionally remove this requirement by selecting this add-on in the Allow unsigned add-on transformations section of the Security page in the Console Settings. (For simplicity, most of the examples on this page show eagerly generated URLs without signatures.)
- Automatic image tagging: Adds tags to your images based on objects or abstract concepts detected by the content-aware detection models specified on upload, or when invoked on images already stored in your product environment.
- AI-based image captioning: Analyzes an image and suggests a caption appropriate to the image's contents.
Getting started
Before you can use the Cloudinary AI Content Analysis add-on:
You must have a Cloudinary account. If you don't already have one, you can sign up for a free account.
Register for the add-on: make sure you're logged in to your account and then go to the Add-ons page. For more information about add-on registrations, see Registering for add-ons.
Keep in mind that many of the examples on this page use our SDKs. For SDK installation and configuration details, see the relevant SDK guide.
If you're new to Cloudinary, you may want to take a look at the Developer Kickstart for a hands-on, step-by-step introduction to Programmable Media features.
Supported content-aware detection models
The Cloudinary AI Content Analysis add-on supports a number of built-in content-aware detection models, each supporting a specific set of categories and objects. You can specify which version of each model to invoke for each use of the add-on.
Cloudinary currently supports the following models:
Model | Description |
---|---|
coco | The Common Objects in Context model contains just 80 common objects. |
cld-fashion | Cloudinary's fashion model is specifically dedicated to items of clothing. Used with automatic image tagging, the response includes attributes of the clothing identified, for example whether the garment contains pockets, its material and the fastenings used. |
lvis | The Large Vocabulary Instance Segmentation model contains thousands of general objects. |
unidet | The UniDet model is a unified model, combining a number of object models, including Objects365, which focuses on diverse objects in the wild. |
human-anatomy | Cloudinary's human anatomy model identifies parts of the human body in an image. It works best when the majority of a human body is detected in the image. |
cld-text | Cloudinary's text model tells you if your image includes text, and where it's located. Used with automatic image tagging, you can then search for images that contain blocks of text. Used with object-aware cropping, you can choose to keep only the text part, or specify a crop that avoids the text. |
shop-classifier | Cloudinary's shop classifier model detects if the image is a product image taken in a studio, or if it's a natural image. |
image-type | Cloudinary's image type model detects generic properties about a photographic image, for example, photographic style, setting and time of the photo. |
captioning | Cloudinary's captioning model is used to describe the contents of an image. See AI-based image captioning. |
watermark-detection | Cloudinary's watermark detection model identifies if the image contains different types of watermark. See Watermark detection. |
iqa | The Image Quality Analysis (IQA) model can predict the quality of a given image on a scale from 0 to 1 and provides a general quality estimation, categorized as 'low', 'medium', or 'high'. See Image quality analysis. |
Model capabilities
This table shows the capabilities of each supported version of each model:
- Default version is the version of the model that is invoked if left unspecified.
- Version indicates support for a particular version of the model - different versions have different accuracies.
- Default confidence shows the confidence level used when auto_tagging is set to default.
- Tag indicates support for returning tags. This is a required capability for automatic image tagging.
- Confidence indicates support for returning confidence levels.
- Bounding Box indicates support for returning bounding boxes. This is a required capability for object-aware cropping.
- Attributes indicates support for returning attributes for each tag in a (key,value) list.
- If you are using our Asia Pacific data center, currently you can use only the COCO and Open Images models.
- If you have difficulty accessing any of the models, please contact support.
Supported objects and categories
Start typing the name of an object or category to see if it's supported by one of the built-in models.
- The Full URL Syntax column shows the syntax to use to detect a specific object or category in a particular version of a model (e.g. coco_v2_tie). You can also omit the version (e.g. coco_tie), or both the model and version (e.g. tie).
- You can specify the model and version (e.g. coco_v2), or only the model (e.g. coco).
- Specify the object from the cld-fashion model (e.g. g_track_person:obj_hat)
Private models
If you have your own content-aware detection models that you would like to use, these can be integrated as private models that work only on your product environment. This service is provided for customers on Enterprise plans through Professional Services. Contact our Enterprise support and sales team or your CSM to find out more.
Object detection demo
This demo lets you choose one of the content-aware detection models, and shows up to twenty objects that are detected by that model in an image of your choice.
Automatic image tagging is requested on upload, and the response provides the necessary information to overlay bounding boxes around the detected objects, together with the confidence level.
- Read this blog to discover all the Cloudinary features in this demo.
Object-aware cropping
When object-aware cropping is invoked, Cloudinary applies advanced AI-based object detection algorithms on the fly during the crop process. You can either use it in conjunction with auto-gravity to give higher priority to the objects you care about, or directly specify that the crop should be exactly based on the detected coordinates of the specified objects.
Watch this demo to see how the same image is cropped according to the parameters specified in the URL:
Applying object-aware cropping
After registering for the Cloudinary AI Content Analysis add-on, you can apply it in one of two ways:
- Automatic gravity with a high weighting towards a specified object: This variant of auto-gravity cropping enables you to indicate specific objects or object categories that should be given priority when parts of a photo are cropped out. This is done by specifying an object or an object category as the focal_gravity attribute for the auto gravity parameter (for example, g_auto:cat in URLs) together with a cropping option. If the specified content is not found in the image, the gravity is determined by the standard auto-gravity algorithm.
- Object-specific gravity: By specifying an object or object category as the gravity parameter (for example, g_cat in URLs) together with a cropping option, you can accurately crop around objects without needing to specify dimensions or aspect ratio. If the specified content is not found in the image, the gravity remains at the center of the image.
When specifying an object or category, you can optionally include a specific model (that supports bounding boxes) and version. For example, you can specify:
- Only the object/category, e.g.: g_auto:cat or g_cat
- The model with the object/category, e.g.: g_auto:coco_cat or g_coco_cat
- The model and version with object/category, e.g.: g_auto:coco_v2_cat or g_coco_v2_cat
If you choose not to specify a model, each model that supports bounding boxes is invoked in turn until the specified content is detected. The order in which they are invoked is: coco > cld-fashion > lvis > unidet > human-anatomy > cld-text.
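The three levels of specificity described above can be composed mechanically. The following is a minimal Python sketch; the helper names object_spec and gravity_param are hypothetical and not part of any Cloudinary SDK.

```python
from typing import Optional


def object_spec(obj: str, model: Optional[str] = None, version: Optional[int] = None) -> str:
    """Compose an object identifier: object, model_object, or model_vN_object."""
    if version is not None and model is None:
        raise ValueError("a version can only be given together with a model")
    parts = []
    if model is not None:
        parts.append(model)
        if version is not None:
            parts.append(f"v{version}")
    parts.append(obj)
    return "_".join(parts)


def gravity_param(spec: str, auto: bool = False) -> str:
    """Return the URL gravity parameter for auto-gravity or object-specific gravity."""
    return f"g_auto:{spec}" if auto else f"g_{spec}"


# The three forms listed above, from least to most specific.
print(gravity_param(object_spec("cat"), auto=True))
print(gravity_param(object_spec("cat", model="coco")))
print(gravity_param(object_spec("cat", model="coco", version=2), auto=True))
```

Leaving out the model keeps URLs short but relies on the fallback order described above; naming the model and version pins the behavior to one detector.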
Consider the original image of a kitchen below:
Using auto-gravity, you can deliver a square thumbnail crop that prioritizes the detected coordinates of the sink, microwave, or refrigerator. To do this, specify the relevant object option for the g_auto gravity definition in conjunction with the thumb or auto cropping option:
Using object-specific gravity, you can choose not to give dimensions or aspect ratio, and deliver an image that is tightly cropped to the object. To do this, specify the relevant object option for the gravity definition in conjunction with the crop cropping option:
You can also specify an aspect ratio together with the crop cropping option, without including specific dimensions. This keeps the object but may show more of the image to fit the aspect ratio.
In addition to the crop, thumb and auto cropping modes, object-aware cropping can also be used with the fill and lfill (limit fill) cropping modes. The fill_pad and auto_pad cropping modes work with the auto-gravity variant of object-aware cropping, but not with object-specific gravity.
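As an illustration of the kitchen examples and the mode restrictions above, here is a small Python sketch that assembles the transformation component and a delivery URL. The helper names, the demo cloud name, the kitchen.jpg public ID and the 300-pixel dimensions are placeholders chosen for this example. Remember that, by default, such URLs must be signed or eagerly generated.

```python
# Cropping modes the text above lists as usable with object-aware cropping.
OBJECT_AWARE_MODES = {"crop", "thumb", "auto", "fill", "lfill", "fill_pad", "auto_pad"}
# Modes that work with the auto-gravity variant only.
AUTO_GRAVITY_ONLY_MODES = {"fill_pad", "auto_pad"}


def crop_transformation(mode: str, gravity: str, *extra: str) -> str:
    """Build a comma-separated transformation such as c_thumb,g_auto:microwave,w_300."""
    if mode not in OBJECT_AWARE_MODES:
        raise ValueError(f"unsupported cropping mode: {mode}")
    uses_auto_gravity = gravity == "auto" or gravity.startswith("auto:")
    if mode in AUTO_GRAVITY_ONLY_MODES and not uses_auto_gravity:
        raise ValueError(f"{mode} requires auto-gravity, got g_{gravity}")
    return ",".join([f"c_{mode}", f"g_{gravity}", *extra])


def delivery_url(cloud_name: str, public_id: str, transformation: str) -> str:
    """Place the transformation between the delivery type and the public ID."""
    return f"https://res.cloudinary.com/{cloud_name}/image/upload/{transformation}/{public_id}"


# Square thumbnail that prioritizes the microwave (auto-gravity).
thumb = crop_transformation("thumb", "auto:microwave", "h_300", "w_300")
print(delivery_url("demo", "kitchen.jpg", thumb))
# Tight crop around the microwave, no dimensions (object-specific gravity).
print(crop_transformation("crop", "microwave"))
# Same crop, widened to fit a 1:1 aspect ratio.
print(crop_transformation("crop", "microwave", "ar_1"))
```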
Notes on specifying categories and objects
When applying object-aware cropping, you can specify either individual objects or more general object categories.
- When you specify a category, the algorithm gives priority to any objects that are detected from that category.
- The regular auto-gravity behavior also impacts the cropping decision. But if requested objects are detected, they get significantly higher priority than the subjects or salient areas that the regular auto-gravity algorithm selects.
- If you specify the generic object category with auto-gravity (g_auto:object), then any detected objects from any category get priority.
- If there are multiple objects of the same type in the image, object-specific gravity selects the most prominent of the objects, and bases its crop around only that object, whereas auto-gravity may choose to keep more than one of the objects in the crop.
- The categories and objects also work in their plural forms when using object-specific gravity. So, for example, c_crop,g_birds keeps all birds in the crop, whereas c_crop,g_bird keeps only the most prominent bird.
Combining focal gravity options using auto-gravity
When using auto gravity to determine the area to keep in a crop, you can specify multiple focal_gravity options.
This means that in a single auto-gravity parameter, you can optionally specify:
- One or multiple objects (from the same or different categories and/or models)
- Built-in focal gravity options such as face/faces or custom_no_override
- Other add-on based focal gravity options, such as the adv_face, adv_eyes options from the Advanced Facial Attributes Detection add-on
- Only the classic or only the subject auto-gravity algorithm, which in some cases may have some impact on the exact coordinates of the crop, even if other specified objects or focal gravity options are detected. Note that the default algorithm, which combines both of these algorithms, is recommended in the majority of cases.
For example, your auto-gravity URL parameter might be: g_auto:cat:sofa:faces:adv_eyes
This would instruct the cropping mechanism to give top priority to any cats, sofas, faces, or eyes detected in the photo.
For a complete list of all focal_gravity options, see the g_<special_position> section of the Transformation URL API Reference.
- The focal gravity options can be specified in any order. The order does not impact the result.
- When multiple items are detected that match the requested focal options, larger, more central, and more in-focus (less blurry) objects will get higher priority. In special cases, it's possible to fine-tune this default prioritization further. For details, contact support.
- If a particular image has custom coordinates defined, those coordinates always override all other focal gravity options, unless you use the custom_no_override option in conjunction with the other options, in which case the custom coordinates are taken into account when determining the gravity (see Custom coordinates with auto gravity).
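Because the order of focal gravity options does not affect the result with auto-gravity, an application can normalize the order so that equivalent requests produce the same URL. The sketch below shows one way to do that in Python. The auto_gravity helper is hypothetical, and the idea that a canonical order avoids generating duplicate derived assets is an assumption rather than something stated above.

```python
def auto_gravity(*focal_options: str) -> str:
    """Join focal gravity options into one g_auto parameter in a canonical order.

    Sorting and de-duplicating is safe only for auto-gravity, where the text
    above states that order does not impact the result.
    """
    unique_sorted = sorted(set(focal_options))
    return ":".join(["g_auto", *unique_sorted])


print(auto_gravity("cat", "sofa", "faces", "adv_eyes"))
print(auto_gravity("faces", "adv_eyes", "sofa", "cat"))  # same string as above
```

Do not apply this normalization to object-specific gravity, where the order of options determines precedence.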
Combining focal gravity options using object-specific gravity
When using object-specific gravity to determine the area to keep in a crop, you can specify multiple focal_gravity options, but unlike auto-gravity, the order in which they are specified has an impact on the delivered image.
For example, consider this photo of a cat and dog:
By setting the gravity parameter to cat:dog the cat gets precedence:
Whereas, if you switch the order to dog:cat the dog gets precedence:
You can also combine the auto option to invoke the auto-gravity algorithm if none of the specified objects are found. For example:
- g_dog:cat:auto - auto-gravity is invoked only if no dogs or cats are detected.
- g_dog:auto:cat - auto-gravity weighted by cat (g_auto:cat) is invoked if no dogs are detected.
Note: If you use the auto option, then you also need to specify at least one dimension parameter (width or height).
For example, consider this photo of a cat and three birds:
As there is no dog in the photo, auto-gravity weighted by bird is invoked when using dog:auto:bird. In this case, two birds are kept in the crop:
Notice that if auto-gravity is not specified, the object-specific algorithm chooses the most prominent bird out of the three and only keeps this bird in the crop:
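To make the precedence rules for object-specific gravity concrete, the following Python sketch splits a gravity value into the objects that are tried in order and the auto-gravity fallback that applies if none are detected. It models only the behavior described in this section. The plan_gravity helper and the GravityPlan type are hypothetical and do not reflect how Cloudinary parses the parameter internally.

```python
from typing import List, NamedTuple, Optional


class GravityPlan(NamedTuple):
    objects: List[str]        # tried in order; earlier entries take precedence
    fallback: Optional[str]   # auto-gravity parameter used if no object is detected
    needs_dimension: bool     # the auto option requires a width or height


def plan_gravity(value: str) -> GravityPlan:
    """Interpret a gravity value such as dog:auto:cat (without the g_ prefix)."""
    options = value.split(":")
    if "auto" not in options:
        return GravityPlan(options, None, False)
    position = options.index("auto")
    objects = options[:position]
    weights = options[position + 1:]  # options after auto weight the fallback
    fallback = ":".join(["g_auto", *weights])
    return GravityPlan(objects, fallback, True)


print(plan_gravity("dog:cat"))       # dog first, then cat, no fallback
print(plan_gravity("dog:cat:auto"))  # plain auto-gravity if neither is found
print(plan_gravity("dog:auto:cat"))  # g_auto:cat if no dog is found
```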
Specifying objects to avoid using auto-gravity
In addition to specifying objects to keep in an image, you can specify objects that you would rather not see. To minimize the likelihood of including a particular object in the cropped image, use auto-gravity with the avoid option for the relevant object or category.
For example, in photos like the one below, you may prefer not to include people because the purpose of the photo is to show an interesting storefront, and the people are a distraction.
Using g_auto by itself makes the people the focal point, but if we use g_auto:person_avoid, the other side of the photo is shown, without the people.
Choosing the cropping mode
When you specify an object, either specifically or in your auto-gravity parameter, the Object-Aware Cropping AI algorithm detects the coordinates of the object and those coordinates are used by the cropping mode.
- When using thumb cropping (c_thumb), the image is cropped as closely as possible to the detected coordinates of the object given the requested aspect ratio, and then scaled to the requested pixel size. Note that if the requested pixel size is greater than the crop, the image is not scaled up, but filled with further pixels from the image.
- When using crop mode (c_crop), the detected coordinates are prioritized as the area to keep when determining how much to cut from each edge of the photo in order to achieve the requested pixel size. If using auto-gravity and the requested pixel size is larger than the coordinates of the detected object, other elements of the image that receive priority from g_auto may impact what else is included in the photo and where in your resulting image the detected object may be located, meaning that the detected object will not necessarily be the center of the photo.
- When using any of the fill-based modes (c_fill, c_lfill, c_fill_pad), the coordinates of the detected object should be retained if any cropping is required after scaling. If using auto-gravity, other elements of the image that receive priority from g_auto may impact what else is included in the photo and where in your resulting image the detected object may be located, meaning that the detected object will not necessarily be the center of the photo.
- When using the auto cropping mode (c_auto), the crop is focused on the object, but also takes into account more of the whole picture, so it gives a more 'zoomed out' result than thumb and crop, but a more 'zoomed in' result than fill. If the requested dimensions are smaller than the best crop, the result is downscaled. If the requested dimensions are larger than the original image, the result is upscaled.
The following examples show how different your cropping results may be for the same requested object in the gravity, but with different cropping modes. In this case, we take the original photo below and apply g_auto:camera and g_camera with fill, crop, thumb and auto cropping modes. In all cases, the same width and aspect ratio are requested (ar_1,w_200).
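The eight combinations in this comparison can be enumerated programmatically. The Python sketch below generates the transformation string for each gravity and cropping mode pair, using the ar_1,w_200 size from the text. The comparison_grid helper is hypothetical.

```python
from itertools import product

GRAVITIES = ["auto:camera", "camera"]        # auto-gravity vs object-specific gravity
MODES = ["fill", "crop", "thumb", "auto"]    # cropping modes being compared


def comparison_grid(size: str = "ar_1,w_200") -> dict:
    """Map each (gravity, mode) pair to its comma-separated transformation."""
    return {
        (gravity, mode): f"{size},c_{mode},g_{gravity}"
        for gravity, mode in product(GRAVITIES, MODES)
    }


for (gravity, mode), transformation in comparison_grid().items():
    print(f"{gravity:12} {mode:6} {transformation}")
```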
Using object-aware cropping for responsive delivery
You can take advantage of object-aware cropping with various cropping modes to assist in responsive art direction. This means that when you deliver different sized images to different devices, you don't just scale the same image, but rather crop images differently for different sizes, so that the important objects are always highly visible.
For example, you may:
- deliver a full-size image to large HD screens
- use g_auto:[your_important_object], or g_[your_important_object] with fill cropping for medium sized screens
- use g_auto:[your_important_object], or g_[your_important_object] with thumb or auto cropping for very small screens.
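A minimal Python sketch of this art-direction strategy follows. The breakpoints (1200 and 600 pixels), the output dimensions and the function name are illustrative assumptions; choose values that match your own layout.

```python
def art_directed_transformation(viewport_width: int, important_object: str) -> str:
    """Pick a transformation for a viewport so the important object stays visible."""
    if viewport_width >= 1200:
        # Large HD screens: deliver the full-size image, no cropping.
        return ""
    if viewport_width >= 600:
        # Medium screens: fill the box, keeping the object in frame.
        return f"c_fill,g_auto:{important_object},h_400,w_600"
    # Very small screens: crop tightly around the object.
    return f"c_thumb,g_auto:{important_object},h_200,w_200"


for width in (1920, 768, 360):
    print(width, repr(art_directed_transformation(width, "camera")))
```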
For more details on delivering responsive images, see the Responsive images guide.
Using objects with the zoompan effect
In addition to cropping, the Cloudinary AI Content Analysis add-on allows you to use objects for start and end points of a zoompan transformation.
The zoompan effect lets you create a video or animated GIF from an image by zooming and panning from one area of the image to another. Use the from and/or to options with objects as gravity and specify a video or animated image format.
The example below is a seven second MP4 video (.mp4) of a model wearing fashionable items, starting zoomed into the hat (from_(g_hat;zoom_4.5)), then zooming out and panning to the pants (to_(g_pants;zoom_1.6)).
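The from and to fragments quoted above can be assembled into a single effect parameter. The Python sketch below assumes that the effect is written as e_zoompan: followed by semicolon-separated options, and that the duration is given as du_ in seconds. Both points are recalled from the Transformation URL API Reference and are not confirmed on this page, so verify them before relying on this. The zoompan helper is hypothetical.

```python
from typing import Optional


def zoompan(from_object: str, from_zoom: float, to_object: str, to_zoom: float,
            duration: Optional[int] = None) -> str:
    """Build a zoompan effect that moves between two detected objects."""
    options = []
    if duration is not None:
        options.append(f"du_{duration}")  # assumed option name for the duration
    options.append(f"from_(g_{from_object};zoom_{from_zoom})")
    options.append(f"to_(g_{to_object};zoom_{to_zoom})")
    return "e_zoompan:" + ";".join(options)


# Start zoomed in on the hat, then zoom out and pan to the pants over 7 seconds.
print(zoompan("hat", 4.5, "pants", 1.6, duration=7))
```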
Signed URLs
Cloudinary's dynamic image transformation URLs are powerful tools. However, due to the potential costs of your customers experimenting with dynamic URLs that apply the object-aware cropping algorithm, image transformation add-on URLs are required (by default) to be signed using Cloudinary's authenticated API. Alternatively, you can eagerly generate the requested derived images using Cloudinary's authenticated API.
To create a signed delivery URL using SDKs, set the sign_url parameter to true when building a URL or creating an image tag.
The following code example applies object-aware cropping to the skater image, including a signed Cloudinary URL:
The generated Cloudinary URL shown below includes a signature component (/s--acvfjq2y--/). Only URLs with a valid signature that matches the requested image transformation will be approved for on-the-fly image transformation and delivery.
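For intuition about what the signature component contains, here is a Python sketch of how a short signature of this shape can be derived. It assumes the signature is the first 8 characters of the URL-safe Base64 encoding of a SHA-1 digest over the transformation, the public ID and the API secret. That is how the SDKs are understood to work, but it is not stated on this page, and accounts configured for longer SHA-256 signatures would differ. In practice, let the SDK generate the signature through sign_url. The transformation, public ID and secret below are placeholders.

```python
import base64
import hashlib


def url_signature(transformation: str, public_id: str, api_secret: str) -> str:
    """Return an s--xxxxxxxx-- component for a transformation and public ID (assumed scheme)."""
    to_sign = f"{transformation}/{public_id}" + api_secret
    digest = hashlib.sha1(to_sign.encode("utf-8")).digest()
    token = base64.urlsafe_b64encode(digest)[:8].decode("ascii")
    return f"s--{token}--"


# Placeholder values; never embed a real API secret in client-side code.
print(url_signature("c_thumb,g_auto:skateboard,w_300", "skater.jpg", "my_api_secret"))
```

Because the secret is part of the signed string, anyone without it cannot produce a valid signature for a different transformation, which is what prevents arbitrary on-the-fly add-on usage.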
For more details on signed URLs, see Signed delivery URLs.
Automatic image tagging
The automatic image tagging behavior of the Cloudinary AI Content Analysis add-on can be invoked on uploading an image, or by updating an image that's already stored in your product environment. Using the specified model, it analyzes the image, identifies categories and objects, and suggests tags that could be applied to the image.
Object and category detection
Take a look at the following photo of a woman dressed fashionably for winter:
By setting the detection parameter to the name of the model (and optionally the version, e.g. cld-fashion_v3) you want to invoke when calling Cloudinary's upload or update methods, the add-on automatically analyzes the content of the uploaded or specified existing image. For example, invoking the cld-fashion detection model while uploading winter_fashion.jpg:
You can use upload presets to centrally define a set of upload options including add-on operations to apply, instead of specifying them in each upload call. You can define multiple upload presets, and apply different presets in different upload scenarios. You can create new upload presets in the Upload Presets page of the Console Settings or using the upload_presets Admin API method. From the Upload page of the Console Settings, you can also select default upload presets to use for image, video, and raw API uploads (respectively) as well as default presets for image, video, and raw uploads performed via the Media Library UI.
Learn more: Upload presets
The upload API response includes the categories and objects automatically identified by the model you requested. As can be seen in the response snippet below, a hat and a specific type of outerwear are automatically detected in the uploaded photo. Depending on the capabilities of each model, different information is returned. In the example below, a confidence score, bounding box and in some cases, attributes, are returned for each detected object. The confidence score is a numerical value representing the certainty of a correct detection, where 1.0 means 100% confidence. The bounding-box parameter shows the location of the object in the image, as an array: [x-coordinate of top left corner, y-coordinate of top left corner, width of box, height of box]. Bounding-box information is used in the object detection demo.
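To draw an overlay from a bounding box, the [x, y, width, height] array usually needs converting to corner coordinates. The Python sketch below does this for a simplified, hypothetical structure that maps each tag name to a list of detections. The bounding-box and confidence key names follow the description above, but the exact nesting of the real response and the sample values are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Corners = Tuple[float, float, float, float]


def box_corners(bounding_box: Sequence[float]) -> Corners:
    """Convert [x, y, width, height] to (left, top, right, bottom)."""
    x, y, width, height = bounding_box
    return (x, y, x + width, y + height)


def all_boxes(tags: Dict[str, List[dict]]) -> List[Tuple[str, float, Corners]]:
    """Flatten detections into (tag, confidence, corners) rows for drawing overlays."""
    rows = []
    for name, detections in tags.items():
        for detection in detections:
            rows.append((name, detection["confidence"], box_corners(detection["bounding-box"])))
    return rows


# Hypothetical detections, not real add-on output.
sample_tags = {"hat": [{"bounding-box": [100, 50, 200, 120], "confidence": 0.87}]}
print(all_boxes(sample_tags))
```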
Adding tags to images
By providing the auto_tagging parameter to an upload or update request, images are automatically assigned tags based on the detected content. The value of the auto_tagging parameter is the minimum confidence score of a detected category or object that should be automatically used as an assigned tag. You can also set auto_tagging to default, which uses the model's default confidence.
The following code example automatically tags an uploaded image with all detected categories that have a confidence score higher than 0.6.
The response to the upload request returns the detected categories as well as the assigned tags for categories meeting the minimum confidence score of 0.6:
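To illustrate how the minimum confidence determines which tags are assigned, the Python sketch below applies a threshold to a simplified, hypothetical set of detections. Whether a score exactly equal to the threshold qualifies is an assumption here (treated as qualifying), and the sample scores are invented for the example.

```python
from typing import Dict, List


def tags_to_assign(tags: Dict[str, List[dict]], min_confidence: float) -> List[str]:
    """Return tag names with at least one detection at or above the threshold."""
    return sorted(
        name
        for name, detections in tags.items()
        if any(d["confidence"] >= min_confidence for d in detections)
    )


# Hypothetical detections, not real add-on output.
detected = {
    "hat": [{"confidence": 0.87}],
    "coat": [{"confidence": 0.55}],
    "glove": [{"confidence": 0.41}, {"confidence": 0.72}],
}
print(tags_to_assign(detected, 0.6))
```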
You can also use the update method to apply auto tagging to images already stored in your product environment.
The following example uses Cloudinary's update method on the puppy image in the product environment, to detect objects and categories in the LVIS model. Tags are automatically assigned based on the objects and categories detected with over a 90% confidence level.
You can use the Admin API's resource_by_tag method to return all resources with a certain tag, for example hat:
You can also use the search method or the Media Explorer advanced search to find images with certain tags.
Asynchronous handling
As automatic image tagging may not be immediate, it is good practice to use asynchronous handling for these calls.
To make the call asynchronous, set the async parameter of the upload method to true. To be notified when the processing is complete, you can either set the notification_url parameter of the upload method (as in the example below) or the global webhook Notification URL in the Upload page of your Cloudinary Console Settings.
The response to an asynchronous upload call looks similar to this:
When the processing is finished, the complete upload response is sent to the notification URL that you specified.
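The options for an asynchronous tagging request can be collected in a dictionary before being passed to an SDK call. The Python sketch below uses only the parameter names mentioned in this section. The helper name and the example URL are hypothetical. Building a dictionary also sidesteps the fact that async is a reserved word in Python 3.7 and later, so it cannot be written as a literal keyword argument.

```python
from typing import Optional, Union


def async_upload_options(detection: str, notification_url: str,
                         auto_tagging: Optional[Union[float, str]] = None) -> dict:
    """Collect upload options for an asynchronous detection request."""
    options = {
        "detection": detection,
        "async": True,
        "notification_url": notification_url,
    }
    if auto_tagging is not None:
        if isinstance(auto_tagging, str):
            if auto_tagging != "default":
                raise ValueError("auto_tagging must be a number or 'default'")
        elif not 0 <= auto_tagging <= 1:
            raise ValueError("auto_tagging confidence must be between 0 and 1")
        options["auto_tagging"] = auto_tagging
    return options


options = async_upload_options("cld-fashion", "https://example.com/notify", auto_tagging=0.6)
print(options)
# With the Cloudinary Python SDK installed and configured, the options could be
# passed on as (assumed usage): cloudinary.uploader.upload("winter_fashion.jpg", **options)
```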
Image quality analysis
You can analyze the quality of an image using the Image Quality Analysis (IQA) model by setting the detection parameter to iqa when calling Cloudinary's upload or update methods.
A quality score from 0 to 1 is returned in the score attribute of the response, and a general quality estimation of low, medium, or high is returned in the quality attribute.
For example, invoking the iqa model while uploading winter_fashion.jpg:
The response includes the iqa-analysis field:
- You can also use asynchronous handling as described for automatic image tagging.
- Learn about other ways to perform image quality analysis.
Watermark detection
You can detect watermarks in images by setting the detection parameter to watermark-detection when calling Cloudinary's upload or update methods. The response can be one of the following: banner (see banners), watermark (see watermarks), or, if neither of these is detected, clean.
You can tag your images as banner, watermark or clean by setting the auto_tagging parameter, as described in Adding tags to images, and you can also use asynchronous handling.
Banners
If the image contains an opaque text/logo layer with a semi-transparent background, it is likely that the image will be flagged as containing a banner.
For example, uploading the following image, requesting watermark-detection, the response shows 99% confidence that it contains a banner:
Upload request:
The response includes:
Watermarks
If the image contains a semi-transparent layer, it is likely that the image will be flagged as containing a watermark.
For example, uploading the following image, requesting watermark-detection, the response shows 99% confidence that it contains a watermark:
Upload request:
The response includes:
AI-based image captioning
The Cloudinary AI Content Analysis add-on can be used to analyze an image and suggest a caption based on the image's contents.
Some example captions suggested by the AI:
- A brown dog standing on top of a street next to a sidewalk with a building in the background
- A group of young children playing soccer on a soccer field with a goal post in the foreground and a goal post in the background
- A hand reaching for a donut with chocolate and sprinkles on it on a dark surface
By setting the detection parameter to captioning when calling Cloudinary's upload or update methods, the add-on automatically analyzes the content of the image. For example, invoking the captioning detection model while uploading toy_room.jpg:
The upload API response includes the captioning information:
- You can retrieve the caption text value from the response and then use the update method of the Admin API to add the caption text to the metadata of images stored in your product environment, such as the contextual metadata (context) or a structured metadata field (metadata).
- After you've requested a caption using the upload or update method, you can use the Admin API get details of a single resource method to return details of the image, including the stored caption value.
- You can also request analysis using the Analyze API (Beta), which also accepts external assets to analyze.
Asynchronous handling
As the response may not be immediate, it is good practice to use asynchronous handling for these calls.
To make the call asynchronous, set the async parameter of the upload method to true. To be notified when the processing is complete, you can either set the notification_url parameter of the upload method (as in the example below) or the global webhook Notification URL in the Upload page of your Cloudinary Console Settings.
The response to an asynchronous upload call looks similar to this:
When the processing is finished, the complete upload response is sent to the notification URL that you specified.