CVAT Python SDK

Overview

CVAT SDK is a Python library. It provides you access to Python functions and objects that simplify server interaction and provide additional functionality like data validation and serialization.

SDK API includes several layers:

  • Low-level API with REST API wrappers. Located at cvat_sdk.api_client. Read more
  • High-level API. Located at cvat_sdk.core. Read more
  • PyTorch adapter. Located at cvat_sdk.pytorch. Read more
  • Auto-annotation API. Located at cvat_sdk.auto_annotation. Read more

In general, the low-level API provides single-request operations, while the high-level one implements composite, multi-request operations and provides local proxies for server objects. For most uses, the high-level API is sufficient, and it is the right starting point for your integration with CVAT.

The PyTorch adapter is a specialized layer that represents datasets stored in CVAT as PyTorch Dataset objects. This enables direct use of such datasets in PyTorch-based machine learning pipelines.

The auto-annotation API is a specialized layer that lets you automatically annotate CVAT datasets by running a custom function on the local machine. See also the auto-annotate command in the CLI.

Installation

To install an official release of CVAT SDK use this command:

pip install cvat-sdk

To use the PyTorch adapter, request the pytorch extra:

pip install "cvat-sdk[pytorch]"

We support Python versions 3.8 and higher.

Usage

To import package components, use the following code:

For the high-level API:

import cvat_sdk
# or
import cvat_sdk.core

For the low-level API:

import cvat_sdk.api_client

For the PyTorch adapter:

import cvat_sdk.pytorch

1 - SDK API Reference

1.1 - Models

2 - Low-level API

Overview

The low-level API is useful if you need to work directly with the REST API, but want to have data validation and syntax assistance from your code editor. The code on this layer is autogenerated.

The code of this component is located in cvat_sdk.api_client.

Example

Let’s see how a task with local files can be created. We will use basic auth to keep things simple.

from time import sleep
from cvat_sdk.api_client import Configuration, ApiClient, models, apis, exceptions

configuration = Configuration(
    host="http://localhost",
    username='YOUR_USERNAME',
    password='YOUR_PASSWORD',
)

# Enter a context with an instance of the API client
with ApiClient(configuration) as api_client:
    # Parameters can be passed as a plain dict with JSON-serialized data
    # or as model objects (from cvat_sdk.api_client.models), including
    # mixed variants.
    #
    # In case of dicts, keys must be the same as members of models.I<ModelName>
    # interfaces and values must be convertible to the corresponding member
    # value types (e.g. a date or string enum value can be parsed from a string).
    #
    # In case of model objects, data must be of the corresponding
    # models.<ModelName> types.
    #
    # Let's use a dict here. It should look like models.ITaskWriteRequest
    task_spec = {
        'name': 'example task',
        "labels": [{
            "name": "car",
            "color": "#ff00ff",
            "attributes": [
                {
                    "name": "a",
                    "mutable": True,
                    "input_type": "number",
                    "default_value": "5",
                    "values": ["4", "5", "6"]
                }
            ]
        }],
    }

    try:
        # Apis can be accessed as ApiClient class members
        # We use different models for input and output data. For input data,
        # models are typically called like "*Request". Output data models have
        # no suffix.
        (task, response) = api_client.tasks_api.create(task_spec)
    except exceptions.ApiException as e:
        # We can catch the basic exception type, or a derived type
        print("Exception when trying to create a task: %s\n" % e)
        raise  # the rest of the example needs the created task

    # Here we will use models instead of a dict
    task_data = models.DataRequest(
        image_quality=75,
        client_files=[
            open('image1.jpg', 'rb'),
            open('image2.jpg', 'rb'),
        ],
    )

    # If we pass binary file objects, we need to specify content type.
    # For this endpoint, we don't have response data
    (_, response) = api_client.tasks_api.create_data(task.id,
        data_request=task_data,
        _content_type="multipart/form-data",

        # we can choose to check the response status manually
        # and disable the response data parsing
        _check_status=False, _parse_response=False
    )
    assert response.status == 202, response.msg

    # Wait till task data is processed
    for _ in range(100):
        (status, _) = api_client.tasks_api.retrieve_status(task.id)
        if status.state.value in ['Finished', 'Failed']:
            break
        sleep(0.1)
    assert status.state.value == 'Finished', status.message

    # Update the task object and check the task size
    (task, _) = api_client.tasks_api.retrieve(task.id)
    assert task.size == 2  # two images were uploaded above
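
The fixed-count polling loop in the example above can be generalized into a small helper with an explicit timeout. The sketch below is plain Python and does not call the SDK: wait_until and its parameters are hypothetical names chosen for illustration, and fake_status stands in for a status request such as retrieve_status.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(
    poll: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> T:
    """Call poll() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = poll()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for a status request: reports "Finished" on the third call.
_states = iter(["Queued", "Started", "Finished"])


def fake_status() -> str:
    return next(_states)


final_state = wait_until(
    fake_status,
    is_done=lambda state: state in ("Finished", "Failed"),
    timeout=5.0,
    interval=0.01,
)
print(final_state)
```

A deadline-based loop makes the maximum waiting time explicit instead of depending on the number of iterations and the sleep interval.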

ApiClient and configuration

The starting point in the low-level API is the cvat_sdk.api_client.ApiClient class. It encapsulates session and connection logic, manages headers and cookies, and provides access to various APIs.

To create an instance of ApiClient, you need to set up a cvat_sdk.api_client.Configuration object and pass it to the ApiClient class constructor. Additional connection-specific options, such as extra headers and cookies, can be specified in the class constructor. ApiClient implements the context manager protocol. Typically, you create ApiClient this way:

from cvat_sdk.api_client import ApiClient, Configuration

configuration = Configuration(host="http://localhost")
with ApiClient(configuration) as api_client:
    ...

After creating an ApiClient instance, you can send requests to various server endpoints via *_api member properties and directly, using the rest_client member. Read more about API wrappers below.

Typically, the first thing you do with ApiClient is log in. Read more about authentication options below.

Authentication

CVAT supports 2 authentication options:

  • basic auth, with your username and password
  • token auth, with your API key

Token auth requires a token, which can be obtained after performing the basic auth.

The low-level API supports 2 ways of authentication.

First, you can specify authentication parameters in the Configuration object, either as a username and password:

configuration = Configuration(
    username='YOUR_USERNAME',
    password='YOUR_PASSWORD',
)

or as API keys:

configuration = Configuration(
    api_key={
        "sessionAuth": "<sessionid cookie value>",
        "csrfAuth": "<csrftoken cookie value>",
        "tokenAuth": "Token <auth key value>",
    }
)

Second, you can perform a regular login using the auth_api member of ApiClient and set the Authorization header using the Token prefix. This way, you’ll be able to obtain API tokens, which can be reused in the future to avoid typing your credentials.

from cvat_sdk.api_client import models

(auth, _) = api_client.auth_api.create_login(
    models.LoginRequest(username=credentials[0], password=credentials[1])
)

assert "sessionid" in api_client.cookies
assert "csrftoken" in api_client.cookies
api_client.set_default_header("Authorization", "Token " + auth.key)

API wrappers

API endpoints are grouped by tags into separate classes in the cvat_sdk.api_client.apis package.

APIs can be accessed as ApiClient object members:

api_client.auth_api.<operation>(...)
api_client.tasks_api.<operation>(...)

And APIs can be instantiated directly like this:

from cvat_sdk.api_client import ApiClient, apis

api_client = ApiClient(...)

auth_api = apis.AuthApi(api_client)
auth_api.<operation>(...)

tasks_api = apis.TasksApi(api_client)
tasks_api.<operation>(...)

For each operation, the API wrapper class has a corresponding <operation>_endpoint member. This member represents the endpoint as a first-class object, which provides metainformation about the endpoint, such as its relative URL, parameter names, types and their placement in the request. It also lets you pass the operation to other functions and invoke it from there.

For a typical server entity like Task, Project, Job etc., the *Api classes provide methods that reflect Create-Read-Update-Delete (CRUD) operations: create, retrieve, list, update, partial_update, delete. The set of available operations depends on the entity type.

You can find the list of the available APIs and their documentation here.

Models

Requests and responses can include data. It can be represented as plain Python data structures or as model classes (models). In the CVAT API, models for requests and responses are separate: the request models have the Request suffix in the name, while the response models have no suffix. Models can be found in the cvat_sdk.api_client.models package.

Models can be instantiated like this:

from cvat_sdk.api_client import models

user_model = models.User(...)

Model parameters can be passed as models, or as plain Python data structures. This rule applies recursively, starting from the method parameters. In particular, this means you can pass a dict into a method or into a model constructor, and corresponding fields will be parsed from this data automatically:

task_spec = models.TaskWriteRequest(
    name='example task',
    labels=[
        models.PatchedLabelRequest(
            name="car",
            color="#ff00ff",
            attributes=[
                models.AttributeRequest(
                    name="a",
                    mutable=True,
                    input_type="number",
                    default_value="5",
                    values=["4", "5", "6"]
                )
            ]
        )
    ],
)
api_client.tasks_api.create(task_spec)

Is equivalent to:

api_client.tasks_api.create({
    'name': 'example task',
    "labels": [{
        "name": "car",
        "color": "#ff00ff",
        "attributes": [
            {
                "name": "a",
                "mutable": True,
                "input_type": "number",
                "default_value": "5",
                "values": ["4", "5", "6"]
            }
        ]
    }],
})

You can mix these variants.

Most models provide corresponding interface classes called like I<model name>. They can be used to implement your own classes or describe APIs. They just provide type annotations and descriptions for model fields.

You can export model values to plain Python dicts using the to_dict() method and the cvat_sdk.api_client.model_utils.to_json() function.

You can find the list of the available models and their documentation here.

Sending requests

To send a request to a server endpoint, you need to obtain an instance of the corresponding *Api class. You can find a summary of the available API classes and supported endpoints here. The *Api instance lets you send requests to the relevant server endpoints.

By default, all operations return 2 objects: the parsed response data and the response itself.

The first returned value is a model parsed from the response data. If a method does not have any return value, None is always returned as the first value. You can control automatic parsing using the _parse_response method kwarg. When disabled, None is returned.

The second value is the raw response, which can be useful to get response parameters, such as the status code, headers, or raw response data. By default, the status code of the response is checked, and an exception is raised if the request failed. This behavior can be controlled by the _check_status method kwarg. If the status is not checked, you need to check the response status code manually and handle failures yourself.

A typical endpoint call looks like this:

from cvat_sdk.api_client import ApiClient, apis

with ApiClient(...) as api_client:
    ...
    (data, response) = api_client.tasks_api.list()
    # process the response ...

Operation parameters can be passed as positional or keyword arguments. API methods also accept common extra arguments that control the invocation logic:

  • _parse_response (bool) - Enables or disables response data parsing. When enabled, the response data is parsed into a model or a basic type and returned as the first value. When disabled, the response is not parsed, and None is returned. This can be useful, for instance, if you need to parse data manually, or if you expect an error in the response. Default is True.
  • _check_status (bool) - Enables or disables response status checks. When enabled, the response status code is checked for success as defined in the HTTP standards. If the status indicates a failure, an exception is raised. Default is True.
  • _validate_inputs (bool) - Specifies whether type checking is done on the data sent to the server. Default is True.
  • _validate_outputs (bool) - Specifies whether type checking is done on the data received from the server. Default is True.
  • _request_timeout (None | int | float | Tuple[int | float, int | float]) - Controls timeouts. If a single number is provided, it is the total request timeout. It can also be a tuple of (connection, read) timeouts. Default is None, which means no timeout.
  • _content_type (None | str) - Specifies the Content-Type header value for the request. Endpoints can support different content types and behave differently depending on the value. For file uploads, _content_type="multipart/form-data" must be specified. Read more about file uploads here. Default is application/json.
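
When _check_status=False is used, the status code has to be inspected by hand. Here is a minimal sketch of such a check using only the standard library. ensure_success is a hypothetical helper, not part of the SDK, and treating exactly the 2xx range as success is an assumption made for this illustration.

```python
from http import HTTPStatus


def ensure_success(status: int, body: bytes = b"") -> None:
    """Raise RuntimeError unless the HTTP status code is in the 2xx range."""
    if not 200 <= status < 300:
        try:
            reason = HTTPStatus(status).phrase
        except ValueError:
            # non-standard status code
            reason = "Unknown status"
        raise RuntimeError(f"request failed: {status} {reason}: {body[:200]!r}")


ensure_success(202)  # Accepted: no exception is raised

error_text = ""
try:
    ensure_success(404, b'{"detail": "Not found."}')
except RuntimeError as e:
    error_text = str(e)
    print(error_text)
```

With the low-level API, the values passed to such a helper would come from the raw response object, for example response.status and response.data as used elsewhere on this page.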

NOTE: the API is autogenerated. In some cases the server API schema may be incomplete or underspecified. Please report any problems you find. A typical problem is that response data can’t be parsed automatically because of an incorrect schema. In this case, the simplest workaround is to disable response parsing using the _parse_response=False method argument.

You can find many examples of API client usage in REST API tests here.

Organizations

To create a resource in the context of an organization, use one of these method arguments:

  • org - The unique organization slug
  • org_id - The organization id
...
(task, response) = api_client.tasks_api.create(task_spec, org_id=org_id)

Paginated responses

Several endpoints let you request multiple server entities at once. Typically, these endpoints are called list_.... When there is a lot of data, the responses can be paginated to reduce server load. If an endpoint returns paginated data, a single page is returned per request. In some cases all entries need to be retrieved. CVAT doesn’t provide a specific API or parameters for this, so the solution is to write a loop that collects and joins data from multiple requests. The SDK provides a utility function for this at cvat_sdk.core.helpers.get_paginated_collection().

Example:

from cvat_sdk.core.helpers import get_paginated_collection

...
project_tasks = get_paginated_collection(
    api_client.projects_api.list_tasks_endpoint,
    id=project_id,
)
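
To show what such a collecting loop involves, here is a self-contained sketch that does not use the SDK. fetch_page and collect_all are hypothetical names, and the page layout (count, next, results) is an assumption made for the illustration.

```python
from typing import Any, Callable, Dict, List, Optional

PAGE_SIZE = 2
ITEMS = ["task-1", "task-2", "task-3", "task-4", "task-5"]


def fetch_page(page: int) -> Dict[str, Any]:
    """Stand-in for one paginated list request (pages are numbered from 1)."""
    start = (page - 1) * PAGE_SIZE
    chunk = ITEMS[start : start + PAGE_SIZE]
    has_next = start + PAGE_SIZE < len(ITEMS)
    return {"count": len(ITEMS), "next": page + 1 if has_next else None, "results": chunk}


def collect_all(fetch: Callable[[int], Dict[str, Any]]) -> List[Any]:
    """Request pages one by one and join their results into a single list."""
    results: List[Any] = []
    page: Optional[int] = 1
    while page is not None:
        data = fetch(page)
        results.extend(data["results"])
        page = data["next"]
    return results


all_items = collect_all(fetch_page)
print(all_items)
```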

Binary data in requests and responses

At the moment, sending and receiving binary data - such as files - can be difficult via the low-level SDK API. Please use the following recommendations.

Sending data

By default, requests use the application/json content type, which is a text type. However, it’s inefficient to send binary data in this encoding, and the data passed won’t be converted automatically. If you need to send files or other binary data, please specify _content_type="multipart/form-data" in the request parameters:

Example:

(_, response) = api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(
        client_files=[
            open("image.jpg", 'rb')
        ],
        image_quality=70,
    ),
    _content_type="multipart/form-data", # required
)

Please also note that if there are complex fields in the data (such as nested lists or dicts), they, in turn, cannot be encoded as multipart/form-data, so the recommended solution is to split fields into files and others, and send them in different requests with different content types:

Example:

data = {
    'client_files': [...], # a list of binary files
    'image_quality': ..., # a simple type - int
    'job_file_mapping': [...], # a complex type - list
}

# Initialize uploading
api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(image_quality=data["image_quality"]),
    upload_start=True,
)

# Upload binary data
api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(
        client_files=data.pop("client_files"),
        image_quality=data["image_quality"],
    ),
    upload_multiple=True,
    _content_type="multipart/form-data",
)

# Finalize the uploading and send the remaining fields
api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(**data),
    upload_finish=True,
)

Receiving data

Receiving binary files can also be difficult with the low-level API. To avoid unexpected behavior, it is recommended to specify _parse_response=False in the request parameters. In this case, SDK will not try to parse models from responses, and the response data can be fetched directly from the response:

from http import HTTPStatus
from time import sleep

interval = 1  # seconds to wait between export status checks

# Export a task as a dataset
while True:
    (_, response) = api_client.tasks_api.retrieve_dataset(
        id=42,
        format='COCO 1.0',
        _parse_response=False,
    )
    if response.status == HTTPStatus.CREATED:
        break

    sleep(interval)

(_, response) = api_client.tasks_api.retrieve_dataset(
    id=42,
    format='COCO 1.0',
    action="download",
    _parse_response=False,
)

# Save the resulting file
with open('output_file', 'wb') as output_file:
    output_file.write(response.data)

Different versions of API endpoints

The cloudstorages/id/content REST API endpoint

Warning: The retrieve_content method of cloudstorages_api will be deprecated in version 2.5.0. When using the SDK, we recommend the retrieve_content_v2 method, which matches the revised API. For backward compatibility, we continue to support the prior interface version until version 2.6.0 is released.

Here is an example of how to get the bucket content using the new retrieve_content_v2 method.

from pprint import pprint

from cvat_sdk.api_client import ApiClient, Configuration

next_token = None
files, prefixes = [], []
prefix = ""

with ApiClient(
    configuration=Configuration(host=BASE_URL, username=user, password=password)
) as api_client:
    while True:
        data, response = api_client.cloudstorages_api.retrieve_content_v2(
            cloud_storage_id,
            **({"prefix": prefix} if prefix else {}),
            **({"next_token": next_token} if next_token else {}),
        )
        # the data will have the following structure:
        # {'content': [
        #     {'mime_type': <image|video|archive|pdf|DIR>, 'name': <name>, 'type': <REG|DIR>},
        # ],
        # 'next': <next_token_string|None>}
        files.extend(
            [
                prefix + f["name"]
                for f in data["content"]
                if str(f["type"]) == "REG"
            ]
        )
        prefixes.extend(
            [
                prefix + f["name"]
                for f in data["content"]
                if str(f["type"]) == "DIR"
            ]
        )
        next_token = data["next"]
        if next_token:
            continue
        if not len(prefixes):
            break
        prefix = f"{prefixes.pop()}/"
    pprint(files) # ['sub/image_1.jpg', 'image_2.jpg']

3 - High-level API

Overview

This layer provides high-level APIs, allowing easier access to server operations. The API includes Repositories and Entities. Repositories provide management operations for Entities. Entities represent objects on the server (e.g. projects, tasks, jobs etc.) and simplify interaction with them. The key difference from the low-level API is that operations on this layer are not limited to a single server request per operation, and they encapsulate the low-level request machinery behind a high-level object-oriented API.

The code of this component is located in the cvat_sdk.core package.

Example

from cvat_sdk import make_client, models
from cvat_sdk.core.proxies.tasks import ResourceType, Task

# Create a Client instance bound to a local server and authenticate using basic auth
with make_client(host="localhost", credentials=('user', 'password')) as client:
    # Let's create a new task.

    # Fill in task parameters first.
    # Models are used the same way as in the low-level API.
    task_spec = {
        "name": "example task",
        "labels": [
            {
                "name": "car",
                "color": "#ff00ff",
                "attributes": [
                    {
                        "name": "a",
                        "mutable": True,
                        "input_type": "number",
                        "default_value": "5",
                        "values": ["4", "5", "6"],
                    }
                ],
            }
        ],
    }

    # Now we can create a task using a task repository method.
    # Repositories can be accessed as the Client class members.
    # In this case we use 2 local images as the task data.
    task = client.tasks.create_from_data(
        spec=task_spec,
        resource_type=ResourceType.LOCAL,
        resources=['image1.jpg', 'image2.png'],
    )

    # The returned task object is already up-to-date with its server counterpart.
    # Now we can access task fields. The fields are read-only and can be optional.
    # Let's check that we have 2 images in the task data.
    assert task.size == 2

    # If an object is modified on the server, the local object is not updated automatically.
    # To reflect the latest changes, the local object needs to be fetch()-ed.
    task.fetch()

    # Let's obtain another task. Again, it can be done via the task repository.
    # Suppose we have already created the task earlier and know the task id.
    task2 = client.tasks.retrieve(42)

    # The task object fields can be update()-d. Note that the set of fields that can be
    # modified can be different from what is available for reading.
    task2.update({'name': 'my task'})

    # And the task can also be remove()-d from the server. The local copy will remain
    # untouched.
    task2.remove()

Client

The cvat_sdk.core.client.Client class provides session management, implements authentication operations and simplifies access to server APIs. It is the starting point for using CVAT SDK.

A Client instance allows you to:

  • configure connection options with the Config class
  • check server API compatibility with the current SDK version
  • deduce server connection scheme (https or http) automatically
  • manage user session with the login(), logout() and other methods
  • obtain Repository objects with the users, tasks, jobs and other members
  • access lower-level APIs with the corresponding members

An instance of Client can be created directly by calling the class constructor or with the utility function cvat_sdk.core.client.make_client() which can handle some configuration for you. A Client can be configured with the cvat_sdk.core.client.Config class instance. A Config object can be passed to the Client constructor and then it will be available in the Client.config field.

The Client class implements the context manager protocol. When the context is closed, the session is finished, and the user is logged out automatically. Otherwise, these actions can be done with the close() and logout() methods.

You can create and start using a Client instance this way:

from cvat_sdk import make_client

with make_client('localhost', port='8080', credentials=('user', 'password')) as client:
    ...

The make_client() function handles configuration and object creation for you. It also lets you authenticate right after the object is created.

If you need to configure Client parameters, you can do this:

from cvat_sdk import Config, Client

config = Config()
# set up some config fields ...

with Client('localhost:8080', config=config) as client:
    client.login(('user', 'password'))
    ...

You can specify server address both with and without the scheme. If the scheme is omitted, it will be deduced automatically.

The checks are performed in the following order: https (with the default port 8080), http (with the default port 80). In some cases this may lead to incorrect results, e.g. if you have 2 servers running on the same host at the default ports. In such cases, just specify the scheme manually: https://localhost.
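
The detection order described above can be sketched as a loop over candidate URLs. This is an illustration only, not the SDK's implementation: detect_url and the probe callback are hypothetical, and a real probe would send a request to the server.

```python
from typing import Callable, Optional, Sequence, Tuple

# (scheme, default port) pairs in the order described above
CANDIDATES: Sequence[Tuple[str, int]] = (("https", 8080), ("http", 80))


def detect_url(host: str, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate URL for which is_reachable() is true, else None."""
    for scheme, port in CANDIDATES:
        url = f"{scheme}://{host}:{port}"
        if is_reachable(url):
            return url
    return None


# Stand-in probe: pretend that only the plain-HTTP server answers.
def fake_probe(url: str) -> bool:
    return url.startswith("http://")


print(detect_url("localhost", fake_probe))  # http://localhost:80
```

Because the first successful candidate wins, two servers on the same host can shadow each other, which is why specifying the scheme explicitly is the safer choice in that situation.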

When the server is located, its version is checked. If an unsupported version is found, an error can be raised or suppressed (controlled by config.allow_unsupported_server). If the error is suppressed, some SDK functions may not work as expected with this server. By default, a warning is raised and the error is suppressed.

Users and organizations

All Client operations rely on the server API and depend on the current user rights. This affects the set of available APIs, objects and actions. For example, a regular user can only see and modify their tasks and jobs, while an admin user can see all the tasks etc.

Operations are also affected by the current organization context, which can be set with the organization_slug property of Client instances. The organization context affects which entities are visible, and where new entities are created.

Set organization_slug to an organization’s slug (short name) to make subsequent operations work in the context of that organization:

client.organization_slug = 'myorg'

# create a task in the organization
task = client.tasks.create_from_data(...)

You can also set organization_slug to an empty string to work in the context of the user’s personal workspace. By default, it is set to None, which means that both personal and organizational entities are visible, while new entities are created in the personal workspace.

To temporarily set the organization slug, use the organization_context function:

with client.organization_context('myorg'):
    task = client.tasks.create_from_data(...)

# the slug is now reset to its previous value
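
The set-and-restore behavior of such a context can be illustrated with a small stand-in class. This is a sketch of the general pattern, not the SDK's code: FakeClient is hypothetical and only mimics the organization_slug attribute.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class FakeClient:
    """Stand-in for a client object with an organization_slug attribute."""

    def __init__(self) -> None:
        self.organization_slug: Optional[str] = None

    @contextmanager
    def organization_context(self, slug: str) -> Iterator[None]:
        previous = self.organization_slug
        self.organization_slug = slug
        try:
            yield
        finally:
            # restore the previous value even if the body raises
            self.organization_slug = previous


client = FakeClient()
with client.organization_context("myorg"):
    inside = client.organization_slug
after = client.organization_slug
print(inside, after)
```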

Entities and Repositories

Entities represent objects on the server. They provide read access to object fields and implement additional relevant operations, including both the general Read-Update-Delete and object-specific ones. The set of available general operations depends on the object type.

Repositories provide management operations for corresponding Entities. You don’t need to create Repository objects manually. To obtain a Repository object, use the corresponding Client instance member:

client.projects
client.tasks
client.jobs
client.users
...

An Entity can be created on the server with the corresponding Repository method create():

task = client.tasks.create(<task config>)

We can retrieve server objects using the retrieve() and list() methods of the Repository:

job = client.jobs.retrieve(<job id>)
tasks = client.tasks.list()

After calling these functions, we obtain local objects representing their server counterparts.

Object fields can be updated with the update() method. Note that the set of fields that can be modified can be different from what is available for reading.

job.update({'stage': 'validation'})

The server object will be updated and the local object will reflect the latest object state after calling this operation.

Note that local objects may fall out of sync with their server counterparts for different reasons. If you need to update the local object with the latest server state, use the fetch() method:

# obtain 2 local copies of the same job
job_ref1 = client.jobs.retrieve(1)
job_ref2 = client.jobs.retrieve(1)

# update the server object with the first reference
job_ref1.update(...)
# job_ref2 is outdated now

job_ref2.fetch()
# job_ref2 is synced
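
The following toy model shows why a second local copy becomes stale until it is fetched again. It uses a dict in place of the server, and LocalProxy is a hypothetical class written for this illustration, not the SDK's Entity implementation.

```python
from typing import Any, Dict

# Stand-in for server-side storage: object id -> fields
SERVER: Dict[int, Dict[str, Any]] = {1: {"stage": "annotation"}}


class LocalProxy:
    """Toy local copy of a server object."""

    def __init__(self, obj_id: int) -> None:
        self.id = obj_id
        self.fields: Dict[str, Any] = {}
        self.fetch()

    def fetch(self) -> "LocalProxy":
        # copy the current server state into the local object
        self.fields = dict(SERVER[self.id])
        return self

    def update(self, values: Dict[str, Any]) -> "LocalProxy":
        # change the server object, then refresh this local copy
        SERVER[self.id].update(values)
        return self.fetch()


ref1 = LocalProxy(1)
ref2 = LocalProxy(1)

ref1.update({"stage": "validation"})
stale = ref2.fields["stage"]  # still the old value
ref2.fetch()
fresh = ref2.fields["stage"]  # now matches the server
print(stale, fresh)
```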

Finally, if you need to remove the object from the server, you can use the remove() method. The server object will be removed, but the local copy of the object will remain untouched.

task = client.tasks.retrieve(<task id>)
task.remove()

Repositories can also provide group operations over entities. For instance, you can retrieve all available objects using the list() Repository method. The list of available Entity and Repository operations depends on the object type.

You can learn more about entity members and how model parameters are passed to functions here.

The implementation for these components is located in cvat_sdk.core.proxies.

4 - PyTorch adapter

Overview

This layer provides functionality that enables you to treat CVAT projects and tasks as PyTorch datasets.

The code of this layer is located in the cvat_sdk.pytorch package. To use it, you must install the cvat_sdk distribution with the pytorch extra.

Example

import torch
import torchvision.models

from cvat_sdk import make_client
from cvat_sdk.pytorch import ProjectVisionDataset, ExtractSingleLabelIndex

# create a PyTorch model
model = torchvision.models.resnet34(
    weights=torchvision.models.ResNet34_Weights.IMAGENET1K_V1)
model.eval()

# log into the CVAT server
with make_client(host="localhost", credentials=('user', 'password')) as client:
    # get the dataset comprising all tasks for the Validation subset of project 12345
    dataset = ProjectVisionDataset(client, project_id=12345,
        include_subsets=['Validation'],
        # use transforms that fit our neural network
        transform=torchvision.models.ResNet34_Weights.IMAGENET1K_V1.transforms(),
        target_transform=ExtractSingleLabelIndex())

    # print the number of images in the dataset (in other words, the number of frames
    # in the included tasks)
    print(len(dataset))

    # get a sample from the dataset
    image, target = dataset[0]

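    # evaluate the network on the sample and compare the prediction to the target;
    # the model expects a batch, so add a batch dimension of size 1
    with torch.no_grad():
        output = model(image.unsqueeze(0))
    if output.argmax(dim=1).item() == target.item():
        print("correct prediction")
    else:
        print("incorrect prediction")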

Datasets

The key components of this layer are the dataset classes, ProjectVisionDataset and TaskVisionDataset, representing data & annotations contained in a CVAT project or task, respectively. Both of them are subclasses of the torch.utils.data.Dataset abstract class.

The interface of Dataset is essentially that of a sequence whose elements are samples from the dataset. In the case of TaskVisionDataset, each sample represents a frame from the task and its associated annotations. The order of the samples is the same as the order of frames in the task. Deleted frames are omitted.

In the case of ProjectVisionDataset, each sample is a sample from one of the project’s tasks, as if obtained from a TaskVisionDataset instance created for that task. The full sequence of samples is built by concatenating the sequences of samples from all included tasks in an unspecified order that is guaranteed to be consistent between executions. For details on what tasks are included, see Task filtering.
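
Concatenating per-task sequences means that a project-level index has to be translated into a task and an index within that task. Here is a minimal sketch of that translation in plain Python. locate is a hypothetical helper, the task sizes are made up, and the adapter's actual implementation may differ.

```python
import bisect
from itertools import accumulate
from typing import List, Sequence, Tuple


def locate(task_sizes: Sequence[int], index: int) -> Tuple[int, int]:
    """Map a project-level sample index to (task position, index within that task)."""
    total = sum(task_sizes)
    if not 0 <= index < total:
        raise IndexError(index)
    # ends[i] is the number of samples in tasks 0..i
    ends: List[int] = list(accumulate(task_sizes))
    task_pos = bisect.bisect_right(ends, index)
    start = ends[task_pos - 1] if task_pos > 0 else 0
    return task_pos, index - start


sizes = [3, 2, 4]  # hypothetical per-task sample counts
print(locate(sizes, 0))  # (0, 0)
print(locate(sizes, 3))  # (1, 0)
print(locate(sizes, 8))  # (2, 3)
```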

Construction

Both dataset classes are instantiated by passing in an instance of cvat_sdk.Client and the ID of the project or task:

dataset = ProjectVisionDataset(client, 123)
dataset = TaskVisionDataset(client, 456)

The referenced project or task must contain image data. Video data is currently not supported.

The constructors of these classes also support several keyword-only parameters:

During construction, the dataset objects either populate or validate the local data cache (see Caching for details). Any necessary requests to the CVAT server are performed at this time. After construction, the objects make no more network requests.

Sample format

Indexing a dataset produces a sample. A sample has the form of a tuple with the following components:

  • sample[0] (PIL.Image.Image): the image.
  • sample[1] (cvat_sdk.pytorch.Target): the annotations and auxiliary data.

The target object contains the following attributes:

  • target.annotations.tags (list[cvat_sdk.models.LabeledImage]): tag annotations associated with the current frame.
  • target.annotations.shapes (list[cvat_sdk.models.LabeledShape]): shape annotations associated with the current frame.
  • target.label_id_to_index (Mapping[int, int]): see Label index assignment.

Note that track annotations are currently inaccessible.

Transform support

The dataset classes support torchvision-like transforms that you can supply to preprocess each sample before it’s returned. You can use this to convert the samples to a more convenient format or to preprocess the data. The transforms are supplied via the following constructor parameters:

  • transforms: a callable that accepts two arguments (the image and the target) and returns a tuple with two elements.
  • transform: a callable that accepts an image.
  • target_transform: a callable that accepts a target.

Let the sample value prior to any transformations be (image, target). Here is what indexing the dataset will return for various combinations of supplied transforms:

  • transforms: transforms(image, target).
  • transform: (transform(image), target).
  • target_transform: (image, target_transform(target)).
  • transform and target_transform: (transform(image), target_transform(target)).

transforms cannot be supplied at the same time as either transform or target_transform.
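The combination rules above can be summarized in a short plain-Python sketch. It is not the SDK's implementation: apply_transforms is a hypothetical helper, strings stand in for real images and targets, and raising ValueError for the forbidden combination is an assumption.

```python
def apply_transforms(image, target, *, transforms=None, transform=None, target_transform=None):
    """Hypothetical helper mirroring the documented rules; not part of cvat_sdk."""
    if transforms is not None and (transform is not None or target_transform is not None):
        # assumption: the exception type is chosen for illustration only
        raise ValueError("transforms cannot be combined with transform or target_transform")

    if transforms is not None:
        # the joint transform receives both parts and returns a two-element tuple
        return transforms(image, target)

    if transform is not None:
        image = transform(image)
    if target_transform is not None:
        target = target_transform(target)
    return image, target


# strings stand in for a PIL image and a Target object
sample = ("image", "target")
joint = apply_transforms(*sample, transforms=lambda i, t: (i.upper(), t.upper()))
separate = apply_transforms(*sample, transform=str.upper, target_transform=str.title)
image_only = apply_transforms(*sample, transform=str.upper)
```

With no transforms supplied, the helper returns the sample unchanged, which matches the (image, target) baseline described above.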

The cvat_sdk.pytorch module contains some target transform classes that are intended for common use cases. See Transforms.

Label index assignment

The annotation model classes (LabeledImage and LabeledShape) reference labels by their IDs on the CVAT server. This is usually not very useful for machine learning code, since those IDs are unpredictable and will be different between different projects, even if semantically the set of labels is the same.

Therefore, the dataset classes assign to each label a unique index that is intended to be a project-independent identifier. These indices are accessible via the label_id_to_index attribute on each sample’s target. This attribute maps IDs on the server to the assigned index. The mapping is the same for every sample.

By default, the dataset classes arrange all label IDs in an unspecified order that remains consistent across executions, and assign them sequential indices, starting with 0.

You can override this behavior and specify your own label indices with the label_name_to_index constructor parameter. This parameter accepts a mapping from label name to index. The mapping must contain a key for each label in the project/task. When this parameter is specified, label indices are assigned by looking up each label’s name in the provided mapping and using the result.
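As a rough illustration of the two assignment modes, here is a plain-Python sketch. assign_label_indices is a hypothetical helper, and sorting by server ID is only an assumption standing in for the SDK's unspecified (but consistent) default order.

```python
def assign_label_indices(labels, label_name_to_index=None):
    """Hypothetical helper; labels is a list of (server_id, name) pairs."""
    if label_name_to_index is None:
        # assumption: sorting by server ID stands in for the SDK's
        # "unspecified but consistent" default order
        ordered_ids = sorted(label_id for label_id, _ in labels)
        return {label_id: index for index, label_id in enumerate(ordered_ids)}

    # every label name must be a key of the mapping; a missing name raises KeyError here
    return {label_id: label_name_to_index[name] for label_id, name in labels}


server_labels = [(102, "rat"), (100, "bat"), (101, "cat")]
default_mapping = assign_label_indices(server_labels)
custom_mapping = assign_label_indices(server_labels, {"bat": 2, "cat": 0, "rat": 1})
```

In both cases the result has the same shape as label_id_to_index: server label IDs as keys, assigned indices as values.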

Task filtering

Note: this section applies only to ProjectVisionDataset.

By default, a ProjectVisionDataset includes samples from every task belonging to the project. You can change this using the following constructor parameters:

  • task_filter (Callable[[models.ITaskRead], bool]): if set, the callable will be called for every task, with an instance of ITaskRead corresponding to that task passed as the argument. Only tasks for which True is returned will be included.

  • include_subsets (Container[str]): if set, only tasks whose subset is a member of the container will be included.

Both parameters can be set, in which case tasks must fulfill both criteria to be included.
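The inclusion rule can be sketched in plain Python as follows. select_tasks is a hypothetical helper, and SimpleNamespace objects stand in for ITaskRead instances; the task names and subset names are made up for the example.

```python
from types import SimpleNamespace


def select_tasks(tasks, task_filter=None, include_subsets=None):
    """Hypothetical helper mirroring the documented inclusion rules."""
    selected = []
    for task in tasks:
        if task_filter is not None and not task_filter(task):
            continue
        if include_subsets is not None and task.subset not in include_subsets:
            continue
        selected.append(task)
    return selected


# SimpleNamespace objects stand in for ITaskRead instances
tasks = [
    SimpleNamespace(id=1, name="street-day", subset="Train"),
    SimpleNamespace(id=2, name="street-night", subset="Train"),
    SimpleNamespace(id=3, name="street-day-holdout", subset="Validation"),
]

# both criteria are set, so a task must satisfy both to be included
train_day = select_tasks(
    tasks,
    task_filter=lambda task: "day" in task.name,
    include_subsets={"Train"},
)
```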

Caching

The images and annotations of a dataset can be substantial in size, so they are not downloaded from the server every time a dataset object is created. Instead, they are loaded from a cache on the local file system, which is maintained during dataset object construction according to the policy set by the update_policy constructor parameter.

The available policies are:

  • UpdatePolicy.IF_MISSING_OR_STALE: If some data is already cached, query the server to determine if it is out of date. If so, discard it. Then, download all necessary data that is missing from the cache and cache it.

    This is the default policy.

  • UpdatePolicy.NEVER: If some necessary data is missing from the cache, raise an exception. Don’t make any network requests.

    Note that this policy permits the use of stale data.

By default, the cache is located in a platform-specific per-user directory. You can change this location with the cache_dir setting in the Client configuration.
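The difference between the two policies can be sketched with an in-memory cache. Everything here is a stand-in: the UpdatePolicy enum is redefined locally, load_with_policy is a hypothetical helper, staleness is modeled with a simple version number, and KeyError is an arbitrary choice of exception.

```python
from enum import Enum, auto


class UpdatePolicy(Enum):
    # local stand-in for the SDK enum of the same name
    IF_MISSING_OR_STALE = auto()
    NEVER = auto()


def load_with_policy(cache, key, policy, fetch_version, fetch_data):
    """Hypothetical helper; cache maps key -> (version, data)."""
    if policy is UpdatePolicy.NEVER:
        if key not in cache:
            raise KeyError(f"{key!r} is not cached and the policy forbids downloading it")
        return cache[key][1]  # may be stale; no "network" calls are made

    current_version = fetch_version(key)
    if key in cache and cache[key][0] != current_version:
        del cache[key]  # discard the stale entry
    if key not in cache:
        cache[key] = (current_version, fetch_data(key))
    return cache[key][1]


server = {"task-456": (2, "fresh annotations")}
cache = {"task-456": (1, "old annotations")}


def get_version(key):
    return server[key][0]


def get_data(key):
    return server[key][1]


stale = load_with_policy(cache, "task-456", UpdatePolicy.NEVER, get_version, get_data)
fresh = load_with_policy(cache, "task-456", UpdatePolicy.IF_MISSING_OR_STALE, get_version, get_data)
```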

Transforms

The layer provides several classes whose instances are callables suitable for use with the target_transform dataset constructor parameter. They are intended to simplify working with CVAT datasets in common scenarios.

ExtractBoundingBoxes

Intended for object detection tasks.

Constructor parameters:

  • include_shape_types (Iterable[str]). The values must be from the following list:

    • "ellipse"
    • "points"
    • "polygon"
    • "polyline"
    • "rectangle"

Effect: Gathers all shape annotations from the input target object whose types are contained in the value of include_shape_types. Then returns a dictionary with the following string keys (where N is the number of gathered shapes):

  • "boxes" (a floating-point tensor of shape Nx4). Each row represents the bounding box of the corresponding shape in the following format: [x_min, y_min, x_max, y_max].

  • "labels" (an integer tensor of shape N). Each element is the index of the label of the corresponding shape.

Example:

ExtractBoundingBoxes(include_shape_types=['rectangle', 'ellipse'])
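To make the output format concrete, here is a plain-Python analogue that returns lists instead of tensors. extract_bounding_boxes is a hypothetical helper, SimpleNamespace objects stand in for LabeledShape instances, and treating points as a flat [x0, y0, x1, y1, ...] list of vertices is an assumption that fits rectangles and polygons; the sketch does not attempt to model other shape types.

```python
from types import SimpleNamespace


def extract_bounding_boxes(shapes, label_id_to_index, include_shape_types):
    """Hypothetical plain-Python analogue; returns lists instead of tensors."""
    include = frozenset(include_shape_types)
    boxes, labels = [], []
    for shape in shapes:
        if shape.type not in include:
            continue
        # assumption: points is a flat [x0, y0, x1, y1, ...] list of vertices
        xs, ys = shape.points[0::2], shape.points[1::2]
        boxes.append([min(xs), min(ys), max(xs), max(ys)])
        labels.append(label_id_to_index[shape.label_id])
    return {"boxes": boxes, "labels": labels}


# SimpleNamespace objects stand in for LabeledShape instances
shapes = [
    SimpleNamespace(type="rectangle", label_id=100, points=[10.0, 20.0, 30.0, 40.0]),
    SimpleNamespace(type="polygon", label_id=102, points=[5.0, 5.0, 15.0, 8.0, 9.0, 25.0]),
    SimpleNamespace(type="polyline", label_id=101, points=[0.0, 0.0, 1.0, 1.0]),
]
result = extract_bounding_boxes(shapes, {100: 0, 101: 1, 102: 2}, ["rectangle", "polygon"])
```

The polyline is skipped because its type is not listed, so N is 2 in this example.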

ExtractSingleLabelIndex

Intended for image classification tasks.

Constructor parameters: None.

Effect: If the input target object contains no tag annotations or more than one tag annotation, raises ValueError. Otherwise, returns the index of the label in the solitary tag annotation as a zero-dimensional tensor.

Example:

ExtractSingleLabelIndex()
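The described effect can be mirrored in plain Python as follows. extract_single_label_index is a hypothetical helper that returns a plain integer rather than a zero-dimensional tensor, and SimpleNamespace objects stand in for LabeledImage instances.

```python
from types import SimpleNamespace


def extract_single_label_index(tags, label_id_to_index):
    """Hypothetical analogue; returns an int instead of a zero-dimensional tensor."""
    if len(tags) != 1:
        raise ValueError(f"expected exactly one tag annotation, got {len(tags)}")
    return label_id_to_index[tags[0].label_id]


label_id_to_index = {100: 0, 101: 1}
# a SimpleNamespace object stands in for a LabeledImage instance
index = extract_single_label_index([SimpleNamespace(label_id=101)], label_id_to_index)
```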

5 - Auto-annotation API

Overview

This layer provides functionality that allows you to automatically annotate a CVAT dataset by running a custom function on your local machine. A function, in this context, is a Python object that implements a particular protocol defined by this layer. To avoid confusion with Python functions, auto-annotation functions will be referred to as “AA functions” in the following text. A typical AA function will be based on a machine learning model and consist of the following basic elements:

  • Code to load the ML model.

  • A specification describing the annotations that the AA function can produce.

  • Code to convert data from CVAT to a format the ML model can understand.

  • Code to run the ML model.

  • Code to convert resulting annotations to a format CVAT can understand.

The layer can be divided into several parts:

  • The interface, containing the protocol that an AA function must implement.

  • The driver, containing functionality to annotate a CVAT dataset using an AA function.

  • The predefined AA functions, which you can use as-is or as a base for your own.

The auto-annotate CLI command provides a way to use an AA function from the command line rather than from a Python program. See the CLI documentation for details.

Example

from typing import List
import PIL.Image

import torchvision.models

from cvat_sdk import make_client
import cvat_sdk.models as models
import cvat_sdk.auto_annotation as cvataa

class TorchvisionDetectionFunction:
    def __init__(self, model_name: str, weights_name: str, **kwargs) -> None:
        # load the ML model
        weights_enum = torchvision.models.get_model_weights(model_name)
        self._weights = weights_enum[weights_name]
        self._transforms = self._weights.transforms()
        self._model = torchvision.models.get_model(model_name, weights=self._weights, **kwargs)
        self._model.eval()

    @property
    def spec(self) -> cvataa.DetectionFunctionSpec:
        # describe the annotations
        return cvataa.DetectionFunctionSpec(
            labels=[
                cvataa.label_spec(cat, i)
                for i, cat in enumerate(self._weights.meta['categories'])
            ]
        )

    def detect(self, context, image: PIL.Image.Image) -> List[models.LabeledShapeRequest]:
        # convert the input into a form the model can understand
        transformed_image = [self._transforms(image)]

        # run the ML model
        results = self._model(transformed_image)

        # convert the results into a form CVAT can understand
        return [
            cvataa.rectangle(label.item(), [x.item() for x in box])
            for result in results
            for box, label in zip(result['boxes'], result['labels'])
        ]

# log into the CVAT server
with make_client(host="localhost", credentials=("user", "password")) as client:
    # annotate task 41617 using Faster R-CNN
    cvataa.annotate_task(client, 41617,
        TorchvisionDetectionFunction("fasterrcnn_resnet50_fpn_v2", "DEFAULT", box_score_thresh=0.5),
    )

Auto-annotation interface

Currently, the only type of AA function supported by this layer is the detection function. Therefore, all of the following information will pertain to detection functions.

A detection function accepts an image and returns a list of shapes found in that image. When it is applied to a dataset, the AA function is run for every image, and the resulting lists of shapes are combined and uploaded to CVAT.

A detection function must have two attributes, spec and detect.

spec must contain the AA function’s specification, which is an instance of DetectionFunctionSpec.

DetectionFunctionSpec must be initialized with a sequence of PatchedLabelRequest objects that represent the labels that the AA function knows about. See the docstring of DetectionFunctionSpec for more information on the constraints that these objects must follow.

detect must be a function/method accepting two parameters:

  • context (DetectionFunctionContext). Contains information about the current image. Currently DetectionFunctionContext only contains a single field, frame_name, which contains the file name of the frame on the CVAT server.

  • image (PIL.Image.Image). Contains image data.

detect must return a list of LabeledShapeRequest objects, representing shapes found in the image. See the docstring of DetectionFunctionSpec for more information on the constraints that these objects must follow.
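The shape of the protocol can be illustrated without the SDK. In this sketch, SimpleNamespace objects stand in for DetectionFunctionSpec, DetectionFunctionContext and LabeledShapeRequest, and run_on_frames is a hypothetical, much simplified driver loop; the assumption that the driver assigns the frame number to each returned shape is not taken from the text above.

```python
from types import SimpleNamespace


class ConstantBoxFunction:
    """Toy AA function exposing the two required attributes: spec and detect."""

    def __init__(self):
        # stand-in for a DetectionFunctionSpec that declares one label
        self.spec = SimpleNamespace(labels=[SimpleNamespace(name="object", id=0)])
        self.seen_frames = []

    def detect(self, context, image):
        # a real function would run a model on the image here
        self.seen_frames.append(context.frame_name)
        return [SimpleNamespace(label_id=0, frame=0, type="rectangle", points=[0, 0, 10, 10])]


def run_on_frames(function, frames):
    """Hypothetical driver loop: call detect once per frame and combine the results."""
    combined = []
    for frame_index, (frame_name, image) in enumerate(frames):
        context = SimpleNamespace(frame_name=frame_name)
        for shape in function.detect(context, image):
            shape.frame = frame_index  # assumption: the driver assigns the frame number
            combined.append(shape)
    return combined


function = ConstantBoxFunction()
# None stands in for the PIL images that a real driver would pass
shapes = run_on_frames(function, [("a.png", None), ("b.png", None)])
```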

The same AA function may be used with any dataset that contains labels with the same names as those in the AA function’s specification. The driver matches labels between the spec and the dataset by name, and replaces the label IDs in the shape objects with those defined in the dataset.

For example, suppose the AA function’s spec defines the following labels:

Name  ID
bat   0
rat   1

And the dataset defines the following labels:

Name  ID
bat   100
cat   101
rat   102

Then suppose detect returns a shape with label_id equal to 1. The driver will see that it refers to the rat label, and replace it with 102, since that’s the ID this label has in the dataset.

The same logic is used for sub-label IDs.
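The name-based matching in this example can be sketched in a few lines of plain Python. build_label_id_mapping is a hypothetical helper, not the driver's actual code, and both inputs are simplified to name-to-ID dictionaries.

```python
def build_label_id_mapping(spec_labels, dataset_labels):
    """Hypothetical helper; both arguments map label name -> label ID."""
    return {
        spec_id: dataset_labels[name]
        for name, spec_id in spec_labels.items()
        if name in dataset_labels
    }


spec_labels = {"bat": 0, "rat": 1}
dataset_labels = {"bat": 100, "cat": 101, "rat": 102}

mapping = build_label_id_mapping(spec_labels, dataset_labels)
# a shape returned by detect with label_id == 1 refers to "rat"
remapped_label_id = mapping[1]
```

The cat label has no counterpart in the spec, so it never appears in the mapping.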

Helper factory functions

The CVAT API model types used in the AA function protocol are somewhat unwieldy to work with, so it’s recommended to use the helper factory functions provided by this layer. These helpers instantiate an object of their corresponding model type, passing their arguments to the model constructor and sometimes setting some attributes to fixed values.

The following helpers are available for building specifications:

Name                 Model type           Fixed attributes
label_spec           PatchedLabelRequest  -
skeleton_label_spec  PatchedLabelRequest  type="skeleton"
keypoint_spec        SublabelRequest      -

The following helpers are available for use in detect:

Name       Model type              Fixed attributes
shape      LabeledShapeRequest     frame=0
rectangle  LabeledShapeRequest     frame=0, type="rectangle"
skeleton   LabeledShapeRequest     frame=0, type="skeleton"
keypoint   SubLabeledShapeRequest  frame=0, type="points"
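The "fixed attributes" idea can be illustrated with a stand-in model class. ShapeStandIn is a hypothetical dataclass used in place of LabeledShapeRequest, and the two functions below only imitate the pattern described above; their exact signatures in the SDK may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ShapeStandIn:
    """Hypothetical stand-in for LabeledShapeRequest with only a few fields."""
    label_id: int
    frame: int
    type: str
    points: list = field(default_factory=list)


def shape(label_id, **kwargs):
    # fixes frame=0 and forwards everything else to the model constructor
    return ShapeStandIn(label_id=label_id, frame=0, **kwargs)


def rectangle(label_id, points, **kwargs):
    # additionally fixes type="rectangle"
    return shape(label_id, type="rectangle", points=points, **kwargs)


box = rectangle(1, [0.0, 0.0, 10.0, 10.0])
```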

Auto-annotation driver

The annotate_task function uses an AA function to annotate a CVAT task. It must be called as follows:

annotate_task(<client>, <task ID>, <AA function>, <optional arguments...>)

The supplied client will be used to make all API calls.

By default, new annotations will be appended to the old ones. Use clear_existing=True to remove old annotations instead.

If a detection function declares a label that has no matching label in the task, then by default, BadFunctionError is raised, and auto-annotation is aborted. If you use allow_unmatched_labels=True, then such labels will be ignored, and any shapes referring to them will be dropped. The same logic applies to sub-labels.

annotate_task will raise a BadFunctionError exception if it detects that the function violated the AA function protocol.
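The handling of unmatched labels can be sketched in plain Python. This is not the driver's code: BadFunctionError is redefined locally as a stand-in, remap_shapes and its allow_unmatched flag are hypothetical, and shapes are simplified to dictionaries.

```python
class BadFunctionError(Exception):
    """Local stand-in for the SDK exception of the same name."""


def remap_shapes(shapes, spec_labels, task_labels, allow_unmatched=False):
    """Hypothetical helper; label arguments map name -> ID, shapes are dicts."""
    unmatched = set(spec_labels) - set(task_labels)
    if unmatched and not allow_unmatched:
        raise BadFunctionError(f"labels missing from the task: {sorted(unmatched)}")

    id_mapping = {
        spec_id: task_labels[name]
        for name, spec_id in spec_labels.items()
        if name in task_labels
    }
    # shapes whose label has no counterpart in the task are dropped
    return [
        {**shape, "label_id": id_mapping[shape["label_id"]]}
        for shape in shapes
        if shape["label_id"] in id_mapping
    ]


spec_labels = {"bat": 0, "rat": 1, "owl": 2}
task_labels = {"bat": 100, "cat": 101, "rat": 102}
detected = [{"label_id": 1}, {"label_id": 2}]

kept = remap_shapes(detected, spec_labels, task_labels, allow_unmatched=True)
```

Calling remap_shapes with the default allow_unmatched=False raises the stand-in BadFunctionError here, because owl has no matching label in the task.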

Predefined AA functions

This layer includes several predefined AA functions. You can use them as-is, or as a base on which to build your own.

Each function is implemented as a module to allow usage via the CLI auto-annotate command. Therefore, in order to use it from the SDK, you’ll need to import the corresponding module.

cvat_sdk.auto_annotation.functions.torchvision_detection

This AA function uses object detection models from the torchvision library. It produces rectangle annotations.

To use it, install CVAT SDK with the pytorch extra:

$ pip install "cvat-sdk[pytorch]"

Usage from Python:

from cvat_sdk.auto_annotation import annotate_task
from cvat_sdk.auto_annotation.functions.torchvision_detection import create as create_torchvision
annotate_task(<client>, <task ID>, create_torchvision(<model name>, ...))

Usage from the CLI:

cvat-cli auto-annotate "<task ID>" --function-module cvat_sdk.auto_annotation.functions.torchvision_detection \
      -p model_name=str:"<model name>" ...

The create function accepts the following parameters:

  • model_name (str) - the name of the model, such as fasterrcnn_resnet50_fpn_v2. This parameter is required.
  • weights_name (str) - the name of a weights enum value for the model, such as COCO_V1. Defaults to DEFAULT.

It also accepts arbitrary additional parameters, which are passed directly to the model constructor.

cvat_sdk.auto_annotation.functions.torchvision_keypoint_detection

This AA function is analogous to torchvision_detection, except it uses torchvision’s keypoint detection models and produces skeleton annotations. Keypoints which the model marks as invisible will be marked as occluded in CVAT.

Refer to the previous section for usage instructions and parameter information.

6 - Developer guide

Overview

This package contains manually written and autogenerated files. We store only sources in the repository. To get the full package, you need to generate the missing package files.

Package file layout

  • gen/ - generator files
  • cvat_sdk/ - Python package root
  • cvat_sdk/api_client - autogenerated low-level package code
  • cvat_sdk/core - high-level package code

How to generate package code

  1. Install generator dependencies:

    pip install -r gen/requirements.txt
    
  2. Generate package code (call from the package root directory!):

    ./gen/generate.sh
    
  3. Install the packages:

    pip install cvat-sdk/
    pip install cvat-cli/
    

    If you want to edit package files, install them with -e:

    pip install -e cvat-sdk/
    pip install -e cvat-cli/
    

How to edit templates

If you want to edit templates, obtain them from the generator first:

docker run --rm -v $PWD:/local \
    openapitools/openapi-generator-cli author template \
        -o /local/generator_templates -g python

Then, you can copy the modified version of the template you need into the gen/templates/openapi-generator/ directory.

How to test

API client tests are integrated into REST API tests in /tests/python/rest_api and SDK tests are placed next to them in /tests/python/sdk. To execute, run:

pytest tests/python/rest_api tests/python/sdk

SDK API design decisions

The generated ApiClient code is modified from what openapi-generator does by default. Changes are mostly focused on better user experience - including better usage patterns and simpler/faster ways to achieve results.

Modifications

  • Added Python type annotations for return types and class members. This change required us to implement a custom post-processing script, which converts generated types into correct type annotations. The types generated by default are supposed to work with the API implementation (parameter validation and parsing), but they are not applicable as type annotations (they have incorrect syntax). Custom post-processing allowed us to make these types correct type annotations. Other possible solutions:

    • There is the python-experimental API generator, which may solve some issues, but it is unstable and requires Python 3.9. Our API works with 3.7, which is the lowest supported version now.
    • Custom templates - partially works, but only in limited cases (model fields). It’s very hard to maintain the template code and logic for this. Mustache templates only offer if checks and for loops, which is not enough for annotation generation.
  • Separate APIs are embedded into the general ApiClient class. Now we have:

    with ApiClient(config) as api_client:
      result1 = api_client.foo_api.operation1()
      result2 = api_client.bar_api.operation2()
    

    This proved to be more convenient than the default:

    with ApiClient(config) as api_client:
      foo_api = FooApi(api_client)
      result1 = foo_api.operation1()
      result2 = foo_api.operation2()
    
      bar_api = BarApi(api_client)
      result3 = bar_api.operation3()
      result4 = bar_api.operation4()
    

    This also required custom post-processing. Operation IDs are supposed to be unique in the OpenAPI / Swagger specification. Therefore, we can’t generate such a schema on the server, nor can we expect it to be supported by the API generator.

  • Operations have IDs like <api>/<method>_<object>. This also proved to be more readable and more natural than DRF-spectacular’s default <api>/<object>_<method>.

  • Server operations have different types for input and output values. While it may seem natural for an endpoint with POST/PUT methods (like create or partial_update) to have the same type for input and output, this leads to models with many read-only and write-only fields, which are hard to understand. This clear type separation is supposed to make things simpler for users.

  • Added cookie management in the ApiClient class.

  • Added interface classes for models to simplify class member usage and lookup.

  • Dicts can be passed into API methods and model constructors instead of models. They are automatically parsed as models. In the original implementation, the user is required to pass a Configuration object each time, which is clumsy and adds little value.