2 - Low-level API
Overview
The low-level API is useful if you need to work directly with the REST API, but want
data validation and syntax assistance from your code editor. The code
on this layer is autogenerated.
The code of this component is located in the cvat_sdk.api_client package.
Example
Let’s see how a task with local files can be created. We will use basic auth
to keep things simple.
```python
from time import sleep

from cvat_sdk.api_client import Configuration, ApiClient, models, apis, exceptions

configuration = Configuration(
    host="http://localhost",
    username='YOUR_USERNAME',
    password='YOUR_PASSWORD',
)

# Enter a context with an instance of the API client
with ApiClient(configuration) as api_client:
    # Parameters can be passed as a plain dict with JSON-serialized data
    # or as model objects (from cvat_sdk.api_client.models), including
    # mixed variants.
    #
    # In case of dicts, keys must be the same as members of models.I<ModelName>
    # interfaces and values must be convertible to the corresponding member
    # value types (e.g. a date or string enum value can be parsed from a string).
    #
    # In case of model objects, data must be of the corresponding
    # models.<ModelName> types.
    #
    # Let's use a dict here. It should look like models.ITaskWriteRequest
    task_spec = {
        'name': 'example task',
        "labels": [{
            "name": "car",
            "color": "#ff00ff",
            "attributes": [
                {
                    "name": "a",
                    "mutable": True,
                    "input_type": "number",
                    "default_value": "5",
                    "values": ["4", "5", "6"]
                }
            ]
        }],
    }

    try:
        # Apis can be accessed as ApiClient class members
        # We use different models for input and output data. For input data,
        # models are typically called like "*Request". Output data models have
        # no suffix.
        (task, response) = api_client.tasks_api.create(task_spec)
    except exceptions.ApiException as e:
        # We can catch the basic exception type, or a derived type
        print("Exception when trying to create a task: %s\n" % e)

    # Here we will use models instead of a dict
    task_data = models.DataRequest(
        image_quality=75,
        client_files=[
            open('image1.jpg', 'rb'),
            open('image2.jpg', 'rb'),
        ],
    )

    # If we pass binary file objects, we need to specify content type.
    # For this endpoint, we don't have response data
    (_, response) = api_client.tasks_api.create_data(
        task.id,
        data_request=task_data,
        _content_type="multipart/form-data",
        # we can choose to check the response status manually
        # and disable the response data parsing
        _check_status=False, _parse_response=False
    )
    assert response.status == 202, response.msg

    # Wait till the task data is processed
    for _ in range(100):
        (status, _) = api_client.tasks_api.retrieve_status(task.id)
        if status.state.value in ['Finished', 'Failed']:
            break
        sleep(0.1)
    assert status.state.value == 'Finished', status.message

    # Update the task object and check the task size
    # (we uploaded 2 images, so the task has 2 frames)
    (task, _) = api_client.tasks_api.retrieve(task.id)
    assert task.size == 2
```
ApiClient and configuration
The starting point in the low-level API is the cvat_sdk.api_client.ApiClient class.
It encapsulates session and connection logic, manages headers and cookies,
and provides access to various APIs.
To create an instance of ApiClient, you need to set up a cvat_sdk.api_client.Configuration
object and pass it to the ApiClient class constructor. Additional connection-specific
options, such as extra headers and cookies, can be specified in the constructor.
ApiClient implements the context manager protocol. Typically, you create an ApiClient
this way:
```python
from cvat_sdk.api_client import ApiClient, Configuration

configuration = Configuration(host="http://localhost")

with ApiClient(configuration) as api_client:
    ...
```
After creating an ApiClient instance, you can send requests to various server endpoints
via the *_api member properties, or directly, using the rest_client member.
Read more about API wrappers below.
Typically, the first thing you do with ApiClient is log in.
Read more about authentication options below.
Authentication
CVAT supports 2 authentication options:
- basic auth, with your username and password
- token auth, with your API key
Token auth requires a token, which can be obtained after performing basic auth.
The low-level API supports both ways of authentication.
You can specify authentication parameters in the Configuration object:
```python
configuration = Configuration(
    username='YOUR_USERNAME',
    password='YOUR_PASSWORD',
)
```

```python
configuration = Configuration(
    api_key={
        "sessionAuth": "<sessionid cookie value>",
        "csrfAuth": "<csrftoken cookie value>",
        "tokenAuth": "Token <auth key value>",
    }
)
```
Alternatively, you can perform a regular login using the auth_api member of ApiClient and
set the Authorization header using the Token prefix. This way, you’ll be able to
obtain API tokens, which can be reused in the future to avoid typing your credentials.
```python
from cvat_sdk.api_client import models

(auth, _) = api_client.auth_api.create_login(
    models.LoginRequest(username=credentials[0], password=credentials[1])
)

assert "sessionid" in api_client.cookies
assert "csrftoken" in api_client.cookies
api_client.set_default_header("Authorization", "Token " + auth.key)
```
API wrappers
API endpoints are grouped by tags into separate classes in the cvat_sdk.api_client.apis package.
APIs can be accessed as ApiClient object members:

```python
api_client.auth_api.<operation>(...)
api_client.tasks_api.<operation>(...)
```

APIs can also be instantiated directly like this:
```python
from cvat_sdk.api_client import ApiClient, apis

api_client = ApiClient(...)

auth_api = apis.AuthApi(api_client)
auth_api.<operation>(...)

tasks_api = apis.TasksApi(api_client)
tasks_api.<operation>(...)
```
For each operation, the API wrapper class has a corresponding <operation>_endpoint member.
This member represents the endpoint as a first-class object, which provides meta-information
about the endpoint, such as its relative URL, parameter names and types, and their placement
in the request. It also allows you to pass the operation to other functions and invoke it
from there.
For a typical server entity such as Task, Project, or Job, the *Api classes provide methods
that reflect the Create-Read-Update-Delete (CRUD) operations: create, retrieve, list,
update, partial_update, delete. The set of available operations depends on the entity type.
You can find the list of the available APIs and their documentation here.
Models
Requests and responses can include data. It can be represented as plain Python
data structures or model classes (models). In the CVAT API, models for requests and
responses are separate: request models have the Request suffix in the name, while
response models have no suffix. Models can be found in the cvat_sdk.api_client.models package.
Models can be instantiated like this:
```python
from cvat_sdk.api_client import models

user_model = models.User(...)
```
Model parameters can be passed as models or as plain Python data structures. This rule applies
recursively, starting from the method parameters. In particular, this means you can pass
a dict into a method or into a model constructor, and the corresponding fields will
be parsed from this data automatically:
```python
task_spec = models.TaskWriteRequest(
    name='example task',
    labels=[
        models.PatchedLabelRequest(
            name="car",
            color="#ff00ff",
            attributes=[
                models.AttributeRequest(
                    name="a",
                    mutable=True,
                    input_type="number",
                    default_value="5",
                    values=["4", "5", "6"]
                )
            ]
        )
    ],
)

api_client.tasks_api.create(task_spec)
```
This is equivalent to:
```python
api_client.tasks_api.create({
    'name': 'example task',
    "labels": [{
        "name": "car",
        "color": "#ff00ff",
        "attributes": [
            {
                "name": "a",
                "mutable": True,
                "input_type": "number",
                "default_value": "5",
                "values": ["4", "5", "6"]
            }
        ]
    }],
})
```
You can mix these variants.
Most models have corresponding interface classes named I<model name>. They can be
used to implement your own classes or describe APIs; they just provide type annotations
and descriptions for model fields.
You can export model values to plain Python dicts using the as_dict() method or
the cvat_sdk.api_client.model_utils.to_json() function.
You can find the list of the available models and their documentation here.
Sending requests
To send a request to a server endpoint, you need to obtain an instance of the corresponding *Api
class. You can find a summary of the available API classes and supported endpoints
here. The *Api instance object allows you to send requests to the relevant
server endpoints.
By default, all operations return 2 objects: the parsed response data and the response itself.
The first returned value is a model parsed from the response data. If a method does
not have any return value, None is always returned as the first value. You can control
automatic parsing with the _parse_response method kwarg. When parsing is disabled, None
is returned.
The second value is the raw response, which can be useful for getting response parameters, such as
the status code, headers, or raw response data. By default, the response status code is
checked to be positive; in case of a request failure, an exception is raised.
This behavior can be controlled with the _check_status method kwarg. If the status is not
checked, you will need to check the response status code manually and act accordingly.
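When status checks are disabled with _check_status=False, the caller is responsible for validating the response. A minimal sketch of such a check, assuming only that the raw response exposes status and data attributes (as the SDK's raw responses do); the helper name is hypothetical:

```python
from http import HTTPStatus

def ensure_success(response):
    """Raise if the response status is not a 2xx success code."""
    if not (HTTPStatus.OK <= response.status < HTTPStatus.MULTIPLE_CHOICES):
        raise RuntimeError(f"request failed with status {response.status}")
    return response.data
```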
A typical endpoint call looks like this:

```python
from cvat_sdk.api_client import ApiClient, apis

with ApiClient(...) as api_client:
    ...
    (data, response) = api_client.tasks_api.list()
    # process the response ...
```
Operation parameters can be passed as positional or keyword arguments. API methods provide
extra common arguments which control invocation logic:
- _parse_response (bool) - Allows enabling or disabling response data parsing. When enabled,
the response data is parsed into a model or a basic type and returned as the first value.
When disabled, the response is not parsed, and None is returned. Can be useful,
for instance, if you need to parse data manually, or if you expect an error in the response.
Default is True.
- _check_status (bool) - Allows enabling or disabling response status checks. When enabled, the
response status code is checked to be positive as defined in the HTTP standards.
In the case of a negative status, an exception is raised. Default is True.
- _validate_inputs (bool) - Specifies whether type checking should be done on the data
sent to the server. Default is True.
- _validate_outputs (bool) - Specifies whether type checking should be done on the data
received from the server. Default is True.
- _request_timeout (None | int | float | Tuple[int | float, int | float]) - Allows
controlling timeouts. A single number is treated as the total request timeout; a tuple is
treated as (connection, read) timeouts. Default is None, which means no timeout.
- _content_type (None | str) - Allows specifying the Content-Type header value
for the request. Endpoints can support different content types and behave differently
depending on the value. For file uploads, _content_type="multipart/form-data" must be specified.
Read more about file uploads here. Default is application/json.
NOTE: the API is autogenerated. In some cases the server API schema may be incomplete
or underspecified. Please report any problems you find. A typical problem is that
response data can’t be parsed automatically due to an incorrect schema. In this case, the
simplest workaround is to disable response parsing using the _parse_response=False
method argument.
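With parsing disabled, a JSON body can be decoded by hand. A sketch, assuming only that the raw response exposes the body bytes via a data attribute; the helper name is hypothetical:

```python
import json

def parse_json_body(response):
    """Decode a JSON response body manually when _parse_response=False is used."""
    return json.loads(response.data)
```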
You can find many examples of API client usage in REST API tests here.
Organizations
To create a resource in the context of an organization, use one of these method arguments:
- org - the unique organization slug
- org_id - the organization id

```python
...
(task, response) = api_client.tasks_api.create(task_spec, org_id=org_id)
```
Paginated responses
There are several endpoints that allow requesting multiple server entities. Typically, these
endpoints are called list_*. When there is a lot of data, the responses can be paginated to
reduce server load; if an endpoint returns paginated data, a single page is returned per request.
When all entries need to be retrieved, CVAT doesn’t provide a specific API or parameters
for this, so the solution is to write a loop that collects and joins data from multiple requests.
The SDK provides a utility function for this: cvat_sdk.core.helpers.get_paginated_collection().
Example:

```python
from cvat_sdk.core.helpers import get_paginated_collection

...
project_tasks = get_paginated_collection(
    api_client.projects_api.list_tasks_endpoint,
    id=project_id,
)
```
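The helper's job can also be done by hand. A generic sketch of the collect-and-join loop, where list_page stands in for a paginated list_* call and returns a page of results plus a flag for whether more pages exist (both names are placeholders, not SDK API):

```python
def collect_all(list_page):
    """Gather every entry from a paginated listing, page by page."""
    results = []
    page = 1
    while True:
        batch, has_next = list_page(page=page)
        results.extend(batch)
        if not has_next:
            return results
        page += 1
```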
Binary data in requests and responses
At the moment, sending and receiving binary data - such as files - can be difficult via the
low-level SDK API. Please use the following recommendations.
Sending data
By default, requests use the application/json content type, which is a text type.
However, it’s inefficient to send binary data in this encoding, and the data passed
won’t be converted automatically. If you need to send files or other binary data,
please specify _content_type="multipart/form-data" in the request parameters:
Example:

```python
(_, response) = api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(
        client_files=[
            open("image.jpg", 'rb')
        ],
        image_quality=70,
    ),
    _content_type="multipart/form-data",  # required
)
```
Please also note that if the data contains complex fields (such as nested lists or dicts),
they, in turn, cannot be encoded as multipart/form-data. The recommended solution is to
split the fields into files and everything else, and send them in separate requests with
different content types:
Example:

```python
data = {
    'client_files': [...],  # a list of binary files
    'image_quality': ...,   # a simple type - int
    'job_file_mapping': [...],  # a complex type - list
}

# Initialize the upload
api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(image_quality=data["image_quality"]),
    upload_start=True,
)

# Upload binary data
api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(
        client_files=data.pop("client_files"),
        image_quality=data["image_quality"],
    ),
    upload_multiple=True,
    _content_type="multipart/form-data",
)

# Finalize the upload and send the remaining fields
api_client.tasks_api.create_data(
    id=42,
    data_request=models.DataRequest(**data),
    upload_finish=True,
)
```
Receiving data
Receiving binary files can also be difficult with the low-level API. To avoid unexpected
behavior, it is recommended to specify _parse_response=False in the request parameters.
In this case, the SDK will not try to parse models from responses, and the response data
can be fetched directly from the response:
```python
from http import HTTPStatus
from time import sleep

interval = 1  # seconds between server polls

# Export a task as a dataset
while True:
    (_, response) = api_client.tasks_api.retrieve_dataset(
        id=42,
        format='COCO 1.0',
        _parse_response=False,
    )
    if response.status == HTTPStatus.CREATED:
        break
    sleep(interval)

(_, response) = api_client.tasks_api.retrieve_dataset(
    id=42,
    format='COCO 1.0',
    action="download",
    _parse_response=False,
)

# Save the resulting file
with open('output_file', 'wb') as output_file:
    output_file.write(response.data)
```
Different versions of API endpoints
The cloudstorages/id/content REST API endpoint
Warning: the retrieve_content method of cloudstorages_api will be deprecated in version 2.5.0.
When using the SDK, we recommend the retrieve_content_v2 method, which matches the revised API.
For backward compatibility, we continue to support the prior interface until version 2.6.0
is released.
Here is an example of how to get the bucket content using the new retrieve_content_v2 method:
```python
from pprint import pprint
from cvat_sdk.api_client import ApiClient, Configuration

next_token = None
files, prefixes = [], []
prefix = ""

with ApiClient(
    configuration=Configuration(host=BASE_URL, username=user, password=password)
) as api_client:
    while True:
        data, response = api_client.cloudstorages_api.retrieve_content_v2(
            cloud_storage_id,
            **({"prefix": prefix} if prefix else {}),
            **({"next_token": next_token} if next_token else {}),
        )
        # the data will have the following structure:
        # {'content': [
        #     {'mime_type': <image|video|archive|pdf|DIR>, 'name': <name>, 'type': <REG|DIR>},
        # ],
        # 'next': <next_token_string|None>}
        files.extend(
            prefix + f["name"]
            for f in data["content"]
            if str(f["type"]) == "REG"
        )
        prefixes.extend(
            prefix + f["name"]
            for f in data["content"]
            if str(f["type"]) == "DIR"
        )
        next_token = data["next"]
        if next_token:
            continue
        if not prefixes:
            break
        prefix = f"{prefixes.pop()}/"

pprint(files)  # ['sub/image_1.jpg', 'image_2.jpg']
```
3 - High-level API
Overview
This layer provides high-level APIs, allowing easier access to server operations.
The API includes Repositories and Entities. Repositories provide management
operations for Entities. Entities represent objects on the server
(e.g. projects, tasks, jobs, etc.) and simplify interaction with them. The key difference
from the low-level API is that operations on this layer are not limited to a single
server request per operation, and they encapsulate the low-level request machinery behind
a high-level object-oriented API.
The code of this component is located in the cvat_sdk.core package.
Example

```python
from cvat_sdk import make_client, models
from cvat_sdk.core.proxies.tasks import ResourceType, Task

# Create a Client instance bound to a local server and authenticate using basic auth
with make_client(host="localhost", credentials=('user', 'password')) as client:
    # Let's create a new task.

    # Fill in task parameters first.
    # Models are used the same way as in layer 1.
    task_spec = {
        "name": "example task",
        "labels": [
            {
                "name": "car",
                "color": "#ff00ff",
                "attributes": [
                    {
                        "name": "a",
                        "mutable": True,
                        "input_type": "number",
                        "default_value": "5",
                        "values": ["4", "5", "6"],
                    }
                ],
            }
        ],
    }

    # Now we can create a task using a task repository method.
    # Repositories can be accessed as the Client class members.
    # In this case we use 2 local images as the task data.
    task = client.tasks.create_from_data(
        spec=task_spec,
        resource_type=ResourceType.LOCAL,
        resources=['image1.jpg', 'image2.png'],
    )

    # The returned task object is already up-to-date with its server counterpart.
    # Now we can access task fields. The fields are read-only and can be optional.
    # Let's check that we have 2 images in the task data.
    assert task.size == 2

    # If an object is modified on the server, the local object is not updated automatically.
    # To reflect the latest changes, the local object needs to be fetch()-ed.
    task.fetch()

    # Let's obtain another task. Again, it can be done via the task repository.
    # Suppose we have already created the task earlier and know the task id.
    task2 = client.tasks.retrieve(42)

    # The task object fields can be update()-d. Note that the set of fields that can be
    # modified can be different from what is available for reading.
    task2.update({'name': 'my task'})

    # And the task can also be remove()-d from the server. The local copy will remain
    # untouched.
    task2.remove()
```
Client
The cvat_sdk.core.client.Client class provides session management, implements
authentication operations, and simplifies access to server APIs.
It is the starting point for using the CVAT SDK.
A Client instance allows you to:
- configure connection options with the Config class
- check server API compatibility with the current SDK version
- deduce the server connection scheme (https or http) automatically
- manage the user session with the login(), logout() and other methods
- obtain Repository objects via the users, tasks, jobs and other members
- reach lower-level APIs with the corresponding members
An instance of Client can be created directly by calling the class constructor
or with the utility function cvat_sdk.core.client.make_client(), which can handle
some configuration for you. A Client can be configured with
a cvat_sdk.core.client.Config class instance. A Config object can be passed to
the Client constructor, after which it is available in the Client.config field.
The Client class implements the context manager protocol.
When the context is closed, the session is finished and the user is logged out
automatically. Otherwise, these actions can be done with the close() and logout() methods.
You can create and start using a Client instance this way:
```python
from cvat_sdk import make_client

with make_client('localhost', port='8080', credentials=('user', 'password')) as client:
    ...
```
The make_client() function handles configuration and object creation for you.
It also allows you to authenticate right after the object is created.
If you need to configure Client parameters yourself, you can do this:
```python
from cvat_sdk import Config, Client

config = Config()
# set up some config fields ...

with Client('localhost:8080', config=config) as client:
    client.login(('user', 'password'))
    ...
```
You can specify the server address both with and without the scheme. If the scheme is omitted,
it will be deduced automatically. The checks are performed in the following
order: https (with the default port 8080), then http (with the default port 80).
In some cases this may lead to incorrect results - e.g. if you have 2 servers running on the
same host at the default ports. In such cases, just specify the scheme manually: https://localhost.
When the server is located, its version is checked. If an unsupported version is found,
an error can be raised or suppressed (controlled by config.allow_unsupported_server).
If the error is suppressed, some SDK functions may not work as expected with this server.
By default, a warning is issued and the error is suppressed.
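The deduction order described above can be sketched as a pure function (an illustration of the documented behavior, not the SDK's actual implementation; the function name is hypothetical):

```python
def candidate_urls(host: str) -> list[str]:
    """Return the base URLs to probe, in the documented deduction order."""
    if host.startswith(("http://", "https://")):
        return [host]  # explicit scheme: nothing to deduce
    # no scheme given: try https first, then fall back to http
    return [f"https://{host}", f"http://{host}"]
```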
Users and organizations
All Client operations rely on the server API and depend on the current user's
rights. This affects the set of available APIs, objects and actions. For example, a regular user
can only see and modify their own tasks and jobs, while an admin user can see all tasks, etc.
Operations are also affected by the current organization context,
which can be set with the organization_slug property of Client instances.
The organization context affects which entities are visible,
and where new entities are created.
Set organization_slug to an organization’s slug (short name)
to make subsequent operations work in the context of that organization:
```python
client.organization_slug = 'myorg'

# create a task in the organization
task = client.tasks.create_from_data(...)
```
You can also set organization_slug to an empty string
to work in the context of the user’s personal workspace.
By default, it is set to None, which means that both personal and organizational
entities are visible, while new entities are created in the personal workspace.
To temporarily set the organization slug, use the organization_context function:
```python
with client.organization_context('myorg'):
    task = client.tasks.create_from_data(...)

# the slug is now reset to its previous value
```
Entities and Repositories
Entities represent objects on the server. They provide read access to object fields
and implement additional relevant operations, including both the general Read-Update-Delete
ones and object-specific ones. The set of available general operations depends on the
object type.
Repositories provide management operations for the corresponding Entities. You don’t
need to create Repository objects manually. To obtain a Repository object, use the
corresponding Client instance member:

```python
client.projects
client.tasks
client.jobs
client.users
...
```

An Entity can be created on the server with the corresponding Repository method create():

```python
task = client.tasks.create(<task config>)
```
We can retrieve server objects using the retrieve() and list() methods of the Repository:

```python
job = client.jobs.retrieve(<job id>)
tasks = client.tasks.list()
```
After calling these functions, we obtain local objects representing their server counterparts.
Object fields can be updated with the update() method. Note that the set of fields that can be
modified can differ from what is available for reading.

```python
job.update({'stage': 'validation'})
```

The server object will be updated, and the local object will reflect the latest object state
after calling this operation.
Note that local objects may fall out of sync with their server counterparts for various reasons.
If you need to update the local object with the latest server state, use the fetch() method:

```python
# obtain 2 local copies of the same job
job_ref1 = client.jobs.retrieve(1)
job_ref2 = client.jobs.retrieve(1)

# update the server object with the first reference
job_ref1.update(...)
# job_ref2 is outdated now

job_ref2.fetch()
# job_ref2 is synced
```

Finally, if you need to remove an object from the server, you can use the remove() method.
The server object will be removed, but the local copy will remain untouched:

```python
task = client.tasks.retrieve(<task id>)
task.remove()
```
Repositories can also provide group operations over entities. For instance, you can retrieve
all available objects using the list() Repository method. The list of available
Entity and Repository operations depends on the object type.
You can learn more about entity members and how model parameters are passed to functions here.
The implementation of these components is located in the cvat_sdk.core.proxies package.
4 - PyTorch adapter
Overview
This layer provides functionality that enables you to treat CVAT projects and tasks
as PyTorch datasets.
The code of this layer is located in the cvat_sdk.pytorch package.
To use it, you must install the cvat_sdk distribution with the pytorch extra.
Example

```python
import torch
import torchvision.models

from cvat_sdk import make_client
from cvat_sdk.pytorch import ProjectVisionDataset, ExtractSingleLabelIndex

# create a PyTorch model
model = torchvision.models.resnet34(
    weights=torchvision.models.ResNet34_Weights.IMAGENET1K_V1)
model.eval()

# log into the CVAT server
with make_client(host="localhost", credentials=('user', 'password')) as client:
    # get the dataset comprising all tasks for the Validation subset of project 12345
    dataset = ProjectVisionDataset(client, project_id=12345,
        include_subsets=['Validation'],
        # use transforms that fit our neural network
        transform=torchvision.models.ResNet34_Weights.IMAGENET1K_V1.transforms(),
        target_transform=ExtractSingleLabelIndex())

    # print the number of images in the dataset (in other words, the number of frames
    # in the included tasks)
    print(len(dataset))

    # get a sample from the dataset
    image, target = dataset[0]

    # evaluate the network on the sample and compare the prediction to the target;
    # the image needs a batch dimension, and the predicted class is the index
    # of the highest-scoring output
    output = model(image.unsqueeze(0)).squeeze(0)
    if output.argmax() == target:
        print("correct prediction")
    else:
        print("incorrect prediction")
```
Datasets
The key components of this layer are the dataset classes,
ProjectVisionDataset and TaskVisionDataset,
representing the data and annotations contained in a CVAT project or task, respectively.
Both of them are subclasses of the torch.utils.data.Dataset abstract class.
The interface of Dataset is essentially that of a sequence
whose elements are samples from the dataset.
In the case of TaskVisionDataset, each sample represents a frame from the task
and its associated annotations.
The order of the samples is the same as the order of frames in the task.
Deleted frames are omitted.
In the case of ProjectVisionDataset,
each sample is a sample from one of the project’s tasks,
as if obtained from a TaskVisionDataset instance created for that task.
The full sequence of samples is built by concatenating the sequences of samples
from all included tasks in an unspecified order
that is guaranteed to be consistent between executions.
For details on which tasks are included, see Task filtering.
Construction
Both dataset classes are instantiated by passing in an instance of cvat_sdk.Client
and the ID of the project or task:

```python
dataset = ProjectVisionDataset(client, 123)
dataset = TaskVisionDataset(client, 456)
```

The referenced project or task must contain image data.
Video data is currently not supported.
The constructors of these classes also support several keyword-only parameters,
described in the following sections.
During construction,
the dataset objects either populate or validate the local data cache
(see Caching for details).
Any necessary requests to the CVAT server are performed at this time.
After construction, the objects make no more network requests.
Sample format
Indexing a dataset produces a sample.
A sample has the form of a tuple with the following components:
- sample[0] (PIL.Image.Image): the image.
- sample[1] (cvat_sdk.pytorch.Target): the annotations and auxiliary data.
The target object contains the following attributes:
- target.annotations.tags (list[cvat_sdk.models.LabeledImage]):
tag annotations associated with the current frame.
- target.annotations.shapes (list[cvat_sdk.models.LabeledShape]):
shape annotations associated with the current frame.
- target.label_id_to_index (Mapping[int, int]):
see Label index assignment.
Note that track annotations are currently inaccessible.
The dataset classes support torchvision-like transforms
that you can supply to preprocess each sample before it’s returned.
You can use this to convert the samples to a more convenient format
or to preprocess the data.
The transforms are supplied via the following constructor parameters:
- transforms: a callable that accepts two arguments (the image and the target)
and returns a tuple with two elements.
- transform: a callable that accepts an image.
- target_transform: a callable that accepts a target.
Let the sample value prior to any transformations be (image, target).
Here is what indexing the dataset will return for various combinations of
supplied transforms:
- transforms: transforms(image, target).
- transform: (transform(image), target).
- target_transform: (image, target_transform(target)).
- transform and target_transform: (transform(image), target_transform(target)).
transforms cannot be supplied at the same time
as either transform or target_transform.
The cvat_sdk.pytorch module contains some target transform classes
that are intended for common use cases. See Transforms.
Label index assignment
The annotation model classes (LabeledImage and LabeledShape)
reference labels by their IDs on the CVAT server.
This is usually not very useful for machine learning code,
since those IDs are unpredictable and will differ between projects,
even if the set of labels is semantically the same.
Therefore, the dataset classes assign each label a unique index that
is intended to be a project-independent identifier.
These indices are accessible via the label_id_to_index attribute
on each sample’s target.
This attribute maps IDs on the server to the assigned index.
The mapping is the same for every sample.
By default, the dataset classes arrange all label IDs in an unspecified order
that remains consistent across executions,
and assign them sequential indices, starting with 0.
You can override this behavior and specify your own label indices
with the label_name_to_index constructor parameter.
This parameter accepts a mapping from label name to index.
The mapping must contain a key for each label in the project/task.
When this parameter is specified, label indices are assigned
by looking up each label’s name in the provided mapping and using the result.
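The two assignment modes can be illustrated with a minimal sketch (the ordering shown here is one possible deterministic choice; the SDK's actual order is unspecified, and the function name is hypothetical):

```python
def build_label_id_to_index(label_ids_to_names, label_name_to_index=None):
    """Map server label IDs to indices, either sequentially or via a user mapping."""
    if label_name_to_index is not None:
        # explicit mode: look up each label's name in the user-supplied mapping
        return {lid: label_name_to_index[name] for lid, name in label_ids_to_names.items()}
    # default mode: a deterministic order (here, sorted IDs) with sequential indices from 0
    return {lid: index for index, lid in enumerate(sorted(label_ids_to_names))}
```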
Task filtering
Note: this section applies only to ProjectVisionDataset.
By default, a ProjectVisionDataset includes samples
from every task belonging to the project.
You can change this using the following constructor parameters:
- task_filter (Callable[[models.ITaskRead], bool]):
if set, the callable will be called for every task,
with an instance of ITaskRead corresponding to that task
passed as the argument.
Only tasks for which True is returned will be included.
- include_subsets (Container[str]):
if set, only tasks whose subset is a member of the container
will be included.
Both parameters can be set,
in which case tasks must fulfill both criteria to be included.
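How the two criteria combine can be sketched as follows (an illustration only; StubTask and is_included are hypothetical names, with the stub standing in for an ITaskRead object):

```python
class StubTask:
    """Minimal stand-in for an ITaskRead object with a subset field."""
    def __init__(self, subset):
        self.subset = subset

def is_included(task, task_filter=None, include_subsets=None):
    """A task is included only if it satisfies every supplied criterion."""
    if include_subsets is not None and task.subset not in include_subsets:
        return False
    if task_filter is not None and not task_filter(task):
        return False
    return True
```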
Caching
The images and annotations of a dataset can be substantial in size,
so they are not downloaded from the server every time a dataset object is created.
Instead, they are loaded from a cache on the local file system,
which is maintained during dataset object construction
according to the policy set by the update_policy
constructor parameter.
The available policies are:
-
UpdatePolicy.IF_MISSING_OR_STALE
:
If some data is already cached,
query the server to determine if it is out of date.
If so, discard it.
Then, download all necessary data that is missing from the cache and cache it.
This is the default policy.
-
UpdatePolicy.NEVER
:
If some necessary data is missing from the cache,
raise an exception.
Don’t make any network requests.
Note that this policy permits the use of stale data.
By default, the cache is located in a platform-specific per-user directory.
You can change this location with the cache_dir
setting in the Client
configuration.
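Both settings can be combined; a sketch, assuming `UpdatePolicy` is importable from cvat_sdk.pytorch (the cache location, task ID, and credentials are placeholders):

```python
from cvat_sdk.core.client import Client, Config
from cvat_sdk.pytorch import TaskVisionDataset, UpdatePolicy

# Keep the cache in a project-local directory instead of the default
# platform-specific per-user directory.
client = Client("http://localhost", config=Config(cache_dir="./cvat-cache"))
client.login(("user", "password"))

# Construct the dataset entirely from previously-cached data,
# without making any network requests (raises if data is missing).
dataset = TaskVisionDataset(
    client,
    12345,
    update_policy=UpdatePolicy.NEVER,
)
```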
Transforms
The layer provides several classes whose instances are callables
suitable for use as the target_transform
dataset constructor parameter.
They are intended to simplify working with CVAT datasets in common scenarios.
ExtractBoundingBoxes
Intended for object detection tasks.
Constructor parameters:
-
include_shape_types
(Sequence[str]):
the types of shapes to gather.
Effect: Gathers all shape annotations from the input target object
whose types are contained in the value of include_shape_types
.
Then returns a dictionary with the following string keys
(where N
is the number of gathered shapes):
-
"boxes"
(a floating-point tensor of shape Nx4).
Each row represents the bounding box of the corresponding shape
in the following format: [x_min, y_min, x_max, y_max].
-
"labels"
(an integer tensor of shape N
).
Each element is the index of the label of the corresponding shape.
Example:
ExtractBoundingBoxes(include_shape_types=['rectangle', 'ellipse'])
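In context, the transform is passed to a dataset constructor (the task ID, host, and credentials are placeholders):

```python
from cvat_sdk import make_client
from cvat_sdk.pytorch import ExtractBoundingBoxes, TaskVisionDataset

with make_client(host="http://localhost", credentials=("user", "password")) as client:
    dataset = TaskVisionDataset(
        client,
        12345,  # hypothetical task ID
        target_transform=ExtractBoundingBoxes(
            include_shape_types=["rectangle", "ellipse"]
        ),
    )
    # Each sample's target is now a dict of the form
    # {"boxes": float tensor of shape Nx4, "labels": integer tensor of shape N}.
```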
ExtractSingleLabelIndex
Intended for image classification tasks.
Constructor parameters: None.
Effect: If the input target object contains no tag annotations
or more than one tag annotation, raises ValueError
.
Otherwise, returns the index of the label in the solitary tag annotation
as a zero-dimensional tensor.
Example:
ExtractSingleLabelIndex()
5 - Auto-annotation API
Overview
This layer provides functionality that allows you to automatically annotate a CVAT dataset
by running a custom function on your local machine.
A function, in this context, is a Python object that implements a particular protocol
defined by this layer.
To avoid confusion with Python functions,
auto-annotation functions will be referred to as “AA functions” in the following text.
A typical AA function will be based on a machine learning model
and consist of the following basic elements:
-
Code to load the ML model.
-
A specification describing the annotations that the AA function can produce.
-
Code to convert data from CVAT to a format the ML model can understand.
-
Code to run the ML model.
-
Code to convert resulting annotations to a format CVAT can understand.
The layer can be divided into several parts:
-
The interface, containing the protocol that an AA function must implement.
-
The driver, containing functionality to annotate a CVAT dataset using an AA function.
-
The predefined AA function based on Ultralytics YOLOv8n.
The auto-annotate
CLI command provides a way to use an AA function from the command line
rather than from a Python program.
See the CLI documentation for details.
Example
from typing import List

import PIL.Image
import torchvision.models

from cvat_sdk import make_client
import cvat_sdk.models as models
import cvat_sdk.auto_annotation as cvataa

class TorchvisionDetectionFunction:
    def __init__(self, model_name: str, weights_name: str, **kwargs) -> None:
        # load the ML model
        weights_enum = torchvision.models.get_model_weights(model_name)
        self._weights = weights_enum[weights_name]
        self._transforms = self._weights.transforms()
        self._model = torchvision.models.get_model(model_name, weights=self._weights, **kwargs)
        self._model.eval()

    @property
    def spec(self) -> cvataa.DetectionFunctionSpec:
        # describe the annotations
        return cvataa.DetectionFunctionSpec(
            labels=[
                cvataa.label_spec(cat, i)
                for i, cat in enumerate(self._weights.meta['categories'])
            ]
        )

    def detect(self, context, image: PIL.Image.Image) -> List[models.LabeledShapeRequest]:
        # convert the input into a form the model can understand
        transformed_image = [self._transforms(image)]

        # run the ML model
        results = self._model(transformed_image)

        # convert the results into a form CVAT can understand
        return [
            cvataa.rectangle(label.item(), [x.item() for x in box])
            for result in results
            for box, label in zip(result['boxes'], result['labels'])
        ]

# log into the CVAT server
with make_client(host="localhost", credentials=("user", "password")) as client:
    # annotate task 41617 using Faster R-CNN
    cvataa.annotate_task(client, 41617,
        TorchvisionDetectionFunction("fasterrcnn_resnet50_fpn_v2", "DEFAULT", box_score_thresh=0.5),
    )
Auto-annotation interface
Currently, the only type of AA function supported by this layer is the detection function.
Therefore, all of the following information will pertain to detection functions.
A detection function accepts an image and returns a list of shapes found in that image.
When it is applied to a dataset, the AA function is run for every image,
and the resulting lists of shapes are combined and uploaded to CVAT.
A detection function must have two attributes, spec
and detect
.
spec
must contain the AA function’s specification,
which is an instance of DetectionFunctionSpec
.
DetectionFunctionSpec
must be initialized with a sequence of PatchedLabelRequest
objects
that represent the labels that the AA function knows about.
See the docstring of DetectionFunctionSpec
for more information on the constraints
that these objects must follow.
detect
must be a function/method accepting two parameters:
-
context
(DetectionFunctionContext
).
Contains information about the current image.
Currently DetectionFunctionContext
only contains a single field, frame_name
,
which contains the file name of the frame on the CVAT server.
-
image
(PIL.Image.Image
).
Contains image data.
detect
must return a list of LabeledShapeRequest
objects,
representing shapes found in the image.
See the docstring of DetectionFunctionSpec
for more information on the constraints
that these objects must follow.
The same AA function may be used with any dataset that contains labels with the same names
as those in the AA function's specification.
The way it works is that the driver matches labels between the spec and the dataset,
and replaces the label IDs in the shape objects with those defined in the dataset.
For example, suppose the AA function's spec defines the following labels:

| Name | ID |
|------|----|
| bat  | 0  |
| rat  | 1  |

And the dataset defines the following labels:

| Name | ID  |
|------|-----|
| bat  | 100 |
| cat  | 101 |
| rat  | 102 |
Then suppose detect
returns a shape with label_id
equal to 1.
The driver will see that it refers to the rat
label, and replace it with 102,
since that’s the ID this label has in the dataset.
The same logic is used for sub-label IDs.
Helper factory functions
The CVAT API model types used in the AA function protocol are somewhat unwieldy to work with,
so it's recommended to use the helper factory functions provided by this layer.
These helpers instantiate an object of their corresponding model type,
passing their arguments to the model constructor
and sometimes setting some attributes to fixed values.
The following helpers are available for building specifications:
| Name                | Model type          | Fixed attributes |
|---------------------|---------------------|------------------|
| label_spec          | PatchedLabelRequest | -                |
| skeleton_label_spec | PatchedLabelRequest | type="skeleton"  |
| keypoint_spec       | SublabelRequest     | -                |
The following helpers are available for use in detect:

| Name      | Model type             | Fixed attributes           |
|-----------|------------------------|----------------------------|
| shape     | LabeledShapeRequest    | frame=0                    |
| rectangle | LabeledShapeRequest    | frame=0, type="rectangle"  |
| skeleton  | LabeledShapeRequest    | frame=0, type="skeleton"   |
| keypoint  | SubLabeledShapeRequest | frame=0, type="points"     |
Auto-annotation driver
The annotate_task
function uses an AA function to annotate a CVAT task.
It must be called as follows:
annotate_task(<client>, <task ID>, <AA function>, <optional arguments...>)
The supplied client will be used to make all API calls.
By default, new annotations will be appended to the old ones.
Use clear_existing=True
to remove old annotations instead.
If a detection function declares a label that has no matching label in the task,
then by default, BadFunctionError
is raised, and auto-annotation is aborted.
If you use allow_unmatched_label=True
, then such labels will be ignored,
and any shapes referring to them will be dropped.
The same logic applies to sub-label IDs.
annotate_task
will raise a BadFunctionError
exception
if it detects that the function violated the AA function protocol.
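Combining the optional arguments described above, a call might look like this (the task ID and credentials are placeholders, and `my_function` stands for any object implementing the AA function protocol, such as the TorchvisionDetectionFunction from the earlier example):

```python
from cvat_sdk import make_client
import cvat_sdk.auto_annotation as cvataa

with make_client(host="localhost", credentials=("user", "password")) as client:
    cvataa.annotate_task(
        client,
        41617,
        my_function,  # any object with valid spec and detect attributes
        # Replace any existing annotations rather than appending to them.
        clear_existing=True,
        # Ignore spec labels that have no matching label in the task.
        allow_unmatched_label=True,
    )
```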
Predefined AA functions
This layer includes several predefined AA functions.
You can use them as-is, or as a base on which to build your own.
Each function is implemented as a module
to allow usage via the CLI auto-annotate
command.
Therefore, in order to use it from the SDK,
you’ll need to import the corresponding module.
cvat_sdk.auto_annotation.functions.torchvision_detection
This AA function uses object detection models from
the torchvision library.
It produces rectangle annotations.
To use it, install CVAT SDK with the pytorch
extra:
$ pip install "cvat-sdk[pytorch]"
Usage from Python:
from cvat_sdk.auto_annotation.functions.torchvision_detection import create as create_torchvision
annotate_task(<client>, <task ID>, create_torchvision(<model name>, ...))
Usage from the CLI:
cvat-cli auto-annotate "<task ID>" --function-module cvat_sdk.auto_annotation.functions.torchvision_detection \
-p model_name=str:"<model name>" ...
The create
function accepts the following parameters:
model_name
(str
) - the name of the model, such as fasterrcnn_resnet50_fpn_v2
.
This parameter is required.
weights_name
(str
) - the name of a weights enum value for the model, such as COCO_V1
.
Defaults to DEFAULT
.
It also accepts arbitrary additional parameters,
which are passed directly to the model constructor.
cvat_sdk.auto_annotation.functions.torchvision_keypoint_detection
This AA function is analogous to torchvision_detection
,
except it uses torchvision’s keypoint detection models and produces skeleton annotations.
Keypoints which the model marks as invisible will be marked as occluded in CVAT.
Refer to the previous section for usage instructions and parameter information.