Uploading large dataset from FiftyOne to CVAT

Question

I'm trying to upload around 15GB of data from FiftyOne to CVAT using the 'annotate' function in order to fix annotations. The task is divided into jobs of 50 samples. During the sample upload, I get an 'Error 504 Gateway Time-Out' error. I can see the images in CVAT, but without the current annotations. I tried uploading the annotations separately using the 'task_id' and modifying the 'cvat.py' file in FiftyOne, but I wasn't able to load the changed annotations back.

I can't break this down into multiple tasks because all the tasks end up with the same name, which makes them inconvenient to manage. As I understand it, I have to upload the data with the 'annotate' function in order to be able to use 'load_annotations' to update the dataset (unless there is another way).

Eric Hofesmann · Answer 1 · 2021-12-02T14:47:17.857

Update: This seems to be a limitation in CVAT on the maximum size of requests to its API. To work around this for the time being, we are adding a task_size parameter to the annotate() method of FiftyOne, which automatically splits an annotation run into multiple tasks of at most task_size samples each, avoiding large data or annotation uploads.
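
As a rough illustration of the update above, here is a minimal sketch of passing task_size through to annotate(). The wrapper name annotate_in_tasks is hypothetical and only there to keep the example self-contained; task_size is the parameter described in this answer, and its exact behavior depends on the FiftyOne version you have installed.

```python
def annotate_in_tasks(samples, anno_key, label_field, task_size=50):
    """Start one annotation run, asking the backend to split it into
    tasks of at most `task_size` samples.

    `samples` is assumed to be a FiftyOne dataset or view (anything that
    exposes an `annotate()` method accepting these keyword arguments).
    """
    return samples.annotate(
        anno_key,
        label_field=label_field,
        task_size=task_size,
    )


# Usage with a real dataset (needs FiftyOne and a reachable CVAT server):
#
#   import fiftyone.zoo as foz
#   dataset = foz.load_zoo_dataset("quickstart").clone()
#   annotate_in_tasks(dataset, "example_run", "ground_truth", task_size=50)
#   # Annotate in CVAT...
#   dataset.load_annotations("example_run", cleanup=True)
```

Because this is a single annotation run, a single load_annotations() call with the same key should bring back the results from every task it created.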

Previous Answer:

The best way to manage this workflow for now is to split your annotations into multiple tasks and upload them all to a single CVAT project, so that you can group and manage them together.

For example:

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart").clone()


# The label schema is automatically inferred from the existing labels
# Alternatively, it can be specified with the `label_schema` kwarg 
# when calling `annotate()`

label_field = "ground_truth"


# Upload batches of your dataset to different tasks
# all stored in the same project

project_name = "multiple_task_example"
anno_keys = []

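batch_size = 50

# Ceiling division so that a final partial batch is not dropped
num_batches = (len(dataset) + batch_size - 1) // batch_size

for i in range(num_batches):
    anno_key = "example_%d" % i
    view = dataset.skip(i * batch_size).limit(batch_size)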

    view.annotate(
        anno_key,
        label_field=label_field,
        project_name=project_name,
    )
    anno_keys.append(anno_key)


# Annotate in CVAT...


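# Load all annotations and clean up the tasks/project when complete
# Note: list_annotation_runs() returns every annotation run stored on
# the dataset, not only the ones created above
anno_keys = dataset.list_annotation_runs()
for anno_key in anno_keys:
    dataset.load_annotations(anno_key, cleanup=True)
    dataset.delete_annotation_run(anno_key)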
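
When batching by hand like this, the last batch is easy to lose if the dataset size is not an exact multiple of the batch size. A minimal sketch of computing the (skip, limit) pairs up front, using a hypothetical helper named batch_bounds that is not part of FiftyOne:

```python
def batch_bounds(num_samples, batch_size):
    """Return (skip, limit) pairs that cover `num_samples` items in
    consecutive batches of at most `batch_size`, including a final
    partial batch when the sizes do not divide evenly."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    return [
        (start, min(batch_size, num_samples - start))
        for start in range(0, num_samples, batch_size)
    ]


print(batch_bounds(120, 50))  # [(0, 50), (50, 50), (100, 20)]

# With a FiftyOne dataset, each pair maps onto skip()/limit():
#
#   for i, (skip, limit) in enumerate(batch_bounds(len(dataset), 50)):
#       view = dataset.skip(skip).limit(limit)
#       view.annotate("example_%d" % i, label_field="ground_truth")
```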

Uploading to existing tasks and the project_name argument will be available in the next release. If you want to use this immediately you can install FiftyOne from source: https://github.com/voxel51/fiftyone#installing-from-source

We are working on further optimizations and stability improvements for large CVAT annotation jobs like yours.
