I want to train a model that detects vehicles and roads in an image, and I will use Mask R-CNN and YOLACT++ for that purpose. I labelled some of my images for Mask R-CNN with the VGG Image Annotator, and the segmentation points look like the image below.

[screenshot: VGG Image Annotator annotation showing segmentation points]

As you can see, there is no area or bbox parameter. I can find the bbox of my instances from min x, min y, max x, max y (sketched at the end of this question), but I couldn't figure out how to compute the area of the segmented region. You can see the YOLACT annotation format in the image below.

[screenshot: YOLACT (COCO) annotation format]

It takes a lot of time to label all the instances: I spent a minimum of 10 minutes labelling all the cars in one image, and I already have 500 labelled images. Do you have any advice or ideas that could save me time when converting the first annotation format to the second one (Mask R-CNN / VIA to COCO (YOLACT))?
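For reference, this is how I get the bbox from the VIA points (a minimal sketch assuming VIA-style all_points_x / all_points_y lists); the area is the part I'm missing:

def via_bbox(xs, ys):
    # COCO bboxes are [x, y, width, height] in pixel coordinates
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]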

3 Answers


You must create your own script to transform it; I had to do that to convert XML annotations to Mask R-CNN JSON.

You can check the example: https://github.com/adions025/XMLtoJson_Mask_RCNN

– Adonis González

Something like this should work, but it depends on how you annotated in VGG:

import json
import math
from itertools import chain

def vgg_to_coco(vgg_path: str, outfile: str = None, class_keyword: str = "Class"):
    with open(vgg_path) as f:
        vgg = json.load(f)

    # map each filename to a numeric image id
    images_ids_dict = {v["filename"]: i for i, v in enumerate(vgg.values())}
    # TODO: replace the hard-coded 1024s with the real image dimensions
    images_info = [{"file_name": k, "id": v, "width": 1024, "height": 1024} for k, v in images_ids_dict.items()]

    # collect every class name that appears under the class keyword
    classes = {class_keyword} | {r["region_attributes"][class_keyword] for v in vgg.values() for r in v["regions"]
                                 if class_keyword in r["region_attributes"]}
    category_ids_dict = {c: i for i, c in enumerate(classes, 1)}  # COCO category ids start at 1
    categories = [{"supercategory": class_keyword, "id": v, "name": k} for k, v in category_ids_dict.items()]
    annotations = []
    suffix_zeros = math.ceil(math.log10(len(vgg)))
    for i, v in enumerate(vgg.values()):
        for j, r in enumerate(v["regions"]):
            if class_keyword in r["region_attributes"]:
                x, y = r["shape_attributes"]["all_points_x"], r["shape_attributes"]["all_points_y"]
                annotations.append({
                    "segmentation": [list(chain.from_iterable(zip(x, y)))],
                    "area": helper.polygon_area(x, y),
                    "bbox": helper.bbox(x, y, out_format="width_height"),
                    "image_id": images_ids_dict[v["filename"]],
                    "category_id": category_ids_dict[r["region_attributes"][class_keyword]],
                    "id": int(f"{i:0>{suffix_zeros}}{j:0>{suffix_zeros}}"),
                    "iscrowd": 0
                })

    coco = {
        "images": images_info,
        "categories": categories,
        "annotations": annotations
    }
    if outfile is None:
        outfile = vgg_path.replace(".json", "_coco.json")
    with open(outfile, "w") as f:
        json.dump(coco, f)

You will have to change the 1024s to your image size, or, if your images have variable sizes, you will have to build a map from filename to dimensions, as sketched below.
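A minimal sketch of that map, reading each file's real dimensions with Pillow (the library choice and the images_dir argument are my assumptions, not part of the original answer):

import os
from PIL import Image

def image_sizes(images_dir, filenames):
    # returns {filename: (width, height)}, read from the files themselves
    sizes = {}
    for name in filenames:
        with Image.open(os.path.join(images_dir, name)) as im:
            sizes[name] = im.size  # Pillow's .size is (width, height)
    return sizes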

– Zac Todd
  • I have made conversions for a variety of different object detection formats: https://github.com/zactodd/dataset_maker/blob/main/dataset_maker/annotations/localisation.py – Zac Todd Dec 29 '20 at 04:01

Working solution, extended from @Zac Todd's answer.

The image size can be computed on the fly.

import os
import json
import math
from itertools import chain

import numpy as np
import skimage.io

def vgg_to_coco(dataset_dir, vgg_path: str, outfile: str = None, class_keyword: str = "label"):
    with open(vgg_path) as f:
        vgg = json.load(f)

    images_ids_dict = {}
    images_info = []
    for i, v in enumerate(vgg.values()):
        images_ids_dict[v["filename"]] = i
        # read each image once to get its true dimensions
        image_path = os.path.join(dataset_dir, v["filename"])
        image = skimage.io.imread(image_path)
        height, width = image.shape[:2]
        images_info.append({"file_name": v["filename"], "id": i, "width": width, "height": height})

    # NOTE: in newer VIA / makesense exports "regions" is a list rather than a
    # dict; drop the .values() calls in that case (see the comment below)
    classes = {class_keyword} | {r["region_attributes"][class_keyword] for v in vgg.values() for r in v["regions"].values()
                                 if class_keyword in r["region_attributes"]}
    category_ids_dict = {c: i for i, c in enumerate(classes, 1)}
    categories = [{"supercategory": class_keyword, "id": v, "name": k} for k, v in category_ids_dict.items()]
    annotations = []
    suffix_zeros = math.ceil(math.log10(len(vgg)))
    for i, v in enumerate(vgg.values()):
        for j, r in enumerate(v["regions"].values()):
            if class_keyword in r["region_attributes"]:
                x, y = r["shape_attributes"]["all_points_x"], r["shape_attributes"]["all_points_y"]
                annotations.append({
                    "segmentation": [list(chain.from_iterable(zip(x, y)))],
                    "area": PolyArea(x, y),
                    "bbox": [min(x), min(y), max(x)-min(x), max(y)-min(y)],
                    "image_id": images_ids_dict[v["filename"]],
                    "category_id": category_ids_dict[r["region_attributes"][class_keyword]],
                    "id": int(f"{i:0>{suffix_zeros}}{j:0>{suffix_zeros}}"),
                    "iscrowd": 0
                })

    coco = {
        "images": images_info,
        "categories": categories,
        "annotations": annotations
    }
    if outfile is None:
        outfile = vgg_path.replace(".json", "_coco.json")
    with open(outfile, "w") as f:
        json.dump(coco, f)

My data was labelled using makesense.ai, and region_attributes looks like this, so class_keyword="label" is passed in the function call:

"region_attributes": {
      "label": "box"
    }
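A hypothetical call then looks like this (the paths are my own placeholders):

vgg_to_coco("path/to/images", "path/to/via_export.json", class_keyword="label")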

To compute the polygon area, the code is copied from this answer:

def PolyArea(x, y):
    # shoelace formula: half the absolute cross-product sum over the vertices
    return 0.5 * np.abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))
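
A quick sanity check on a unit square (my own example, not from the linked answer):

assert PolyArea([0, 1, 1, 0], [0, 0, 1, 1]) == 1.0  # unit square has area 1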
  • If you can, please edit your code: `images_ids_dict[v["filename"]] = I` should be `i`, and `v["regions"].values()` should be `v["regions"]`, as it is already a list. – theProcrastinator Jun 10 '22 at 10:44