
I'm new to deep learning. Currently, I'm working on a project to detect cars in aerial imagery using the RetinaNet model, and I plan to use the COWC dataset. I have a doubt about the annotation part: for now I'm using the labelImg annotation tool to annotate cars in aerial images. Since labelImg generates annotations in XML format, I have converted them into the format required by the RetinaNet model, shown below.

(imagename) (bounding_box_coordinates) (class_name)

Is there any other way to make annotation easier in COWC dataset?

Thanks in advance:)

1 Answer


The COWC dataset already comes with annotations: each car is labeled with a single point, and the annotations are stored in a PNG file. Here's how I find the annotation locations in the PNG file.

import numpy as np
from PIL import Image

annotation_path = 'cowc/datasets/ground_truth_sets/Toronto_ISPRS/03553_Annotated_Cars.png'
im = Image.open(annotation_path)
data = np.asarray(im)

The COWC dataset marks cars with a red dot and negative examples with a blue dot, and the PNG has an alpha channel. For an annotated pixel, both the color channel and the alpha channel are nonzero, so calling nonzero() on the raw array would index the same pixel twice, but we only need one of those values. We don't need the alpha channel, so slice the array down to the color channels before building the index so we don't get duplicate index values.

data = data[:, :, 0:3]  # drop the alpha channel
y_ind, x_ind, rgb_ind = data.nonzero()

You now have an index to all the points in the annotation file. y_ind corresponds to the height dimension, x_ind to the width. Since cars are marked in red, at the first x, y position we should see an array that looks like this: [255, 0, 0]. This is what I get when I look up the first x, y position from the index:

>>> data[y_ind[0], x_ind[0]]
array([255,   0,   0], dtype=uint8)
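One thing to watch out for: the answer's text says red dots mark cars and blue dots mark negative examples, but nonzero() indexes both colors, so the index can include non-car points. A minimal sketch of filtering the index down to red (car) dots, using a small synthetic array in place of the real annotation PNG (the array contents here are made up for illustration):

```python
import numpy as np

# Synthetic 3-channel annotation array standing in for the sliced PNG data:
# one red "car" dot and one blue "negative" dot.
data = np.zeros((10, 10, 3), dtype=np.uint8)
data[2, 3] = [255, 0, 0]   # car annotation
data[7, 8] = [0, 0, 255]   # negative example

y_ind, x_ind, _ = data.nonzero()
# A red dot lights up only channel 0 and a blue dot only channel 2,
# so each dot appears exactly once in (y_ind, x_ind).
car_mask = data[y_ind, x_ind, 0] == 255  # keep only points with a red channel
y_cars, x_cars = y_ind[car_mask], x_ind[car_mask]
print([(int(y), int(x)) for y, x in zip(y_cars, x_cars)])  # prints [(2, 3)]
```

Whether you want to drop the blue points or keep them as hard negatives depends on how you plan to train, but it's worth deciding explicitly rather than letting them all become "car" rows in the CSV.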

Here I create a bounding box that extends 20 pixels in each direction (so 40 pixels on a side), centered on the annotation point provided in the dataset. To create a single bounding box for the first annotation in this image, you can try this:

# define bbox given x, y and ensure bbox is within image bounds
def get_bbox(x, y, x_max, y_max):
    x1 = max(0, x - 20)     # returns zero if x-20 is negative
    x2 = min(x_max, x + 20) # returns x_max if x+20 is greater than x_max
    y1 = max(0, y - 20)
    y2 = min(y_max, y + 20)
    return x1, y1, x2, y2

x1, y1, x2, y2 = get_bbox(x_ind[0], y_ind[0], im.width, im.height) 

You'll have to loop through all the x, y values to make all the bounding boxes for the image. Here's a quick-and-dirty way to create a CSV file for a single image:

img_path = 'cowc/datasets/ground_truth_sets/Toronto_ISPRS/03553.png'
with open('anno.csv', 'w') as f:
    for x, y in zip(x_ind, y_ind):
        x1, y1, x2, y2 = get_bbox(x, y, im.width, im.height)
        line = f'{img_path},{x1},{y1},{x2},{y2},car\n'
        f.write(line)

I plan on breaking up each huge image into much smaller ones, which will change the values of the bounding boxes. I hope you find this helpful and that it gives you a good place to start.
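On that last point, a minimal sketch of how tiling shifts the box coordinates, assuming fixed-size, non-overlapping tiles (the tile size and helper name here are my own, not from the dataset):

```python
# Translate a bounding box from full-image coordinates into tile-local
# coordinates, assuming fixed-size, non-overlapping tiles.
def bbox_to_tile(x1, y1, x2, y2, tile_size):
    # which tile the box's top-left corner falls in
    tile_col = x1 // tile_size
    tile_row = y1 // tile_size
    # shift coordinates so they're relative to that tile's origin
    ox, oy = tile_col * tile_size, tile_row * tile_size
    local = (x1 - ox, y1 - oy, x2 - ox, y2 - oy)
    return tile_row, tile_col, local

tile_row, tile_col, local = bbox_to_tile(1050, 1100, 1090, 1140, tile_size=1024)
print(tile_row, tile_col, local)  # prints: 1 1 (26, 76, 66, 116)
```

Note that boxes straddling a tile boundary will end up with local coordinates outside [0, tile_size); you'd need to clip them to the tile or discard them, depending on how tolerant your training setup is of partial cars.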

fizix137
  • I'll need to update the answer. I've just realized that because the rgba array has two nonzero values per annotated pixel (color and alpha), each pixel will get counted twice. I'll let you know when I've updated the answer with the fix. – fizix137 Jun 17 '20 at 14:23
  • The post has been updated so that you don't get duplicate index values from the color and the alpha channels when running the np.nonzero() method. – fizix137 Jun 17 '20 at 14:38