Adding multiple classes in Mask R-CNN

Question

I am using Matterport's Mask R-CNN implementation and I'm trying to build my dataset for training. After much deliberation over the problem below, I think what I'm actually asking is: how do I add more than one class (plus BG, the background class)?

I get the following AssertionError:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-21-c20768952b65> in <module>()
     15 
     16   # display image with masks and bounding boxes
---> 17   display_instances(image, bbox, masks, class_ids/4, train_set.class_names)

/usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/visualize.py in display_instances(image, boxes, masks, class_ids, class_names, scores, title, figsize, ax, show_mask, show_bbox, colors, captions)
    103         print("\n*** No instances to display *** \n")
    104     else:
--> 105         assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
    106 
    107     # If no axis is passed, create one and automatically call show()

AssertionError:

The problem appears to be that `masks.shape[-1] == class_ids.shape[0]` evaluates to False, which should not be the case.

I have now traced it back: `masks.shape[-1]` is four times `class_ids.shape[0]`, and I think this may have something to do with having 4 classes in the data. Unfortunately, I haven't worked out how to solve it.

# load the masks for an image
def load_mask(self, image_id):
  # get details of image
  info = self.image_info[image_id]
  # define box file location
  path = info['annotation']
  # load XML
  boxes, w, h = self.extract_boxes(path)
  # create one array for all masks, each on a different channel
  masks = zeros([h, w, len(boxes)], dtype='uint8')
  # create masks
  class_ids = list()
  for i in range(len(boxes)):
    box = boxes[i]
    row_s, row_e = box[1], box[3]
    col_s, col_e = box[0], box[2]
    masks[row_s:row_e, col_s:col_e, i] = 1
    class_ids.append(self.class_names.index('Resistor'))
    class_ids.append(self.class_names.index('LED'))
    class_ids.append(self.class_names.index('Capacitor'))
    class_ids.append(self.class_names.index('Diode'))
    return masks, asarray(class_ids, dtype='int32')

# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
print(mask, "and", class_ids)

# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

Have you verified that `masks.shape[-1] == class_ids.shape[0]` holds for your inputs? — IonicSolutions, Jan 19 '20 at 17:51
And please reduce your question to the minimal reproducible example you provided as an update. It will be easier to debug this small example than the full code. — IonicSolutions, Jan 19 '20 at 17:51
@IonicSolutions Thank you for your response. For your first comment, I get `False`. Apologies for the lengthy code; I will reduce it (to be honest, I wasn't 100% sure which part was causing it). — The Gibbinold, Jan 19 '20 at 17:59
No need to apologize! Now you know why the assertion fails. You should check which format `display_instances` expects for the `mask` and `class_ids`. — IonicSolutions, Jan 19 '20 at 18:34
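Following up on that comment: `display_instances` expects exactly one class ID per mask channel. The sketch below uses only NumPy and a hypothetical helper `build_targets` (not part of Mask R-CNN) to show how appending four IDs for every box, as the question's `load_mask` does, breaks that one-to-one relation. Note also that the question's `return` is indented inside the `for` loop, so that function returns after processing only the first box.

```python
import numpy as np

def build_targets(boxes, h, w, ids_per_box):
    """Hypothetical helper: build a mask stack and a class-ID vector from
    (xmin, ymin, xmax, ymax, class_id) boxes. ids_per_box > 1 mimics
    appending several class IDs for every box, as in the question."""
    masks = np.zeros((h, w, len(boxes)), dtype=np.uint8)
    class_ids = []
    for i, (xmin, ymin, xmax, ymax, cid) in enumerate(boxes):
        masks[ymin:ymax, xmin:xmax, i] = 1  # one channel per instance
        class_ids.extend([cid] * ids_per_box)
    return masks, np.asarray(class_ids, dtype=np.int32)

def counts_match(masks, class_ids):
    # The condition display_instances asserts: one class ID per mask channel
    return masks.shape[-1] == class_ids.shape[0]

boxes = [(0, 0, 4, 4, 1), (2, 2, 8, 8, 3)]
m, ids = build_targets(boxes, 10, 10, ids_per_box=4)
print(m.shape[-1], ids.shape[0], counts_match(m, ids))  # 2 8 False
m, ids = build_targets(boxes, 10, 10, ids_per_box=1)
print(m.shape[-1], ids.shape[0], counts_match(m, ids))  # 2 2 True
```

The fix, in whatever form, is to append exactly one class ID per box, chosen from that box's own label.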

score 3 · Answer 1 · answered Apr 28 '20 at 17:09

There are a couple of modifications you need to make to add multiple classes:

1) In load_dataset, register each class with self.add_class("dataset", class_id, "class_name"), then modify the last line to pass class_ids, with one entry per class you have (including 0 for the background).

# load the dataset definitions
def load_dataset(self, dataset_dir, is_train=True):
    # define the classes (ID 0 is reserved for the background)
    self.add_class("dataset", 1, "car")
    self.add_class("dataset", 2, "rider")
    # define data locations
    images_dir = dataset_dir + '/images_mod/'
    annotations_dir = dataset_dir + '/annots_mod/'
    # find all images
    for filename in listdir(images_dir):
        # extract image id
        image_id = filename[:-4]
        # skip all images from 3000 onward if we are building the train set
        if is_train and int(image_id) >= 3000:
            continue
        # skip all images before 3000 if we are building the test/val set
        if not is_train and int(image_id) < 3000:
            continue
        img_path = images_dir + filename
        ann_path = annotations_dir + image_id + '.xml'
        # add to dataset
        self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0,1,2])
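To see why self.class_names.index('car') later returns 1, here is a minimal stand-in for the class bookkeeping. `ClassRegistry` is hypothetical and only mimics, as an assumption, the behaviour of Matterport's `utils.Dataset`, which starts with a background entry "BG" at ID 0. In the real library, `class_names` is built when `prepare()` is called, so call that before using it.

```python
class ClassRegistry:
    """Hypothetical stand-in for the class bookkeeping of a Mask R-CNN dataset:
    background ("BG") is fixed at ID 0, registered classes follow."""

    def __init__(self):
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]

    def add_class(self, source, class_id, class_name):
        # Ignore a class that is already registered for this source
        for info in self.class_info:
            if info["source"] == source and info["id"] == class_id:
                return
        self.class_info.append({"source": source, "id": class_id, "name": class_name})

    @property
    def class_names(self):
        # Position in this list is the integer class ID used in class_ids
        return [info["name"] for info in self.class_info]

reg = ClassRegistry()
reg.add_class("dataset", 1, "car")
reg.add_class("dataset", 2, "rider")
print(reg.class_names)                 # ['BG', 'car', 'rider']
print(reg.class_names.index("rider"))  # 2
```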

2) Next, modify extract_boxes so that it finds each object and reads its name as well as its bounding-box coordinates. If you have 2 classes and your XML files contain only those classes, you don't need the if statement before appending the coordinates to boxes. But if you want to train on fewer classes than the XML files contain, you need the if statement; otherwise, every box will be turned into a mask.

# extract bounding boxes from an annotation file
def extract_boxes(self, filename):
    # load and parse the file
    tree = ElementTree.parse(filename)
    # get the root of the document
    root = tree.getroot()
    # extract each bounding box
    boxes = list()

    for box in root.findall('.//object'):
        name = box.find('name').text
        xmin = int(box.find('./bndbox/xmin').text)
        ymin = int(box.find('./bndbox/ymin').text)
        xmax = int(box.find('./bndbox/xmax').text)
        ymax = int(box.find('./bndbox/ymax').text)
        coors = [xmin, ymin, xmax, ymax, name]
        if name=='car' or name=='rider':
            boxes.append(coors)

    # extract image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height
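The same parsing can be tried on an in-memory annotation without the dataset class. This sketch assumes Pascal VOC-style XML like the files the answer reads; `extract_boxes_from_string` and the `SAMPLE` document are illustrative only. Objects whose name is not in `wanted` (here a truck) are dropped, which is what the if statement in extract_boxes is for.

```python
from xml.etree import ElementTree

def extract_boxes_from_string(xml_text, wanted=("car", "rider")):
    """Hypothetical variant of extract_boxes that parses a string.
    Returns ([[xmin, ymin, xmax, ymax, name], ...], width, height),
    keeping only objects whose <name> is in `wanted`."""
    root = ElementTree.fromstring(xml_text)
    boxes = []
    for obj in root.findall(".//object"):
        name = obj.find("name").text
        if name not in wanted:
            continue  # skip classes we are not training on
        coords = [int(obj.find(f"./bndbox/{tag}").text)
                  for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(coords + [name])
    width = int(root.find(".//size/width").text)
    height = int(root.find(".//size/height").text)
    return boxes, width, height

SAMPLE = """<annotation>
  <size><width>640</width><height>480</height></size>
  <object><name>car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox></object>
  <object><name>truck</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>300</xmax><ymax>150</ymax></bndbox></object>
  <object><name>rider</name>
    <bndbox><xmin>400</xmin><ymin>100</ymin><xmax>450</xmax><ymax>220</ymax></bndbox></object>
</annotation>"""

boxes, w, h = extract_boxes_from_string(SAMPLE)
print(boxes)  # [[10, 20, 110, 90, 'car'], [400, 100, 450, 220, 'rider']]
print(w, h)   # 640 480
```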

3) Finally, in load_mask, add an if/else statement so that each box is assigned the correct class ID.

# load the masks for an image
def load_mask(self, image_id):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)
    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        # binary mask for this instance; the class is carried by class_ids
        masks[row_s:row_e, col_s:col_e, i] = 1
        if box[4] == 'car':
            class_ids.append(self.class_names.index('car'))
        else:
            class_ids.append(self.class_names.index('rider'))
    return masks, asarray(class_ids, dtype='int32')

In my case, I needed only 2 classes, while the XML files contain many more, and the code above worked for that setup.
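As a standalone sketch of step 3 (a hypothetical helper using only NumPy), the if/else can be replaced by looking up each box's name in the class list, which keeps working as more classes are added. It assumes each box carries its name as a fifth element, as returned by the modified extract_boxes, and that the class list starts with "BG". Each instance gets a binary mask and exactly one class ID.

```python
import numpy as np

CLASS_NAMES = ["BG", "car", "rider"]  # assumed order; index 0 is background

def masks_from_boxes(boxes, w, h, class_names=CLASS_NAMES):
    """Hypothetical helper: one binary mask channel and exactly one class ID
    per box. boxes: list of [xmin, ymin, xmax, ymax, name]."""
    masks = np.zeros((h, w, len(boxes)), dtype=np.uint8)
    class_ids = []
    for i, (xmin, ymin, xmax, ymax, name) in enumerate(boxes):
        masks[ymin:ymax, xmin:xmax, i] = 1  # rows are y, columns are x
        class_ids.append(class_names.index(name))
    return masks, np.asarray(class_ids, dtype=np.int32)

boxes = [[10, 20, 110, 90, "car"], [400, 100, 450, 220, "rider"]]
masks, class_ids = masks_from_boxes(boxes, 640, 480)
print(masks.shape, class_ids)  # (480, 640, 2) [1 2]
```

The lookup raises ValueError for a name missing from the class list, which makes an unfiltered annotation fail loudly instead of being silently mislabelled.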

Pramod · Answer 2 · 2020-08-06T05:17:26.940

If you want to train on multiple classes, you can use the following code.

In load_dataset, register each class with self.add_class("dataset", class_id, "class_name"), then modify the last line to pass class_ids, with one entry per class you have (including 0 for the background).

def load_dataset(self, dataset_dir, is_train=True):
    # define classes
    self.add_class("dataset", 1, "class1name")
    self.add_class("dataset", 2, "class2name")
    # define data locations
    images_dir = dataset_dir + '/images/'
    annotations_dir = dataset_dir + '/annots/'
    # find all images
    for filename in listdir(images_dir):
        # extract image id
        image_id = filename[:-4]
        # skip bad images
        if image_id in ['00090']:
            continue
        # skip all images from 150 onward if we are building the train set
        if is_train and int(image_id) >= 150:
            continue
        # skip all images before 150 if we are building the test/val set
        if not is_train and int(image_id) < 150:
            continue
        img_path = images_dir + filename
        ann_path = annotations_dir + image_id + '.xml'
        # add to dataset
        self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0, 1, 2])
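The filtering in this loop decides which images go to the training set and which to the validation set. Pulled out into a hypothetical pure function (not part of Mask R-CNN), it is easy to check on a few file names; `filename[:-4]` assumes a three-letter extension such as .jpg.

```python
def select_image_ids(filenames, is_train, split_at=150, skip=("00090",)):
    """Hypothetical helper mirroring the loop in load_dataset: drop listed
    bad images, keep IDs below split_at for training and the rest for
    validation."""
    selected = []
    for filename in filenames:
        image_id = filename[:-4]  # strip ".jpg" (assumes a 3-letter extension)
        if image_id in skip:
            continue
        if is_train and int(image_id) >= split_at:
            continue
        if not is_train and int(image_id) < split_at:
            continue
        selected.append(image_id)
    return selected

files = ["00001.jpg", "00090.jpg", "00149.jpg", "00150.jpg", "00200.jpg"]
print(select_image_ids(files, is_train=True))   # ['00001', '00149']
print(select_image_ids(files, is_train=False))  # ['00150', '00200']
```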

You don't need to modify anything in the function below:

def extract_boxes(self, filename):
    # load and parse the file
    tree = ElementTree.parse(filename)
    # get the root of the document
    root = tree.getroot()
    # extract each bounding box
    boxes = list()
    for box in root.findall('.//bndbox'):
        xmin = int(box.find('xmin').text)
        ymin = int(box.find('ymin').text)
        xmax = int(box.find('xmax').text)
        ymax = int(box.find('ymax').text)
        coors = [xmin, ymin, xmax, ymax]
        boxes.append(coors)
    # extract image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height

3) In the function below, "if i == 0" matches the first bounding box. For more bounding boxes (i.e. for more classes), add branches such as i == 1, i == 2, and so on.

# load the masks for an image
def load_mask(self, image_id):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)
    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        # binary mask for this instance; the class is carried by class_ids
        masks[row_s:row_e, col_s:col_e, i] = 1
        if i == 0:
            class_ids.append(self.class_names.index('class1name'))
        else:
            class_ids.append(self.class_names.index('class2name'))
    # return boxes[0],masks, asarray(class_ids, dtype='int32') to check the points
    return masks, asarray(class_ids, dtype='int32')
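One caveat with "if i == 0": the class then depends on the order in which boxes appear in the XML file, not on their labels. The sketch below contrasts that rule with a lookup by name; both helpers are hypothetical, and the name-based version assumes extract_boxes is changed to keep each object's <name> as a fifth element, as in the first answer.

```python
def ids_by_position(boxes, class_names):
    # Rule used above: first box -> class1name, every other box -> class2name
    return [class_names.index("class1name") if i == 0
            else class_names.index("class2name")
            for i in range(len(boxes))]

def ids_by_name(boxes, class_names):
    # Alternative: use the label stored with each box (box[4])
    return [class_names.index(box[4]) for box in boxes]

class_names = ["BG", "class1name", "class2name"]
boxes = [[0, 0, 5, 5, "class2name"],
         [5, 5, 9, 9, "class1name"],
         [1, 1, 3, 3, "class1name"]]
print(ids_by_position(boxes, class_names))  # [1, 2, 2]
print(ids_by_name(boxes, class_names))      # [2, 1, 1]
```

With this annotation order, the position rule labels every box incorrectly, while the name lookup matches the annotations.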
