I am using Matterport Mask RCNN as my model and I'm trying to build my database for training. After much deliberation over the below problem, I think what I'm actually asking is how do I add more than one class (+ BG)?

I get the following AssertionError:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-21-c20768952b65> in <module>()
     15 
     16   # display image with masks and bounding boxes
---> 17   display_instances(image, bbox, masks, class_ids/4, train_set.class_names)

/usr/local/lib/python3.6/dist-packages/mask_rcnn-2.1-py3.6.egg/mrcnn/visualize.py in display_instances(image, boxes, masks, class_ids, class_names, scores, title, figsize, ax, show_mask, show_bbox, colors, captions)
    103         print("\n*** No instances to display *** \n")
    104     else:
--> 105         assert boxes.shape[0] == masks.shape[-1] == class_ids.shape[0]
    106 
    107     # If no axis is passed, create one and automatically call show()

AssertionError: 

The assertion seems to fail because `masks.shape[-1] == class_ids.shape[0]` evaluates to False, which should not be the case.

I have now traced it back to `masks.shape[-1]` being 4 times the value of `class_ids.shape[0]`, and I think this may have something to do with having 4 classes in the data. Unfortunately, I haven't worked out how to solve this problem.
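
One way to see the mismatch before calling `display_instances` is to compare the three counts that its assertion checks. This is only a minimal sketch on toy NumPy arrays; `instance_counts` is a hypothetical helper, not part of Mask R-CNN, and the array sizes are made up for illustration:

```python
import numpy as np

def instance_counts(boxes, masks, class_ids):
    """Return the three instance counts that the assertion compares."""
    return boxes.shape[0], masks.shape[-1], class_ids.shape[0]

# Toy data (assumed): 2 instances, but 4 class ids stored per instance
boxes = np.zeros((2, 4), dtype="int32")
masks = np.zeros((8, 8, 2), dtype="uint8")
class_ids = np.zeros(2 * 4, dtype="int32")

counts = instance_counts(boxes, masks, class_ids)
print(counts)                 # all three values must be equal
print(len(set(counts)) == 1)  # False here, so the assertion would fail
```

If the counts differ, the arrays do not describe the same set of instances, whichever of them is at fault.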

# load the masks for an image
def load_mask(self, image_id):
  # get details of image
  info = self.image_info[image_id]
  # define box file location
  path = info['annotation']
  # load XML
  boxes, w, h = self.extract_boxes(path)
  # create one array for all masks, each on a different channel
  masks = zeros([h, w, len(boxes)], dtype='uint8')
  # create masks
  class_ids = list()
  for i in range(len(boxes)):
    box = boxes[i]
    row_s, row_e = box[1], box[3]
    col_s, col_e = box[0], box[2]
    masks[row_s:row_e, col_s:col_e, i] = 1
    class_ids.append(self.class_names.index('Resistor'))
    class_ids.append(self.class_names.index('LED'))
    class_ids.append(self.class_names.index('Capacitor'))
    class_ids.append(self.class_names.index('Diode'))
    return masks, asarray(class_ids, dtype='int32')

# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
print(mask, "and", class_ids)

# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)
The Gibbinold
  • Have you verified that `masks.shape[-1] == class_ids.shape[0]` holds for your inputs? – IonicSolutions Jan 19 '20 at 17:51
  • And please reduce your question to the [mcve] you provided as an update. It will be easier to debug this small example than the full code. – IonicSolutions Jan 19 '20 at 17:51
  • @IonicSolutions Thank you for your response, for your first comment I get ```False```. Apologies for the lengthy code, I will reduce it down (to be honest, I wasn't 100% sure on what part was causing it) – The Gibbinold Jan 19 '20 at 17:59
  • No need to apologize! Now you know why the assertion fails. You should check which format `display_instances` expects for the `mask` and `class_ids`. – IonicSolutions Jan 19 '20 at 18:34

2 Answers

There are a few modifications you need to make to add multiple classes:

1) In load_dataset, register each class with self.add_class("dataset", class_id, "class_name"), then change the last line so that add_image also receives class_ids, a list covering the background (0) plus the classes you have.

# load the dataset definitions
def load_dataset(self, dataset_dir, is_train=True):
    # define the classes
    self.add_class("dataset", 1, "car")
    self.add_class("dataset", 2, "rider")
    # define data locations
    images_dir = dataset_dir + '/images_mod/'
    annotations_dir = dataset_dir + '/annots_mod/'
    # find all images
    for filename in listdir(images_dir):
        # extract image id
        image_id = filename[:-4]
        # skip all images from 3000 onwards if we are building the train set
        if is_train and int(image_id) >= 3000:
            continue
        # skip all images before 3000 if we are building the test/val set
        if not is_train and int(image_id) < 3000:
            continue
        img_path = images_dir + filename
        ann_path = annotations_dir + image_id + '.xml'
        # add to dataset
        self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0,1,2])
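
The train/validation split in step 1 depends only on the numeric file name. A minimal sketch of that filter in isolation, assuming file names such as `00001.jpg`; `select_ids` and the sample names are hypothetical and only illustrate the logic:

```python
def select_ids(filenames, split_at, is_train=True):
    """Keep ids below split_at for training, the rest for test/val."""
    selected = []
    for filename in filenames:
        image_id = filename[:-4]  # strip a 4-character extension such as ".jpg"
        in_train = int(image_id) < split_at
        if in_train == is_train:
            selected.append(image_id)
    return selected

# Assumed sample file names
names = ["00001.jpg", "02999.jpg", "03000.jpg", "04100.jpg"]
train_ids = select_ids(names, split_at=3000, is_train=True)
val_ids = select_ids(names, split_at=3000, is_train=False)
print(train_ids, val_ids)
```

Note that `int(image_id)` raises a ValueError for any file whose name is not purely numeric, so stray files in the images directory need to be skipped first.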

2) Next, modify extract_boxes so that it iterates over each object element and reads both its name and its bounding box. If your XML files contain exactly the classes you want to train on, you do not need the if statement before appending the coordinates to boxes. If the XML files contain more classes than you want to use, keep the if statement; otherwise every box in the file will be turned into a mask.

# extract bounding boxes from an annotation file
def extract_boxes(self, filename):
    # load and parse the file
    tree = ElementTree.parse(filename)
    # get the root of the document
    root = tree.getroot()
    # extract each bounding box
    boxes = list()

    for box in root.findall('.//object'):
        name = box.find('name').text
        xmin = int(box.find('./bndbox/xmin').text)
        ymin = int(box.find('./bndbox/ymin').text)
        xmax = int(box.find('./bndbox/xmax').text)
        ymax = int(box.find('./bndbox/ymax').text)
        coors = [xmin, ymin, xmax, ymax, name]
        if name=='car' or name=='rider':
            boxes.append(coors)

    # extract image dimensions
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height 
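
To try the parsing logic without a dataset on disk, the same idea can be run against an inline annotation. This is a sketch under assumptions: the XML below is a made-up Pascal VOC style sample, and `extract_labelled_boxes` is a hypothetical standalone variant of the method above that takes the wanted class names as a parameter:

```python
from xml.etree import ElementTree

# Made-up annotation with three objects, one of them an unwanted class
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height></size>
  <object><name>car</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>90</ymax></bndbox>
  </object>
  <object><name>tree</name>
    <bndbox><xmin>300</xmin><ymin>0</ymin><xmax>400</xmax><ymax>200</ymax></bndbox>
  </object>
  <object><name>rider</name>
    <bndbox><xmin>200</xmin><ymin>50</ymin><xmax>260</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>
"""

def extract_labelled_boxes(xml_text, wanted):
    """Return [xmin, ymin, xmax, ymax, name] for each wanted object, plus image size."""
    root = ElementTree.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall('.//object'):
        name = obj.find('name').text
        if name not in wanted:
            continue  # ignore classes we do not train on
        coords = [int(obj.find('./bndbox/' + tag).text)
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(coords + [name])
    width = int(root.find('.//size/width').text)
    height = int(root.find('.//size/height').text)
    return boxes, width, height

boxes, width, height = extract_labelled_boxes(SAMPLE_XML, wanted={'car', 'rider'})
print(boxes, width, height)
```

Passing the wanted names in as a set avoids a growing chain of `or` comparisons when more classes are added.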

3) Finally, in load_mask, add an if-else statement that appends the class id matching each box's name.

# load the masks for an image
def load_mask(self, image_id):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)
    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        # each channel is a binary mask; the class is carried by class_ids
        masks[row_s:row_e, col_s:col_e, i] = 1
        if box[4] == 'car':
            class_ids.append(self.class_names.index('car'))
        else:
            class_ids.append(self.class_names.index('rider'))
    return masks, asarray(class_ids, dtype='int32')
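
The key property of step 3 is that every box produces exactly one mask channel and exactly one class id. A minimal standalone sketch of that pairing, assuming boxes in the `[xmin, ymin, xmax, ymax, name]` layout from step 2; `build_masks`, the sample boxes, and the `class_names` list (with "BG" at index 0) are assumptions for illustration:

```python
import numpy as np

def build_masks(boxes, width, height, class_names):
    """Build one binary mask channel and one class id per labelled box."""
    masks = np.zeros([height, width, len(boxes)], dtype="uint8")
    class_ids = []
    for i, (xmin, ymin, xmax, ymax, name) in enumerate(boxes):
        masks[ymin:ymax, xmin:xmax, i] = 1         # rows are y, columns are x
        class_ids.append(class_names.index(name))  # look the id up by name
    return masks, np.asarray(class_ids, dtype="int32")

# Assumed class list and sample boxes
class_names = ["BG", "car", "rider"]
boxes = [[10, 20, 110, 90, "car"], [200, 50, 260, 180, "rider"]]
masks, class_ids = build_masks(boxes, 640, 480, class_names)
print(masks.shape, class_ids)
```

Looking the id up from the box's own name also removes the need for a separate branch per class.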

In my case, I require 2 classes and there are numerous classes available in XML files. Using the above code, I got the following image: annotated_image

Akash Kumar

If you want to train on multiple classes, you can use the following code.

  1. In load_dataset, register each class with self.add_class("dataset", class_id, "class_name"), then change the last line so that add_image also receives class_ids, a list covering the background (0) plus the classes you have.

     # define classes
     self.add_class("dataset", 1, "class1name")
     self.add_class("dataset", 2, "class2name")
     # define data locations
     images_dir = dataset_dir + '/images/'
     annotations_dir = dataset_dir + '/annots/'
     # find all images
     for filename in listdir(images_dir):
         # extract image id
         image_id = filename[:-4]
         # skip bad images
         if image_id in ['00090']:
             continue
         # skip all images after 150 if we are building the train set
         if is_train and int(image_id) >= 150:
             continue
         # skip all images before 150 if we are building the test/val set
         if not is_train and int(image_id) < 150:
             continue
         img_path = images_dir + filename
         ann_path = annotations_dir + image_id + '.xml'
         # add to dataset
         self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path, class_ids=[0, 1, 2])
    
  2. You don't need to modify anything in the function below.

     def extract_boxes(self, filename):
         # load and parse the file
         tree = ElementTree.parse(filename)
         # get the root of the document
         root = tree.getroot()
         # extract each bounding box
         boxes = list()
         for box in root.findall('.//bndbox'):
             xmin = int(box.find('xmin').text)
             ymin = int(box.find('ymin').text)
             xmax = int(box.find('xmax').text)
             ymax = int(box.find('ymax').text)
             coors = [xmin, ymin, xmax, ymax]
             boxes.append(coors)
         # extract image dimensions
         width = int(root.find('.//size/width').text)
         height = int(root.find('.//size/height').text)
         return boxes, width, height
    

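  3. In the function below, `if i == 0` matches the first bounding box in the annotation file. For more bounding boxes (i.e. more classes), add branches for i == 1, i == 2, and so on. Note that this assigns classes by box position, so it assumes every annotation file lists its boxes in the same class order.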

# load the masks for an image
def load_mask(self, image_id):
    # get details of image
    info = self.image_info[image_id]
    # define box file location
    path = info['annotation']
    # load XML
    boxes, w, h = self.extract_boxes(path)
    # create one array for all masks, each on a different channel
    masks = zeros([h, w, len(boxes)], dtype='uint8')
    # create masks
    class_ids = list()
    for i in range(len(boxes)):
        box = boxes[i]
        row_s, row_e = box[1], box[3]
        col_s, col_e = box[0], box[2]
        # each channel is a binary mask; the class is carried by class_ids
        masks[row_s:row_e, col_s:col_e, i] = 1
        if i == 0:
            class_ids.append(self.class_names.index('class1name'))
        else:
            class_ids.append(self.class_names.index('class2name'))
    # to check the points, return boxes[0], masks, asarray(class_ids, dtype='int32') instead
    return masks, asarray(class_ids, dtype='int32')
Pramod