I recently wrote a Python script using the SSD MobileNet V3 model (2020_01_14), which I later deployed on a Raspberry Pi to detect objects in real time with voice feedback, using an external camera (a Pi Camera for the Raspberry Pi). Since then I have been experimenting with numerous custom object detection models, but none of them integrate well with my code. For instance, I tried the pre-trained SSD MobileNet V2 model (2018_03_29) from https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API, and no bounding boxes are drawn at all. For some reason I don't understand, the same code runs perfectly with the V3 model but fails with the V2 model and with every custom model I have tried (dog/cat classification, mask detection). My hope is that if I can figure out why the V2 model fails, it will also show me how to build a custom model that works with my code. I have linked my code and two screenshots (the V2 failure and the V3 success) below, and I would appreciate any help! I should also credit Murtaza's channel https://www.youtube.com/watch?v=HXDD7-EnGBY for the bounding-box visualization.
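To take the camera and the drawing code out of the equation, here is a minimal sanity check that just prints the raw output of net.detect() on a single still image at a deliberately low threshold. One difference I noticed: as far as I can tell, the V2 model was trained at 300x300, while my script feeds it 320x320 (the V3 input size), so the snippet below uses 300x300. The file names match my setup, and test.jpg stands in for any local image containing a COCO object:

import cv2

# Same model files as in the main script below.
net = cv2.dnn_DetectionModel('frozen_inference_graph_V2_coco.pb',
                             'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
net.setInputSize(300, 300)                 # V2's native training resolution (my guess)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

img = cv2.imread('test.jpg')               # any image with, e.g., a person in it
classIds, confs, bbox = net.detect(img, confThreshold=0.1)    # deliberately low threshold
print(classIds, confs, bbox)               # empty output = the model/config pair itself fails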
import cv2
#import pyttsx3
classNames = []
classFile = 'coco.names'  # file listing all 91 COCO class labels
with open(classFile, 'rt') as f:
    classNames = [line.rstrip() for line in f]
configPath = 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt'
weightsPath = 'frozen_inference_graph_V2_coco.pb'

net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)             # (pixel - mean) * scale -> values in [-1, 1]
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)                   # model expects RGB; OpenCV reads BGR
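# NOTE: the size/scale/mean above are the values that work with the V3 model;
# I could not find documented preprocessing values for the V2 graph from the
# OpenCV wiki, so a mismatch here is one of my suspects.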
def getObjects(img, thres, nms, draw=True, objects=[]):
    # Run detection and return the image plus a list of [box, className] pairs.
    classIds, confs, bbox = net.detect(img, confThreshold=thres, nmsThreshold=nms)
    if len(objects) == 0:
        objects = classNames              # no filter given: report every class
    objectInfo = []
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]   # class IDs are 1-based
            if className in objects:
                objectInfo.append([box, className])
                if draw:
                    cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
                    cv2.putText(img, className.upper(), (box[0] + 10, box[1] + 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
                    cv2.putText(img, str(round(confidence * 100, 2)), (box[0] + 200, box[1] + 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    return img, objectInfo
if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    cap.set(3, 640)   # frame width
    cap.set(4, 480)   # frame height
    while True:
        success, img = cap.read()
        if not success:               # camera read failed
            break
        result, objectInfo = getObjects(img, 0.4, 0.1)
        cv2.imshow('Output', img)
        if cv2.waitKey(1) & 0xFF == ord('q'):   # press 'q' to quit
            break
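One last thing: as I understand it from the OpenCV wiki page linked above, the .pbtxt must exactly match the frozen graph it is paired with, and a matching config can be regenerated with OpenCV's tf_text_graph_ssd.py script. I have not ruled out a graph/config mismatch on my end; this is roughly the command I believe applies (the paths are placeholders for the files in the model archive):

python tf_text_graph_ssd.py --input frozen_inference_graph.pb --config pipeline.config --output ssd_mobilenet_v2_coco_2018_03_29.pbtxt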