I recently wrote a Python script using the SSD MobileNet V3 model (2020_01_14), which I later deployed on a Raspberry Pi to detect objects in real time with voice feedback, using an external camera (a Pi Camera for the Raspberry Pi). Since then I have been experimenting with numerous custom object detection models, but none of them integrate well with my code. For instance, I tried the pre-trained SSD MobileNet V2 model (2018_03_29) from https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API, and no bounding boxes are drawn at all. For some reason I don't understand, the same code runs perfectly with the V3 model but fails with the V2 model and with every custom model I have tried (dog/cat classification, mask detection). My hope is that if I can figure out why the V2 model fails, it will also show me how to build a custom model that works with my code. I have linked my code and two screenshots (the V2 failure and the V3 success) below, and I would appreciate any help! I should also credit Murtaza's channel https://www.youtube.com/watch?v=HXDD7-EnGBY for the bounding-box visualization.
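To take the camera and the drawing code out of the equation, here is a minimal sanity check that just prints the raw output of net.detect() on a single still image at a deliberately low threshold. One difference I noticed: as far as I can tell, the V2 model was trained at 300x300, while my script feeds it 320x320 (the V3 input size), so the snippet below uses 300x300. The file names match my setup, and test.jpg stands in for any local image containing a COCO object:

import cv2

# Same model files as in the main script below.
net = cv2.dnn_DetectionModel('frozen_inference_graph_V2_coco.pb',
                             'ssd_mobilenet_v2_coco_2018_03_29.pbtxt')
net.setInputSize(300, 300)                 # V2's native training resolution (my guess)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

img = cv2.imread('test.jpg')               # any image with, e.g., a person in it
classIds, confs, bbox = net.detect(img, confThreshold=0.1)    # deliberately low threshold
print(classIds, confs, bbox)               # empty output = the model/config pair itself fails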
import cv2
#import pyttsx3
classNames = []
classFile = 'coco.names'  # file listing all 91 COCO class labels
with open(classFile, 'rt') as f:
    classNames = [line.rstrip() for line in f]
configPath = 'ssd_mobilenet_v2_coco_2018_03_29.pbtxt'
weightsPath = 'frozen_inference_graph_V2_coco.pb'

net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)             # (pixel - mean) * scale -> values in [-1, 1]
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)                   # model expects RGB; OpenCV reads BGR
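# NOTE: the size/scale/mean above are the values that work with the V3 model;
# I could not find documented preprocessing values for the V2 graph from the
# OpenCV wiki, so a mismatch here is one of my suspects.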
def getObjects(img, thres, nms, draw=True, objects=[]):
    # Run detection and return the image plus a list of [box, className] pairs.
    classIds, confs, bbox = net.detect(img, confThreshold=thres, nmsThreshold=nms)
    if len(objects) == 0:
        objects = classNames              # no filter given: report every class
    objectInfo = []
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]   # class IDs are 1-based
            if className in objects:
                objectInfo.append([box, className])
                if draw:
                    cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
                    cv2.putText(img, className.upper(), (box[0] + 10, box[1] + 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
                    cv2.putText(img, str(round(confidence * 100, 2)), (box[0] + 200, box[1] + 30),
                                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    return img, objectInfo
if __name__ == "__main__":
    cap = cv2.VideoCapture(0)
    cap.set(3, 640)   # frame width
    cap.set(4, 480)   # frame height
    while True:
        success, img = cap.read()
        if not success:               # camera read failed
            break
        result, objectInfo = getObjects(img, 0.4, 0.1)
        cv2.imshow('Output', img)
        if cv2.waitKey(1) & 0xFF == ord('q'):   # press 'q' to quit
            break
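One last thing: as I understand it from the OpenCV wiki page linked above, the .pbtxt must exactly match the frozen graph it is paired with, and a matching config can be regenerated with OpenCV's tf_text_graph_ssd.py script. I have not ruled out a graph/config mismatch on my end; this is roughly the command I believe applies (the paths are placeholders for the files in the model archive):

python tf_text_graph_ssd.py --input frozen_inference_graph.pb --config pipeline.config --output ssd_mobilenet_v2_coco_2018_03_29.pbtxt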