I'm working with Python 3.8.10, OpenCV version 4.3.0 and Cuda 10.2 on Ubuntu 20.04. I generated a weights file with Yolov3 for 23 objects that I want to detect in my images. It all works fine and I can draw beautiful boxes in Python around objects whose detection confidence lies above a certain threshold value.

However, it takes more than half a second to loop through all outputs provided by

outputs = net.forward(outputLayers)

when filtering for results above a certain confidence level.

Here's my loop:

boxes = []
confs = []
class_ids = []

for output in outputs:
    for detect in output:
        scores = detect[5:]
        class_id = np.argmax(scores)
        conf = scores[class_id]
        if conf > 0.7:
            center_x = int(detect[0] * width)
            center_y = int(detect[1] * height)
            w = int(detect[2] * width)
            h = int(detect[3] * height)
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confs.append(float(conf))
            class_ids.append(class_id)

The reason it takes so long is the size of outputs. It seems that net.forward(outputLayers) returns every possible detection, regardless of confidence. In my case, that is more than 30000 elements that I have to loop through.
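
For context, the number of rows follows from the network layout rather than from the image content. The sketch below assumes the standard YOLOv3 head (three detection scales at strides 32, 16 and 8, with three anchor boxes per grid cell); the function name is hypothetical and the input size used in the question is not stated, so the sizes shown are only illustrations.

```python
def yolov3_candidate_count(input_size, strides=(32, 16, 8), anchors_per_cell=3):
    """Count the candidate boxes a standard YOLOv3 head emits for a square input.

    Assumption: three output scales at the given strides and three anchors
    per grid cell. Custom .cfg files may differ.
    """
    return sum((input_size // stride) ** 2 * anchors_per_cell for stride in strides)


# Illustrative input sizes only; the question does not say which one was used.
for size in (416, 608):
    print(size, yolov3_candidate_count(size))
```

Every one of these candidates comes back from net.forward(), whatever its confidence, which is why the filtering has to happen afterwards.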

Is there any way to throw out detections below a certain confidence level while the model still resides on the GPU? net.forward() doesn't seem to allow any filtering, as far as I could find out. Any ideas would be highly appreciated!

Philipp
  • is `outputs` a numpy array, or is it a python list, or something else? – Christoph Rackwitz Sep 23 '21 at 15:00
  • It's a numpy array. – Philipp Sep 23 '21 at 16:45
  • what's the exact shape of that? you can remove those two loops with a few expressions that filter everything. then you'll also be able to get rid of the `append` stuff, and do those calculations as a whole as well. `scores = outputs[:,:,5:]; mask = (scores.max(axis=2) > 0.7)` (perhaps with an argmax in between to calculate that once, then some indexing) – Christoph Rackwitz Sep 23 '21 at 17:11
  • Thank you Christoph, your method helped me solve my issue. – Philipp Sep 24 '21 at 16:04

2 Answers

I couldn't find a way to reduce the number of outputs of net.forward(), but the comment by Christoph Rackwitz provided me with a very satisfactory way of speeding up my code. Instead of looping through the output numpy array, I applied:

mask = (outputs[:,5:].max(axis=1) > 0.7)
outputs = outputs[mask]

which reduced my outputs from around 30000 rows to 33 in 3.8e-06 seconds.
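
A fuller version of the same idea might look like the sketch below. It is not the exact code I ran: the function name filter_detections is made up for illustration, and it assumes each entry of outputs is a 2-D array whose rows are laid out as [center_x, center_y, w, h, objectness, class scores...], as in the loop from the question. np.vstack merges the per-layer arrays so that a single mask covers all of them.

```python
import numpy as np


def filter_detections(outputs, width, height, conf_threshold=0.7):
    """Vectorized equivalent of the nested loop from the question.

    Assumption: `outputs` is a sequence of 2-D arrays (one per output layer),
    each row being [cx, cy, w, h, objectness, class scores...].
    """
    # Merge all output layers into one (N, 5 + num_classes) array.
    detections = np.vstack(outputs)

    # Best class and its score for every row, computed in one pass.
    scores = detections[:, 5:]
    class_ids = scores.argmax(axis=1)
    confs = scores[np.arange(len(scores)), class_ids]

    # Keep only the rows above the confidence threshold.
    keep = confs > conf_threshold
    detections, class_ids, confs = detections[keep], class_ids[keep], confs[keep]

    # Scale the normalized boxes to pixels (astype(int) truncates like int()).
    center_x = (detections[:, 0] * width).astype(int)
    center_y = (detections[:, 1] * height).astype(int)
    w = (detections[:, 2] * width).astype(int)
    h = (detections[:, 3] * height).astype(int)
    x = (center_x - w / 2).astype(int)
    y = (center_y - h / 2).astype(int)

    boxes = np.stack([x, y, w, h], axis=1)
    return boxes, confs, class_ids
```

Calling boxes, confs, class_ids = filter_detections(outputs, width, height) returns NumPy arrays instead of Python lists; use .tolist() on them if later code expects lists.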

Jeremy Caney
Philipp

To improve performance, you can try restricting net.forward(..) to the 23 objects you care about, instead of detecting all 80 classes that YoloV3 provides with coco.names.

If you want to detect only 23 specific objects from the YoloV3 list, there is a section of the darkflow repo that explains how to change the output.

Note: you will need to retrain your model. The repo demonstrates this with an example of 3 classes.

I believe the answer here will be more helpful; it covers 1 specific class, so just adapt its steps to 23 objects.

Roy Amoyal
  • Thank you for your answer! My model is already custom-trained with only 23 classes, so I think that the problem lies with the output of net.forward(). It seems to output ALL possible boxes, without regard to their confidence level. – Philipp Sep 24 '21 at 07:05
  • @Philipp did you try to reduce the size of your image before the process? like the second answer here: https://stackoverflow.com/questions/54488986/how-to-improve-performance-net-forward-of-cv2-dnn-readnetfromcaffe-net-for – Roy Amoyal Sep 24 '21 at 08:32
  • yes, I saw the answer and tried that. My problem is a little different, though. The performance of net.forward() is pretty good, it takes only around .07 seconds to compute. The problem is the lengthy output of approx. 30000 boxes that I have to loop through afterwards in order to only get the boxes with confidence > confidence_level (0.7 in my case). – Philipp Sep 24 '21 at 09:10
  • @Philipp In that case, if speed performance is something that you really care about, I would recommend you to write an equivalent program in C++. The nested for loops should be much faster. I have no other solution right now.. – Roy Amoyal Sep 24 '21 at 09:21