I need to get the bounding box coordinates generated by YOLO object detection in the image above.
-
YOLO also has a `--save-text` flag you can set to save the coordinate information for each bounding box to disk. – Ender May 15 '22 at 18:57
-
Relatedly, does anyone know how to get the confidence scores for each bounding box? – Ender May 15 '22 at 18:59
-
@Ender You can check the detect.py file and edit it. Look for the function that saves the prediction image, labels, xyxy, etc. The labels also contain the confidence score for each detection. – Johnny Dec 07 '22 at 00:49
6 Answers
A quick solution is to modify the image.c file to print out the bounding box information:
...
if(bot > im.h-1) bot = im.h-1;
// Print bounding box values
printf("Bounding Box: Left=%d, Top=%d, Right=%d, Bottom=%d\n", left, top, right, bot);
draw_box_width(im, left, top, right, bot, width, red, green, blue);
...
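After editing image.c, rebuild darknet with make and run detection as usual; the coordinates are then printed for every drawn box. For example, with the stock pjreddie setup:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

# Example output line (values depend on the image):
# Bounding Box: Left=..., Top=..., Right=..., Bottom=...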

-
Seriously, thank you so much for suggesting image.c. It helped me solve a totally different problem: when running YOLO in Python (via OpenCV-DNN), the detections are given in a float format. And literally every article I've ever seen has the WRONG MATH for turning the YOLO floats (center X/Y, and width/height) into pixel coordinates. But the official image.c has the math! Right here! https://github.com/pjreddie/darknet/blob/810d7f797bdb2f021dbe65d2524c2ff6b8ab5c8b/src/image.c#L283-L291 - I just had to port that to python. :-) – Mitch McMabers Sep 10 '19 at 19:04
-
@Brian O'Donnell How can I modify the "image.c" to only get four numbers for the coordinates of bounding boxes (without any additional description)? – Max Jun 13 '20 at 16:31
-
Do you just want the numbers? If so you would want: printf("%d,%d,%d,%d\n", left, top, right, bot); – Brian O'Donnell Jun 13 '20 at 19:10
-
@MitchMcMabers Do you know why there is a need to multiply by the width and height? – varungupta Jan 25 '22 at 18:27
-
@varungupta, the bounding box coordinates and dimensions are normalized by dividing by image width and height. – Ender May 15 '22 at 18:58
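To make that conversion concrete, here is a minimal Python sketch (my own illustration, not from the answer above) that mirrors the image.c math: scale the normalized center/size floats by the image dimensions, then convert to corner coordinates:

def yolo_to_pixels(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width/height) to pixel corners."""
    left = int((cx - w / 2) * img_w)
    right = int((cx + w / 2) * img_w)
    top = int((cy - h / 2) * img_h)
    bottom = int((cy + h / 2) * img_h)
    # Clamp to the image bounds, as image.c does
    return max(left, 0), max(top, 0), min(right, img_w - 1), min(bottom, img_h - 1)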
For Python users on Windows:

First, do a few setup jobs:

1. Set the Python path of your darknet folder in the environment variables:
PYTHONPATH = 'YOUR DARKNET FOLDER'
2. Add PYTHONPATH to the Path value by appending:
%PYTHONPATH%
3. Edit the file coco.data in the cfg folder by changing the names variable to your coco.names path, in my case:
names = D:/core/darknetAB/data/coco.names

With these settings, you can call darknet.py (from the alexeyAB/darknet repository) as a Python module from any folder.

Start scripting:
from darknet import performDetect as scan  # calling the 'performDetect' function from darknet.py

def detect(img_path):
    '''use this if you only want to get the coordinates'''
    picpath = img_path
    cfg = 'D:/core/darknetAB/cfg/yolov3.cfg'  # change this if you want to use a different config
    coco = 'D:/core/darknetAB/cfg/coco.data'  # you can change this too
    data = 'D:/core/darknetAB/yolov3.weights'  # and this can be changed by you
    # Default mode; I prefer to only get the result, not produce an image, for better performance
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data,
                metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)
    # Up to here you get data in the default alexeyAB format, as explained in the module.
    # Try help(scan); the result format is:
    # [(item_name, confidence_rate, (x_center_image, y_center_image, width_size_box, height_size_of_box))]
    # To convert it to the generally used form (PIL/OpenCV), do the following (still inside this detect function):
    newdata = []
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size / 2))
            y_start = round(y1 - (h_size / 2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
            newdata.append(data)
    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size / 2))
        y_start = round(y1 - (h_size / 2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
        newdata.append(data)
    else:
        newdata = False
    return newdata
How to use it:

table = 'D:/test/image/test1.jpg'
checking = detect(table)

To get the coordinates:

If there is only 1 result:

x1, y1, x2, y2 = checking[0][2]

If there are many results:

for x in checking:
    item = x[0]
    x1, y1, x2, y2 = x[2]
    print(item)
    print(x1, y1, x2, y2)
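To visualize the converted boxes with OpenCV (my own addition, assuming the corner-format tuples returned by the detect function above):

import cv2

img = cv2.imread(table)
if checking:  # detect() returns False when nothing was found
    for item, confidence, (x1, y1, x2, y2), w, h in checking:
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, str(item), (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite('result.jpg', img)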

-
The code is untested; there is a typo in weight_size and height_size, and you should use test[0] to extract item, confidence_rate, imagedata for a single detection. I have commented below with working code. Anyway, lots of thanks for your code that helped me kick-start. – Saugat Bhattarai Mar 11 '20 at 11:47
-
Yeah..., sorry for the typo... just trying to help and inspire... BTW, the typo is already fixed... it should work now... Note: the newest OpenCV (4.1.1 and above) already has the Darknet DNN model, so we can implement darknet straight in OpenCV. OpenCV is like an all-in-one machine now... – Wahyu Bram Apr 07 '20 at 09:39
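As a pointer for the OpenCV route mentioned above, here is a minimal sketch of loading a Darknet model through OpenCV's dnn module (the file names are placeholders for your own cfg/weights/image):

import cv2

net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
out_names = net.getUnconnectedOutLayersNames()

img = cv2.imread('test1.jpg')
# YOLO expects a normalized, resized blob
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_names)  # raw detections: normalized cx, cy, w, h + scores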
If you are going to implement this in Python, there is a small Python wrapper that I have created here. Follow the ReadMe file and install it. It is very easy to install.
After that, follow this example code to see how to detect objects.
If your detection is det:
top_left_x = det.bbox.x
top_left_y = det.bbox.y
width = det.bbox.w
height = det.bbox.h
If you need, you can get the midpoint by:
mid_x, mid_y = det.bbox.get_point(pyyolo.BBox.Location.MID)
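Since the wrapper exposes the top-left corner plus width and height, the bottom-right corner follows directly from the same attributes:

bottom_right_x = det.bbox.x + det.bbox.w
bottom_right_y = det.bbox.y + det.bbox.h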
Hope this helps..

Inspired by @Wahyu's answer above. There are a few changes, modifications, and bug fixes, and it has been tested with both single-object and multiple-object detection.
# calling the 'performDetect' function from darknet.py
from darknet import performDetect as scan
import math


def detect(img_path):
    '''use this if you only want to get the coordinates'''
    picpath = img_path
    # change this if you want to use a different config
    cfg = '/home/saggi/Documents/saggi/prabin/darknet/cfg/yolo-obj.cfg'
    coco = '/home/saggi/Documents/saggi/prabin/darknet/obj.data'  # you can change this too
    # and this can be changed by you
    data = '/home/saggi/Documents/saggi/prabin/darknet/backup/yolo-obj_last.weights'
    # default mode; I prefer to only get the result, not produce an image, for better performance
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data,
                metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)
    # up to here you get data in the default alexeyAB format, as explained in the module
    # try help(scan); the result format is:
    # [(item_name, confidence_rate, (x_center_image, y_center_image, width_size_box, height_size_of_box))]
    # to convert it to the generally used form (PIL/OpenCV), do the following (still inside this detect function):
    newdata = []

    # For multiple detections
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size / 2))
            y_start = round(y1 - (h_size / 2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate,
                    (x_start, y_start, x_end, y_end), (w_size, h_size))
            newdata.append(data)

    # For a single detection
    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size / 2))
        y_start = round(y1 - (h_size / 2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate,
                (x_start, y_start, x_end, y_end), (w_size, h_size))
        newdata.append(data)

    else:
        newdata = False

    return newdata
if __name__ == "__main__":
    # Multiple-detection image test
    # table = '/home/saggi/Documents/saggi/prabin/darknet/data/26.jpg'
    # Single-detection image test
    table = '/home/saggi/Documents/saggi/prabin/darknet/data/1.jpg'
    detections = detect(table)

    # detect() returns False when nothing is found, so guard before indexing
    if not detections:
        print('No objects detected')
    # Multiple detections
    elif len(detections) > 1:
        for detection in detections:
            print(' ')
            print('========================================================')
            print(' ')
            print('All parameters of detection: ', detection)
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected label: ', detection[0])
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object confidence: ', detection[1])
            x1, y1, x2, y2 = detection[2]
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
            print('x1: ', x1)
            print('y1: ', y1)
            print('x2: ', x2)
            print('y2: ', y2)
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object width and height: ', detection[3])
            b_width, b_height = detection[3]
            print('Width of bounding box: ', math.ceil(b_width))
            print('Height of bounding box: ', math.ceil(b_height))
            print(' ')
            print('========================================================')
    # Single detection
    else:
        print(' ')
        print('========================================================')
        print(' ')
        print('All parameters of detection: ', detections)
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected label: ', detections[0][0])
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object confidence: ', detections[0][1])
        x1, y1, x2, y2 = detections[0][2]
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
        print('x1: ', x1)
        print('y1: ', y1)
        print('x2: ', x2)
        print('y2: ', y2)
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object width and height: ', detections[0][3])
        b_width, b_height = detections[0][3]
        print('Width of bounding box: ', math.ceil(b_width))
        print('Height of bounding box: ', math.ceil(b_height))
        print(' ')
        print('========================================================')
# Single detections output:
# test value [('movie_name', 0.9223029017448425, (206.79859924316406, 245.4672393798828, 384.83673095703125, 72.8630142211914))]
# Multiple detections output:
# test value [('movie_name', 0.9225175976753235, (92.47076416015625, 224.9121551513672, 147.2491912841797, 42.063255310058594)),
# ('movie_name', 0.4900225102901459, (90.5261459350586, 12.4061279296875, 182.5990447998047, 21.261077880859375))]

-
@Pe Dro, read the section in my answer above; there is an explanation of how it works. It still uses the anchors, with the binding method. And to make it work, you need to do some configuration that I already explained in my answer... – Wahyu Bram Aug 30 '20 at 04:39
If the accepted answer does not work for you, this might be because you are using AlexeyAB's darknet model instead of pjreddie's darknet model.
You just need to go to the image_opencv.cpp file in the src folder and uncomment the following section:
...
//int b_x_center = (left + right) / 2;
//int b_y_center = (top + bot) / 2;
//int b_width = right - left;
//int b_height = bot - top;
//sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);
This will print the Bbox center coordinates as well as the width and height of the Bbox. After making the changes, make sure to rebuild darknet (run make again) before running YOLO.

-
Thanks a lot. This worked. But I want to print like: "Bounding box of – Virtuall.Kingg Nov 05 '21 at 05:37
-
`printf("Bounding box of %s : %d, %d\n", labelstr, b_x_center, b_y_center);` – Hassaan Awan Nov 06 '21 at 06:20
If you are using yolov4 in the darknet framework (by which I mean the version compiled directly from the GitHub repo https://github.com/AlexeyAB/darknet) to run object detection on static images, something like the following command can be run at the command line to get the bounding box as relative coordinates:
.\darknet.exe detector test .\cfg\coco.data .\cfg\yolov4.cfg .\yolov4.weights -ext_output .\data\people1.jpg -out result.json
Note the above is in Windows syntax, so you may have to change the backward slashes into forward slashes for it to work on a macOS or Linux operating system. Also, please make sure the paths are accurate before running. In the command, the input is the people1.jpg file in the data directory contained in the root. The output will be stored in a file named result.json. Feel free to modify this output name, but retain the .json extension.
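Once result.json exists, the detections can be read back programmatically. Below is a minimal Python sketch; the field names (objects, relative_coordinates, etc.) match what AlexeyAB's darknet typically writes, but verify them against your own output file:

import json

# result.json holds one entry per processed image/frame
with open('result.json') as f:
    frames = json.load(f)

for frame in frames:
    for obj in frame.get('objects', []):
        rc = obj['relative_coordinates']  # normalized center x/y, width, height
        print(obj['name'], obj['confidence'],
              rc['center_x'], rc['center_y'], rc['width'], rc['height'])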

-
Is it possible to save the real-time streaming result at a certain time interval? For example: every 10 seconds. – Virtuall.Kingg Nov 05 '21 at 08:48
-
I think that should be possible by modifying a script similar to this: https://github.com/IdoGalil/People-counting-system/blob/master/yolov3/yolo_detection_model.py – Kris Stern Nov 08 '21 at 02:32
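For what it's worth, a generic sketch of interval-based saving (detect_frame and save_result are hypothetical placeholders for your own detection and output code):

import time

SAVE_INTERVAL = 10.0  # seconds
last_save = 0.0

while True:  # your streaming/capture loop
    result = detect_frame()  # hypothetical: run YOLO on the current frame
    now = time.time()
    if now - last_save >= SAVE_INTERVAL:
        save_result(result)  # hypothetical: write the detections to disk
        last_save = now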