I need to get the bounding box coordinates generated by YOLO object detection in the image above.
-
YOLO also has a `--save-text` flag you can set to save the coordinate information for each bounding box to disk. – Ender May 15 '22 at 18:57
-
Relatedly, does anyone know how to get the confidence scores for each bounding box? – Ender May 15 '22 at 18:59
-
@Ender You can check the detect.py file and edit it. Look for the function that saves the prediction image, labels, xyxy, etc. The labels also contain the confidence score for each detection. – Johnny Dec 07 '22 at 00:49
6 Answers
A quick solution is to modify the image.c file to print out the bounding box information:
...
if(bot > im.h-1) bot = im.h-1;
// Print bounding box values
printf("Bounding Box: Left=%d, Top=%d, Right=%d, Bottom=%d\n", left, top, right, bot);
draw_box_width(im, left, top, right, bot, width, red, green, blue);
...
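After editing image.c, rebuild darknet with make and run detection as usual; the coordinates are then printed for every drawn box. For example, with the stock pjreddie setup:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

# Example output line (values depend on the image):
# Bounding Box: Left=..., Top=..., Right=..., Bottom=...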

-
Seriously, thank you so much for suggesting image.c. It helped me solve a totally different problem: when running YOLO in Python (via OpenCV-DNN), the detections are given in a float format. And literally every article I've ever seen has the WRONG MATH for turning the YOLO floats (center X/Y, and width/height) into pixel coordinates. But the official image.c has the math! Right here! https://github.com/pjreddie/darknet/blob/810d7f797bdb2f021dbe65d2524c2ff6b8ab5c8b/src/image.c#L283-L291 - I just had to port that to python. :-) – Mitch McMabers Sep 10 '19 at 19:04
-
@Brian O'Donnell How can I modify the "image.c" to only get four numbers for the coordinates of bounding boxes (without any additional description)? – Max Jun 13 '20 at 16:31
-
Do you just want the numbers? If so you would want: printf("%d,%d,%d,%d\n", left, top, right, bot); – Brian O'Donnell Jun 13 '20 at 19:10
-
@MitchMcMabers Do you know why there is a need to multiply by the width and height? – varungupta Jan 25 '22 at 18:27
-
@varungupta, the bounding box coordinates and dimensions are normalized by dividing by image width and height. – Ender May 15 '22 at 18:58
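To make that conversion concrete, here is a minimal Python sketch (my own illustration, not from the answer above) that mirrors the image.c math: scale the normalized center/size floats by the image dimensions, then convert to corner coordinates:

def yolo_to_pixels(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width/height) to pixel corners."""
    left = int((cx - w / 2) * img_w)
    right = int((cx + w / 2) * img_w)
    top = int((cy - h / 2) * img_h)
    bottom = int((cy + h / 2) * img_h)
    # Clamp to the image bounds, as image.c does
    return max(left, 0), max(top, 0), min(right, img_w - 1), min(bottom, img_h - 1)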
For Python users on Windows:

First, do a few setup jobs:

1. Set the Python path of your darknet folder in the environment variables:
PYTHONPATH = 'YOUR DARKNET FOLDER'
2. Add PYTHONPATH to the Path value by appending:
%PYTHONPATH%
3. Edit the file coco.data in the cfg folder by changing the names variable to your coco.names path, in my case:
names = D:/core/darknetAB/data/coco.names

With these settings, you can call darknet.py (from the alexeyAB/darknet repository) as a Python module from any folder.

Start scripting:
from darknet import performDetect as scan  # calling the 'performDetect' function from darknet.py

def detect(img_path):
    '''use this if you only want to get the coordinates'''
    picpath = img_path
    cfg = 'D:/core/darknetAB/cfg/yolov3.cfg'  # change this if you want to use a different config
    coco = 'D:/core/darknetAB/cfg/coco.data'  # you can change this too
    data = 'D:/core/darknetAB/yolov3.weights'  # and this can be changed by you
    # Default mode; I prefer to only get the result, not produce an image, for better performance
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data,
                metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)
    # Up to here you get data in the default alexeyAB format, as explained in the module.
    # Try help(scan); the result format is:
    # [(item_name, confidence_rate, (x_center_image, y_center_image, width_size_box, height_size_of_box))]
    # To convert it to the generally used form (PIL/OpenCV), do the following (still inside this detect function):
    newdata = []
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size / 2))
            y_start = round(y1 - (h_size / 2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
            newdata.append(data)
    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size / 2))
        y_start = round(y1 - (h_size / 2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate, (x_start, y_start, x_end, y_end), w_size, h_size)
        newdata.append(data)
    else:
        newdata = False
    return newdata
How to use it:

table = 'D:/test/image/test1.jpg'
checking = detect(table)

To get the coordinates:

If there is only 1 result:

x1, y1, x2, y2 = checking[0][2]

If there are many results:

for x in checking:
    item = x[0]
    x1, y1, x2, y2 = x[2]
    print(item)
    print(x1, y1, x2, y2)
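To visualize the converted boxes with OpenCV (my own addition, assuming the corner-format tuples returned by the detect function above):

import cv2

img = cv2.imread(table)
if checking:  # detect() returns False when nothing was found
    for item, confidence, (x1, y1, x2, y2), w, h in checking:
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, str(item), (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite('result.jpg', img)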

-
The code is untested; there is a typo in weight_size and height_size, and you should use test[0] to extract item, confidence_rate, imagedata for a single detection. I have commented below with working code. Anyway, lots of thanks for your code that helped me kick-start. – Saugat Bhattarai Mar 11 '20 at 11:47
-
Yeah..., sorry for the typo... just trying to help and inspire... BTW, the typo is already fixed... it should work now... Note: the newest OpenCV (4.1.1 and above) already has the Darknet DNN model, so we can implement darknet straight in OpenCV. OpenCV is like an all-in-one machine now... – Wahyu Bram Apr 07 '20 at 09:39
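As a pointer for the OpenCV route mentioned above, here is a minimal sketch of loading a Darknet model through OpenCV's dnn module (the file names are placeholders for your own cfg/weights/image):

import cv2

net = cv2.dnn.readNetFromDarknet('yolov3.cfg', 'yolov3.weights')
out_names = net.getUnconnectedOutLayersNames()

img = cv2.imread('test1.jpg')
# YOLO expects a normalized, resized blob
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_names)  # raw detections: normalized cx, cy, w, h + scores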
If you are going to implement this in Python, there is a small Python wrapper that I have created here. Follow the ReadMe file and install it. It is very easy to install.
After that, follow this example code to see how to detect objects.
If your detection is det:
top_left_x = det.bbox.x
top_left_y = det.bbox.y
width = det.bbox.w
height = det.bbox.h
If you need, you can get the midpoint by:
mid_x, mid_y = det.bbox.get_point(pyyolo.BBox.Location.MID)
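Since the wrapper exposes the top-left corner plus width and height, the bottom-right corner follows directly from the same attributes:

bottom_right_x = det.bbox.x + det.bbox.w
bottom_right_y = det.bbox.y + det.bbox.h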
Hope this helps..

Inspired by @Wahyu's answer above. There are a few changes, modifications, and bug fixes, and it has been tested with both single-object and multiple-object detection.
# calling the 'performDetect' function from darknet.py
from darknet import performDetect as scan
import math


def detect(img_path):
    '''use this if you only want to get the coordinates'''
    picpath = img_path
    # change this if you want to use a different config
    cfg = '/home/saggi/Documents/saggi/prabin/darknet/cfg/yolo-obj.cfg'
    coco = '/home/saggi/Documents/saggi/prabin/darknet/obj.data'  # you can change this too
    # and this can be changed by you
    data = '/home/saggi/Documents/saggi/prabin/darknet/backup/yolo-obj_last.weights'
    # default mode; I prefer to only get the result, not produce an image, for better performance
    test = scan(imagePath=picpath, thresh=0.25, configPath=cfg, weightPath=data,
                metaPath=coco, showImage=False, makeImageOnly=False, initOnly=False)
    # up to here you get data in the default alexeyAB format, as explained in the module
    # try help(scan); the result format is:
    # [(item_name, confidence_rate, (x_center_image, y_center_image, width_size_box, height_size_of_box))]
    # to convert it to the generally used form (PIL/OpenCV), do the following (still inside this detect function):
    newdata = []

    # For multiple detections
    if len(test) >= 2:
        for x in test:
            item, confidence_rate, imagedata = x
            x1, y1, w_size, h_size = imagedata
            x_start = round(x1 - (w_size / 2))
            y_start = round(y1 - (h_size / 2))
            x_end = round(x_start + w_size)
            y_end = round(y_start + h_size)
            data = (item, confidence_rate,
                    (x_start, y_start, x_end, y_end), (w_size, h_size))
            newdata.append(data)

    # For a single detection
    elif len(test) == 1:
        item, confidence_rate, imagedata = test[0]
        x1, y1, w_size, h_size = imagedata
        x_start = round(x1 - (w_size / 2))
        y_start = round(y1 - (h_size / 2))
        x_end = round(x_start + w_size)
        y_end = round(y_start + h_size)
        data = (item, confidence_rate,
                (x_start, y_start, x_end, y_end), (w_size, h_size))
        newdata.append(data)

    else:
        newdata = False

    return newdata
if __name__ == "__main__":
    # Multiple-detection image test
    # table = '/home/saggi/Documents/saggi/prabin/darknet/data/26.jpg'
    # Single-detection image test
    table = '/home/saggi/Documents/saggi/prabin/darknet/data/1.jpg'
    detections = detect(table)

    # detect() returns False when nothing is found, so guard before indexing
    if not detections:
        print('No objects detected')
    # Multiple detections
    elif len(detections) > 1:
        for detection in detections:
            print(' ')
            print('========================================================')
            print(' ')
            print('All parameters of detection: ', detection)
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected label: ', detection[0])
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object confidence: ', detection[1])
            x1, y1, x2, y2 = detection[2]
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
            print('x1: ', x1)
            print('y1: ', y1)
            print('x2: ', x2)
            print('y2: ', y2)
            print(' ')
            print('========================================================')
            print(' ')
            print('Detected object width and height: ', detection[3])
            b_width, b_height = detection[3]
            print('Width of bounding box: ', math.ceil(b_width))
            print('Height of bounding box: ', math.ceil(b_height))
            print(' ')
            print('========================================================')
    # Single detection
    else:
        print(' ')
        print('========================================================')
        print(' ')
        print('All parameters of detection: ', detections)
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected label: ', detections[0][0])
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object confidence: ', detections[0][1])
        x1, y1, x2, y2 = detections[0][2]
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object top-left and bottom-right coordinates (x1, y1, x2, y2):')
        print('x1: ', x1)
        print('y1: ', y1)
        print('x2: ', x2)
        print('y2: ', y2)
        print(' ')
        print('========================================================')
        print(' ')
        print('Detected object width and height: ', detections[0][3])
        b_width, b_height = detections[0][3]
        print('Width of bounding box: ', math.ceil(b_width))
        print('Height of bounding box: ', math.ceil(b_height))
        print(' ')
        print('========================================================')
# Single detections output:
# test value [('movie_name', 0.9223029017448425, (206.79859924316406, 245.4672393798828, 384.83673095703125, 72.8630142211914))]
# Multiple detections output:
# test value [('movie_name', 0.9225175976753235, (92.47076416015625, 224.9121551513672, 147.2491912841797, 42.063255310058594)),
# ('movie_name', 0.4900225102901459, (90.5261459350586, 12.4061279296875, 182.5990447998047, 21.261077880859375))]

-
@Pe Dro, read the section in my answer above; there is an explanation of how it works. It still uses the anchors, with the binding method. And to make it work, you need to do some configuration that I already explained in my answer... – Wahyu Bram Aug 30 '20 at 04:39
If the accepted answer does not work for you, this might be because you are using AlexeyAB's darknet model instead of pjreddie's darknet model.
You just need to go to the image_opencv.cpp file in the src folder and uncomment the following section:
...
//int b_x_center = (left + right) / 2;
//int b_y_center = (top + bot) / 2;
//int b_width = right - left;
//int b_height = bot - top;
//sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);
This will print the Bbox center coordinates as well as the width and height of the Bbox. After making the changes, make sure to rebuild darknet (run make again) before running YOLO.

-
Thanks a lot. This worked. But I want to print like: "Bounding box of – Virtuall.Kingg Nov 05 '21 at 05:37
-
`printf("Bounding box of %s : %d, %d\n", labelstr, b_x_center, b_y_center);` – Hassaan Awan Nov 06 '21 at 06:20
If you are using yolov4 in the darknet framework (by which I mean the version compiled directly from the GitHub repo https://github.com/AlexeyAB/darknet) to run object detection on static images, something like the following command can be run at the command line to get the bounding box as relative coordinates:
.\darknet.exe detector test .\cfg\coco.data .\cfg\yolov4.cfg .\yolov4.weights -ext_output .\data\people1.jpg -out result.json
Note the above is in Windows syntax, so you may have to change the backward slashes into forward slashes for it to work on a macOS or Linux operating system. Also, please make sure the paths are accurate before running. In the command, the input is the people1.jpg file in the data directory contained in the root. The output will be stored in a file named result.json. Feel free to modify this output name, but retain the .json extension.
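Once result.json exists, the detections can be read back programmatically. Below is a minimal Python sketch; the field names (objects, relative_coordinates, etc.) match what AlexeyAB's darknet typically writes, but verify them against your own output file:

import json

# result.json holds one entry per processed image/frame
with open('result.json') as f:
    frames = json.load(f)

for frame in frames:
    for obj in frame.get('objects', []):
        rc = obj['relative_coordinates']  # normalized center x/y, width, height
        print(obj['name'], obj['confidence'],
              rc['center_x'], rc['center_y'], rc['width'], rc['height'])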

-
Is it possible to save the real-time streaming result at a certain time interval? For example: every 10 seconds. – Virtuall.Kingg Nov 05 '21 at 08:48
-
I think that should be possible by modifying a script similar to this: https://github.com/IdoGalil/People-counting-system/blob/master/yolov3/yolo_detection_model.py – Kris Stern Nov 08 '21 at 02:32
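For what it's worth, a generic sketch of interval-based saving (detect_frame and save_result are hypothetical placeholders for your own detection and output code):

import time

SAVE_INTERVAL = 10.0  # seconds
last_save = 0.0

while True:  # your streaming/capture loop
    result = detect_frame()  # hypothetical: run YOLO on the current frame
    now = time.time()
    if now - last_save >= SAVE_INTERVAL:
        save_result(result)  # hypothetical: write the detections to disk
        last_save = now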