20

I'm training a YOLO model, and I have the bounding boxes in this format:

x1, y1, x2, y2 => e.g. (100, 100, 200, 200)

I need to convert them to YOLO format, something like:

X, Y, W, H => 0.436262 0.474010 0.383663 0.178218

I have already calculated the center point X, Y, the height H, and the width W, but I still need a way to convert them to floating-point numbers like the ones above.

Christoph Rackwitz
Ahmed Fayez

8 Answers

29

For those looking for the reverse of the question (YOLO format to the normal bbox format):

def yolobbox2bbox(x, y, w, h):
    # (x, y) is the box center; (w, h) is its width and height
    x1, y1 = x - w/2, y - h/2
    x2, y2 = x + w/2, y + h/2
    return x1, y1, x2, y2
FarisHijazi
  • don't you need the total size for this? – Anderas Oct 27 '21 at 09:55
  • no you don't, you're only converting different formats. Converting meters to inches doesn't need you to know the full size of the house, you just run the equation – FarisHijazi Oct 27 '21 at 15:25
  • Your equation and the fact that you put it here saved me 15 minutes yesterday, thanks a lot, and for that I also upvoted it. Even if I had to add the multiplication with the size, because converting back to pixel coordinates would very well need the size. 0.4 in a 500px image is x=200. 0.4 in a 1000 pixel image is x=400. If you're not converting back to a pixel based format, it would probably be good to mention that in the posting. – Anderas Oct 28 '21 at 06:57
  • actually there's no need for multiplying to convert to pixel coordinates, but you probably do need to round it. in the example: `yolobbox2bbox(5,5,2,2): output:(4.0, 4.0, 6.0, 6.0)`. which is exactly in pixel dimensions. Check your input to this function, if the largest value is 1, then that's why you needed to multiply, this function is generic and takes pixel coordinates and returns pixel coordinates, or takes scaled coordinates (0,1) and returns scaled coordinates. You could scale it before or after. you shouldn't need to multiply if the input is pixels. – FarisHijazi Oct 28 '21 at 15:50
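As the comments point out, the function above is scale-agnostic: pixel inputs give pixel outputs, and normalized inputs give normalized outputs. Since YOLO label files store normalized values, getting integer pixel corners back does require the image size. A minimal sketch, assuming the inputs are normalized to [0, 1] and that rounding to the nearest pixel is acceptable; `yolo_to_pixel_xyxy` is a hypothetical helper name:

```python
def yolo_to_pixel_xyxy(x, y, w, h, img_w, img_h):
    """Convert a normalized YOLO box (center x, center y, width, height)
    to pixel corner coordinates (x1, y1, x2, y2).

    Hypothetical helper: assumes x, y, w, h are in [0, 1] relative to
    an image of img_w x img_h pixels.
    """
    # Corners in normalized coordinates (same math as yolobbox2bbox)
    x1, y1 = x - w / 2, y - h / 2
    x2, y2 = x + w / 2, y + h / 2
    # Scale to pixels and round to the nearest integer
    return (round(x1 * img_w), round(y1 * img_h),
            round(x2 * img_w), round(y2 * img_h))


print(yolo_to_pixel_xyxy(0.15, 0.15, 0.1, 0.1, 1000, 1000))  # (100, 100, 200, 200)
```

The same normalized box maps to different pixel corners for a 500 px and a 1000 px image, which is exactly the case raised in the comments.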
21

Here's a code snippet in Python to convert x, y coordinates to YOLO format:

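from PIL import Image

def convert(size, box):
    # size = (image_width, image_height)
    # box = (xmin, xmax, ymin, ymax) in pixels; note the order
    dw = 1./size[0]
    dh = 1./size[1]
    x = (box[0] + box[1])/2.0  # center x
    y = (box[2] + box[3])/2.0  # center y
    w = box[1] - box[0]        # box width
    h = box[3] - box[2]        # box height
    x = x*dw
    w = w*dw
    y = y*dh
    h = h*dh
    return (x,y,w,h)

im = Image.open(img_path)  # img_path: path to your image
w = int(im.size[0])
h = int(im.size[1])

xmin, xmax, ymin, ymax = 100, 200, 100, 200  # define your x, y coordinates
b = (xmin, xmax, ymin, ymax)
bb = convert((w, h), b)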
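Note that `convert` expects the box as `(xmin, xmax, ymin, ymax)`, not in the question's `(x1, y1, x2, y2)` order. A small adapter makes it harder to mix the two up. This is a sketch; `convert_xyxy` is a hypothetical name, not part of the original answer, and `convert` is repeated so the block runs on its own:

```python
def convert(size, box):
    # size = (image_width, image_height); box = (xmin, xmax, ymin, ymax)
    dw = 1.0 / size[0]
    dh = 1.0 / size[1]
    x = (box[0] + box[1]) / 2.0 * dw  # normalized center x
    y = (box[2] + box[3]) / 2.0 * dh  # normalized center y
    w = (box[1] - box[0]) * dw        # normalized width
    h = (box[3] - box[2]) * dh        # normalized height
    return (x, y, w, h)


def convert_xyxy(size, x1, y1, x2, y2):
    """Hypothetical adapter: accept corners in (x1, y1, x2, y2) order."""
    return convert(size, (x1, x2, y1, y2))


print(convert_xyxy((1000, 1000), 100, 100, 200, 200))
```

For the question's example box in a 1000x1000 image this gives approximately (0.15, 0.15, 0.1, 0.1).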

Check out my sample program to convert from the LabelMe annotation tool format to YOLO format: https://github.com/ivder/LabelMeYoloConverter

Matt Popovich
gameon67
  • Doesn't this convert it to the center-normalised coordinates? Is this the same as the YOLO bounding box encoding which is relative to the grid cell?? – Lxrd-AJ Jan 13 '20 at 21:21
  • @Lxrd-AJ it's relative to the grid cell when you perform detection. This format is for the training data – gameon67 Jan 14 '20 at 00:38
  • I think this is wrong. `convert` returns the coordinates in center-normalised coordinates. This is relative to the entire image and not the grid cells. To make it relative to the grid cells, you need to multiply (7 * center_x) - floor(7 * center_x), assuming a grid size of 7 – Lxrd-AJ Jan 15 '20 at 10:15
  • @Lxrd-AJ I already told you that you don't have to make the coordinate relative to the grid cells when you prepare annotation on your dataset. Could you give me a link or source that tell you that you have to calculate the coord related to grid cell when ANNOTATING training data, not when training or during inference? – gameon67 Jan 15 '20 at 11:33
  • What is size in the convert function? – Akki Jan 21 '21 at 08:26
  • @Akki it is `(w,h)` – gameon67 Jan 21 '21 at 08:30
  • warning for others, the question asks `(x1, y1, x2, y2)` while the answer provided is in `(xmin, xmax, ymin, ymax)`, so please adapt accordingly – Joseph Adam Mar 09 '21 at 21:07
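To illustrate the distinction debated in the comments: label files store box centers normalized to the whole image, and a model such as the original YOLO derives grid-cell-relative targets from those values during training. A minimal sketch of the computation described above, `(S * center_x) - floor(S * center_x)`, assuming an S x S grid with S = 7; `to_grid_cell` is a hypothetical name:

```python
import math


def to_grid_cell(cx, cy, S=7):
    """Map a normalized box center (cx, cy) in [0, 1) to the grid cell
    containing it and the center's offset inside that cell.

    Hypothetical helper: assumes an S x S grid as in the original YOLO.
    Returns (col, row, offset_x, offset_y), with offsets in [0, 1).
    """
    gx, gy = S * cx, S * cy
    col, row = math.floor(gx), math.floor(gy)
    # Offset within the cell: (S * cx) - floor(S * cx)
    return col, row, gx - col, gy - row


print(to_grid_cell(0.5, 0.25))  # (3, 1, 0.5, 0.75)
```

The label file itself still holds the whole-image values (0.5, 0.25 here); the cell index and offset are derived from them at training time.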
8

There is a more straightforward way to do this with pybboxes. Install it with:

pip install pybboxes

and use it as below:

import pybboxes as pbx

voc_bbox = (100, 100, 200, 200)
W, H = 1000, 1000  # WxH of the image
pbx.convert_bbox(voc_bbox, from_type="voc", to_type="yolo", image_size=(W,H))
>>> (0.15, 0.15, 0.1, 0.1)

Note that converting to YOLO format requires the image width and height for scaling.

null
3

YOLO normalises the image space to run from 0 to 1 in both the x and y directions. To convert between your (x, y) coordinates and YOLO (u, v) coordinates, you need to transform your data as u = x / XMAX and v = y / YMAX, where XMAX and YMAX are the width and height of the image array you are using.

This all depends on the image arrays being oriented the same way.

Here is a C function to perform the conversion:

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <math.h>

struct yolo {
    float   u;
    float   v;
    };

struct yolo
convert (unsigned int x, unsigned int y, unsigned int XMAX, unsigned int YMAX)
{
    struct yolo point;

    if (XMAX && YMAX && (x <= XMAX) && (y <= YMAX))
    {
        point.u = (float)x / (float)XMAX;
        point.v = (float)y / (float)YMAX;
    }
    else
    {
        point.u = INFINITY;
        point.v = INFINITY;
        errno = ERANGE;
    }

    return point;
}/* convert */


int main()
{
    struct yolo P;

    P = convert (99, 201, 255, 324);

    printf ("Yolo coordinate = <%f, %f>\n", P.u, P.v);

    exit (EXIT_SUCCESS);
}/* main */
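The C function above normalizes a single point. Because the normalization is linear, you can normalize both corners of a box that way and then take their midpoint and differences to get YOLO's center/size values. A sketch in Python, assuming XMAX and YMAX are the image width and height; `normalize_point` and `corners_to_yolo` are hypothetical names, and the out-of-range case raises `ValueError` instead of returning INFINITY:

```python
def normalize_point(x, y, xmax, ymax):
    """Scale a pixel point into [0, 1] (Python analog of the C convert())."""
    if xmax <= 0 or ymax <= 0 or not (0 <= x <= xmax and 0 <= y <= ymax):
        raise ValueError("point outside the image")
    return x / xmax, y / ymax


def corners_to_yolo(x1, y1, x2, y2, xmax, ymax):
    """Normalize both corners, then convert to (center_x, center_y, w, h)."""
    u1, v1 = normalize_point(x1, y1, xmax, ymax)
    u2, v2 = normalize_point(x2, y2, xmax, ymax)
    return (u1 + u2) / 2, (v1 + v2) / 2, u2 - u1, v2 - v1


print(corners_to_yolo(100, 100, 200, 200, 1000, 1000))
```

Up to floating-point rounding, this matches the results of the divide-after-converting approaches in the other answers.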
Matt Popovich
Jon Guiton
1

There are two possible cases. First, you have to determine whether your original bounding box is in Coco or Pascal_VOC format; otherwise you can't do the right math.

Here are the formats:

Coco Format: [x_min, y_min, width, height]
Pascal_VOC Format: [x_min, y_min, x_max, y_max]

Here is some Python code for the conversion:

Converting Coco to Yolo

# Convert Coco bb to Yolo
def coco_to_yolo(x1, y1, w, h, image_w, image_h):
    return [((2*x1 + w)/(2*image_w)) , ((2*y1 + h)/(2*image_h)), w/image_w, h/image_h]

Converting Pascal_voc to Yolo

# Convert Pascal_Voc bb to Yolo
def pascal_voc_to_yolo(x1, y1, x2, y2, image_w, image_h):
    return [((x2 + x1)/(2*image_w)), ((y2 + y1)/(2*image_h)), (x2 - x1)/image_w, (y2 - y1)/image_h]
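Going the other way only requires inverting these formulas. A sketch, assuming the same conventions as above (normalized YOLO values plus the image size in pixels); the names `yolo_to_coco` and `yolo_to_pascal_voc` are hypothetical and not taken from the linked article:

```python
def yolo_to_coco(xc, yc, w, h, image_w, image_h):
    # Un-normalize the size, then shift the center back to the top-left corner
    bw, bh = w * image_w, h * image_h
    return [xc * image_w - bw / 2, yc * image_h - bh / 2, bw, bh]


def yolo_to_pascal_voc(xc, yc, w, h, image_w, image_h):
    # Corners are the center minus/plus half the size, scaled to pixels
    x1 = (xc - w / 2) * image_w
    y1 = (yc - h / 2) * image_h
    x2 = (xc + w / 2) * image_w
    y2 = (yc + h / 2) * image_h
    return [x1, y1, x2, y2]
```

For example, `yolo_to_pascal_voc(0.5, 0.5, 0.5, 0.25, 640, 480)` returns `[160.0, 180.0, 480.0, 300.0]`, and feeding those corners back into `pascal_voc_to_yolo` recovers the original values.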

If you need additional conversions, you can check my article on Medium: https://christianbernecker.medium.com/convert-bounding-boxes-from-coco-to-pascal-voc-to-yolo-and-back-660dc6178742

0

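For YOLO format to (x1, y1, x2, y2) format:

def yolobbox2bbox(x, y, w, h, dw, dh):
    # x, y, w, h: normalized YOLO box (center x, center y, width, height)
    # dw, dh: image width and height in pixels
    x1 = int((x - w / 2) * dw)
    x2 = int((x + w / 2) * dw)
    y1 = int((y - h / 2) * dh)
    y2 = int((y + h / 2) * dh)

    # Clamp the corners to the image boundaries
    if x1 < 0:
        x1 = 0
    if x2 > dw - 1:
        x2 = dw - 1
    if y1 < 0:
        y1 = 0
    if y2 > dh - 1:
        y2 = dh - 1

    return x1, y1, x2, y2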

Matt Popovich
payal
0

There are two things you need to do:

  1. Divide the coordinates by the image size to normalize them to [0..1] range.
  2. Convert (x1, y1, x2, y2) coordinates to (center_x, center_y, width, height).

If you're using PyTorch, Torchvision provides a function that you can use for the conversion:

from torch import tensor
from torchvision.ops import box_convert

image_size = tensor([608, 608])  # (width, height) of the image
boxes = tensor([[100, 100, 200, 200], [300, 300, 400, 400]], dtype=float)
boxes[:, :2] /= image_size  # normalize (x1, y1)
boxes[:, 2:] /= image_size  # normalize (x2, y2)
boxes = box_convert(boxes, "xyxy", "cxcywh")
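The question's example values (`0.436262 0.474010 ...`) are printed with six decimals, and YOLO label files commonly store one object per line as `class x_center y_center width height`, with an integer class index first. A sketch of formatting such a line from pixel corners, assuming the class index is known; `yolo_label_line` is a hypothetical name, and six decimals simply mirrors the question's example:

```python
def yolo_label_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Format one YOLO label line from pixel corners (x1, y1, x2, y2)."""
    xc = (x1 + x2) / 2 / img_w  # normalized center x
    yc = (y1 + y2) / 2 / img_h  # normalized center y
    w = (x2 - x1) / img_w       # normalized width
    h = (y2 - y1) / img_h       # normalized height
    # Six decimal places, space-separated, as in the question's example
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


print(yolo_label_line(0, 100, 100, 200, 200, 608, 608))
# 0 0.246711 0.246711 0.164474 0.164474
```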
Seppo Enarvi
-1

I was also looking for this, and I found the following explanation more informative about what happens under the hood. From here: Source

Assuming xmin/ymin and xmax/ymax are your bounding corners, top-left and bottom-right respectively, YOLO wants the box center and size:

x = (xmin + xmax) / 2
y = (ymin + ymax) / 2
w = xmax - xmin
h = ymax - ymin

You then need to normalize these, i.e. express them as a proportion of the whole image, by dividing each value by the image dimension along the same axis (width for x and w, height for y and h):

x = (xmin + xmax) / 2 / width
y = (ymin + ymax) / 2 / height
w = (xmax - xmin) / width
h = (ymax - ymin) / height

This assumes a top-left origin; you will have to apply a shift if this is not the case.

So the answer
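The shift mentioned above matters if your annotations use a bottom-left origin with y pointing up (a math-style coordinate system), whereas YOLO label coordinates are measured from the image's top-left corner with y pointing down. A minimal sketch, assuming pixel boxes given as bottom-left corner (x1, y1) and top-right corner (x2, y2) in that bottom-left system; `bottom_left_to_yolo` is a hypothetical name:

```python
def bottom_left_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel box in a bottom-left-origin system (y up) into
    normalized YOLO (x_center, y_center, w, h) with a top-left origin.

    Hypothetical helper: (x1, y1) is the box's bottom-left corner and
    (x2, y2) its top-right corner, measured with y pointing up.
    """
    # Mirror y: the box's top edge (y2) becomes the smaller
    # top-left-origin coordinate, its bottom edge (y1) the larger one
    top, bottom = img_h - y2, img_h - y1
    xc = (x1 + x2) / 2 / img_w
    yc = (top + bottom) / 2 / img_h
    return xc, yc, (x2 - x1) / img_w, (bottom - top) / img_h


print(bottom_left_to_yolo(0, 0, 64, 48, 640, 480))
```

A box in the bottom-left corner of a 640x480 image ends up with a normalized center y near 0.95, i.e. near the bottom of the image in YOLO's top-left convention.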

Engr Ali