28

I am trying to train a custom object classifier in Darknet YOLO v2: https://pjreddie.com/darknet/yolo/

I gathered a dataset of images; most of them are 6000 x 4000 px, and some have lower resolutions as well.

Do I need to resize the images to be square before training?

I found that the config uses:

[net]
batch=64
subdivisions=8
height=416
width=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

That's why I was wondering how to use it with datasets of different image sizes.

Farahats9

5 Answers

34

You don't have to resize them, because Darknet will do it for you!

It means you really don't need to do that, and you can use different image sizes during your training. What you posted above is just the network configuration; there should be a full network definition as well. The height and the width tell you the network's resolution. It also keeps the aspect ratio; check e.g. this.
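
To illustrate what this preprocessing amounts to, here is a minimal letterbox-resize sketch in Python with Pillow. This is not Darknet's actual code: the 416 target comes from the config above, while the fill color and centering are assumptions, and different repos differ in exactly these details.

```python
from PIL import Image

def letterbox(img, size=416):
    """Resize img to fit inside a size x size square while keeping the
    aspect ratio, then pad the remainder (letterboxing)."""
    w, h = img.size
    scale = min(size / w, size / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), (128, 128, 128))  # padding bars
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas

# A 6000 x 4000 photo becomes roughly 416 x 277, centered on a 416 x 416 canvas.
```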

Nerxis
  • Little update/comment: I found there are other Darknet-based repos (like [this](https://github.com/AlexeyAB/darknet)) that don't keep the aspect ratio. Check this: https://github.com/AlexeyAB/darknet/issues/232#issuecomment-336955485 – Nerxis Aug 06 '18 at 14:36
  • What if the image is smaller than the width and height in your config file? Say, if w x h is 416x416 but your image is 200x200, will it upsize the image or will it pad it with black? – blueether Jun 08 '21 at 04:26
  • It should be padded, but to be honest, it's been a few years since I used Darknet, so there might be updates I don't know about. And if you read the link in my previous comment, the behavior regarding resizing might differ from repo to repo (as there are many darknet/yolo repositories). – Nerxis Jun 08 '21 at 07:00
  • @Nerxis, hello! Can you please tell me whether my YOLOv3 model will train well if the images in my dataset have different resolutions and I have .txt files with the object labels for those images (for object detection)? – hyper-cookie Dec 10 '21 at 11:17
  • @hyper-cookie I cannot tell whether a Yolo model will work in your case or not. What I can tell is that Yolo works even when your dataset contains images of different sizes, so I would definitely encourage you to give it a try! (To be honest, I have experience with both outcomes on such datasets: it worked well in one case and worse in another. It depends on the problem definition, the quality of the labels and images, whether the training data comes from the same distribution as the real samples, and so on.) – Nerxis Dec 10 '21 at 12:57
  • @Nerxis, thank you! – hyper-cookie Dec 23 '21 at 05:40
  • @Nerxis, hello! Can you help me, please? Does Yolo change the labels automatically too? Do I need to do anything with them? I made them for photos with different resolutions; how are they going to work? – hyper-cookie Feb 06 '22 at 11:11
  • @hyper-cookie You mean the box labels, right? In that case it's also done automatically: Darknet will resize both your image and the box labels, so you do not have to worry about it (see the sketch after these comments). – Nerxis Feb 07 '22 at 08:09
  • @Nerxis, oh nice) Thank you! – hyper-cookie Feb 07 '22 at 10:23
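
For context on the label question above: YOLO's .txt labels store normalized coordinates (class, x_center, y_center, width, height, all relative to the image size), so a plain rescale of the image leaves them valid. Below is a small sketch with hypothetical numbers; note that letterbox padding would additionally shift the centers, which Darknet handles internally.

```python
def to_yolo(box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to YOLO's normalized
    (x_center, y_center, width, height), all relative to the image size."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2 / img_w,
            (y_min + y_max) / 2 / img_h,
            (x_max - x_min) / img_w,
            (y_max - y_min) / img_h)

# The same object annotated at 6000x4000 and at a 3000x2000 downscale:
print(to_yolo((1200, 800, 2400, 1600), 6000, 4000))  # (0.3, 0.3, 0.2, 0.2)
print(to_yolo((600, 400, 1200, 800), 3000, 2000))    # (0.3, 0.3, 0.2, 0.2)
```
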
12

You don't need to resize your database images. PJReddie's YOLO architecture does it by itself, keeping the aspect ratio intact (no information is lost), according to the resolution in the .cfg file. For example, if your image is 1248 x 936, YOLO will resize it to 416 x 312 and then pad the extra space with black bars to fit the 416 x 416 network.
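
A quick check of the arithmetic for that example (a sketch: the exact rounding and how the padding is split are implementation details):

```python
net = 416
w, h = 1248, 936
scale = min(net / w, net / h)                      # 416/1248 = 1/3
new_w, new_h = round(w * scale), round(h * scale)  # 416, 312
pad = net - new_h                                  # 104 px of black bars
print(new_w, new_h, pad)                           # 416 312 104
```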

Nouman Ahsan
  • Does it always keep the ratio, or only when letterbox resizing is enabled? – s.paszko Feb 19 '22 at 15:30
  • @s.paszko I don't think the network itself has the capability to change the size. This is a pre-processing step, so yes, the letterbox method would do it before feeding into the network. – Nouman Ahsan Feb 23 '22 at 06:02
10

It is very common to resize images before training. 416 x 416 is slightly larger than common; most ImageNet models resize and square the images to 256 x 256, for example, so I would expect the same here. Trying to train on 6000 x 4000 would require a farm of GPUs. The standard process is to square the image to the largest dimension (height or width), padding the shorter side with 0's, and then resize using standard image-resizing tools like PIL, as sketched below.
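
A minimal Pillow sketch of that standard process (pad to square with zeros on the shorter side, then resize). The 416 target is taken from the question's config; the function name and other details are illustrative assumptions.

```python
from PIL import Image

def square_then_resize(path, size=416):
    """Pad the shorter side with 0's (black) to make the image square,
    then resize to the network resolution."""
    img = Image.open(path).convert("RGB")
    side = max(img.size)
    canvas = Image.new("RGB", (side, side))  # defaults to black (all zeros)
    canvas.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return canvas.resize((size, size), Image.BILINEAR)

# e.g. square_then_resize("photo.jpg") -> a 416 x 416 image with black bars
```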

David Parks
  • How about doing a prediction on a 6000x4000 image after training the model? Will that work? – Profstyle Apr 03 '20 at 07:29
  • In general, no, not unless the model was trained for that resolution. The patterns the model needs to look for at each layer of the network are different in a higher-resolution model than in a lower-resolution one. – David Parks Apr 03 '20 at 16:06
  • So to detect objects in a single image as big as 6000x4000, do we need to set our config file that big? I'm a bit confused. I do not want to do tiling, but I would also like high accuracy. – Profstyle Apr 05 '20 at 11:54
  • It's rare to work with images of this size; you probably can't fit that size on most GPUs and would need to break it into multiple pieces. The process isn't trivial and is probably beyond the scope of a pre-trained model. You shouldn't lose much accuracy when resizing the image; you would only lose accuracy if you are working with very tiny features and bounding boxes, and then you would probably need to break up the image and process it in segments. If your boxes are a reasonable percentage of the image canvas size, then resizing is the right approach. – David Parks Apr 05 '20 at 19:15
2

You do not need to resize the images; you can directly change the values in the darknet.cfg file.

  1. When you open the darknet.cfg (yolo-darknet.cfg) file, you can see all the hyper-parameters and their values.
  2. As shown in your cfg file, the image dimensions are (416, 416) -> (width, height); you can change these values, and Darknet will automatically resize the images before training.
  3. Since your images have high dimensions, you can adjust the batch and subdivisions values (powers of 2 such as 32, 16, or 8, as illustrated below) so that Darknet will not crash with a memory-allocation error.
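
For example, keeping the question's [net] block but raising subdivisions (illustrative values: Darknet processes batch/subdivisions images per step, so a higher subdivisions value lowers GPU memory use):

```
[net]
batch=64
subdivisions=16
width=416
height=416
channels=3
```
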
Ed DeGagne
1

By default the Darknet API resizes the images in both inference and training, but in theory any input size w, h = 32 * X, where X is a natural number, should work (w is the width, h the height). By default X = 13, so the input size is (w, h) = (416, 416). I use this rule with YOLOv3 in OpenCV, and it works better the bigger X is.
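
A tiny sketch of that rule (the X values are arbitrary picks; sizes are in pixels):

```python
# Square network resolutions allowed by the w = h = 32 * X rule
for x in (10, 13, 19, 26):
    print(f"X={x}: {32 * x} x {32 * x}")
# X=13 -> 416 x 416 (the default); X=19 -> 608 x 608
```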

kascesar