
Recently, I trained YOLOv3 using transfer learning.

I used the following command to train my YOLOv3 weights:

./darknet detector train cfg/coco.data cfg/yolov3.cfg darknet53.conv.74  -gpus 0,1,2,3 -map | tee -a yolov3-official-transfer-learning.log

After submitting the weights saved after 500,200 batches to CodaLab to test performance on the COCO dataset,

I got the following result:

AP: 0.321
AP_50: 0.541
AP_75: 0.339
AP_small: 0.143
AP_medium: 0.332
AP_large: 0.450
AR_max_1: 0.284
AR_max_10: 0.434
AR_max_100: 0.454
AR_small: 0.257
AR_medium: 0.473
AR_large: 0.617

Compared to the official weights on CodaLab:

AP: 0.315
AP_50: 0.560
AP_75: 0.324
AP_small: 0.153
AP_medium: 0.334
AP_large: 0.430
AR_max_1: 0.278
AR_max_10: 0.433
AR_max_100: 0.456
AR_small: 0.267
AR_medium: 0.484
AR_large: 0.610

We can clearly see that AP_50 of the official weights is 1.9 percentage points higher than that of my self-trained version.

By the way,

[1] I used AlexeyAB/darknet, not pjreddie/darknet, to train YOLOv3.

[2] I used COCO 2014 as my training dataset.

Does anyone know how to explain this situation? Or is it possible to reproduce the official result?

Yanwei Liu

2 Answers


mAP is more persuasive for evaluation; it is the "AP" metric in your question.

In fact, your mAP (0.321) is slightly higher than that of the official version (0.315).

AP_50 is just the AP when the IoU threshold is set to 0.5, while mAP averages the AP over IoU thresholds from 0.5 to 0.95 in steps of 0.05 (10 thresholds in total).

Your AP_50 is lower, but you may notice that your AP_75 is higher. That explains why your mAP is slightly higher than the official version's.

You can refer to the COCO official site for more info on the evaluation metrics.
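To make the averaging concrete, here is a minimal sketch of how COCO's headline "AP" is computed from the per-threshold APs. Only the 0.50 and 0.75 entries below are taken from the question; the other eight values are invented purely for illustration:

```python
# Sketch of how COCO mAP averages AP over IoU thresholds.
# COCO evaluates at IoU thresholds 0.50, 0.55, ..., 0.95 (10 values).
iou_thresholds = [0.50 + 0.05 * i for i in range(10)]

# AP at each threshold. The 0.50 and 0.75 entries (0.541, 0.339) are
# from the question; the rest are hypothetical, chosen so that AP
# drops as the IoU requirement tightens.
ap_per_threshold = [0.541, 0.52, 0.49, 0.45, 0.41, 0.339, 0.27, 0.19, 0.10, 0.03]

# mAP (the "AP" line in the CodaLab output) is the plain mean of these.
map_value = sum(ap_per_threshold) / len(ap_per_threshold)
print(f"mAP = {map_value:.3f}")
```

So a lower AP_50 can coexist with a higher overall mAP, as long as the APs at stricter thresholds (like your AP_75) make up the difference.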

leedewdew

Did you use the default cfg? If so, you probably trained at a lower resolution and/or with smaller mini-batches than the authors, which would mean more stochastic training -> lower AP.

There's also a level of randomness when training DNNs; I've seen networks train to slightly different APs with identical configurations. The authors of YOLOv3 likely ran many training trials and chose the very best result for publication, so on average an exact imitation of their training would probably produce slightly worse results.

ezekiel
  • Yes, I used the default cfg but with an input size of 416x416, batch = 64, and subdivisions = 16. Maybe the randomness while training is the reason why it's slightly different from the official result. – Yanwei Liu Jul 13 '20 at 00:33
  • So you used a minibatch size of 64/16 = 4 which is quite small and surely smaller than the authors. It is a little surprising that in spite of that, you actually got a slightly higher AP75. – ezekiel Jul 13 '20 at 09:50
  • But when I take a look at [yolov3.cfg](https://github.com/AlexeyAB/darknet/blob/6d44529cf93211c319813c90e0c1adb34426abe5/cfg/yolov3.cfg), the author used batch=64 and subdivisions=16. Is something wrong with my understanding? – Yanwei Liu Jul 13 '20 at 12:54
  • batch=64, subdivisions=16 means a mini-batch size of 4. [the paper](https://pjreddie.com/media/files/papers/YOLOv3.pdf) is pretty sparse on specifics, but I'd be surprised if that was the size they used for their final training, minibatch sizes of 256 or greater are not uncommon for new SOTA algorithms. [this answer](https://stackoverflow.com/questions/58355388/batch-and-subdivisions-in-yolov3) has a bit more detail on minibatch size. – ezekiel Jul 13 '20 at 18:56
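The batch/subdivisions relationship discussed in the comments above can be sketched like this (cfg values are the ones from the discussion; the interpretation follows the linked answer on batch and subdivisions):

```python
# In darknet cfg files, `batch` images are consumed per training iteration,
# split into `subdivisions` chunks so each chunk fits in GPU memory.
batch = 64
subdivisions = 16

# The mini-batch actually pushed through the network in one forward pass:
mini_batch = batch // subdivisions
print(f"mini-batch size = {mini_batch}")  # 64 / 16 = 4
```

Gradients accumulate across the subdivisions, so the weight update still reflects all 64 images, but the per-pass mini-batch of 4 is what affects things like batch-norm statistics.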