YOLO v3 complete architecture

Question

I am attempting to implement YOLO v3 in TensorFlow-Keras from scratch, with the aim of training my own model on a custom dataset, i.e. without using pretrained weights. I have gone through all three papers (YOLOv1, YOLOv2 (YOLO9000) and YOLOv3), and I find that although Darknet53 is used as the feature extractor for YOLOv3, I cannot pin down the complete architecture that follows it: the "detection" layers talked about here. After a lot of reading of blog posts on Medium, KDnuggets and similar sites, I ended up with a few significant questions:

1. Have I missed the complete architecture of the detection layers (the layers that follow the Darknet53 feature extractor) somewhere in the YOLOv3 paper?
2. The author seems to use different image sizes at different stages of training. Does the network do this upscaling/downscaling of images automatically?
3. For preprocessing the images, is it really enough to just resize them and then normalize them (divide by 255)?
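
On question 1, the YOLOv3 paper does at least pin down the shape of the detection outputs: boxes are predicted at three scales, and for COCO each scale outputs an N×N×[3*(4+1+80)] tensor (4 box offsets, 1 objectness score and 80 class scores for each of 3 anchors). A minimal sketch that computes these shapes, assuming strides of 32, 16 and 8 (so a 416 input gives the 13, 26 and 52 grids used by common implementations). The function name is hypothetical, and it does not build the intermediate convolution and upsampling layers, which darknet spells out layer by layer in its yolov3.cfg file.

```python
def yolo_v3_head_shapes(input_size=416, num_classes=80, anchors_per_scale=3):
    """Return (grid_h, grid_w, channels) for each of the three YOLOv3 detection scales.

    Hypothetical helper: it only reproduces the output tensor shapes, not the layers.
    """
    # Darknet53 downsamples by 32 overall, so the input must be a multiple of 32.
    if input_size % 32 != 0:
        raise ValueError("input_size must be a multiple of 32")
    # Assumed strides of the three prediction scales: coarse, medium, fine.
    strides = (32, 16, 8)
    # Per anchor: 4 box offsets + 1 objectness score + num_classes class scores.
    channels = anchors_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


print(yolo_v3_head_shapes(416, 80))
# prints [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
```

For a custom dataset only `num_classes` changes, e.g. a single class gives 3 * (4 + 1 + 1) = 18 output channels per scale.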

Please be kind enough to point me in the right direction. I appreciate the help!

This is pretty broad and might get closed. You might have better luck on https://datascience.stackexchange.com/ — Stedy, Mar 28 '19 at 03:03
@Stedy Okay, I'll keep that in mind. I've already posted it on https://ai.stackexchange.com . If you have any advice regarding the question as well, please do share. Thanks! — hridayns, Mar 28 '19 at 03:06
That seems pretty reasonable. In the meantime, it might be better to break this up into three separate questions. — Stedy, Mar 28 '19 at 03:42
Yeah. I was thinking maybe each question was big enough to be answered by itself. Good point. — hridayns, Mar 28 '19 at 04:34
For number 1, I agree: there's no detailed and complete explanation of the architecture. For number 2, yes, the network automatically resizes the input every 10 iterations; this is enabled by the `random = 1` parameter in the cfg file. For number 3, what do you mean? You only need to provide the image and the corresponding bounding boxes. — gameon67, Mar 28 '19 at 05:56
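
The multi-scale behaviour described in the comment above follows YOLOv2's training scheme: every 10 batches a new input size is drawn from the multiples of 32 between 320 and 608. A minimal sketch of that schedule in plain Python; the names `SIZES` and `multiscale_schedule` are hypothetical. In a TF-Keras implementation you would then resize each batch (and its box coordinates) to the drawn size yourself, which works if the model is fully convolutional and declared with an input shape of `(None, None, 3)`.

```python
import random

# Multiples of 32 from 320 to 608, as in YOLOv2's multi-scale training.
SIZES = list(range(320, 608 + 1, 32))


def multiscale_schedule(num_batches, every=10, seed=0):
    """Yield the square input size to use for each batch.

    A new size is drawn at batch 0 and then every `every` batches,
    so consecutive blocks of `every` batches share one size.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    size = None
    for batch in range(num_batches):
        if batch % every == 0:
            size = rng.choice(SIZES)
        yield size


# One size per block of 10 batches (actual values depend on the seed).
print(list(multiscale_schedule(30))[::10])
```

Drawing only multiples of 32 keeps every size compatible with the network's total downsampling factor of 32, so each detection grid still has integer dimensions.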
@gameon67 Thank you for your response. For number 3, I'm asking whether that is all the preprocessing that needs to be done before passing an image to the network as input? — hridayns, Mar 28 '19 at 07:49
I don't know about a TF-Keras implementation, but for standard YOLO you don't have to do that. — gameon67, Mar 28 '19 at 07:55
@gameon67 Oh, so you are saying there is no need to resize the input image at all? — hridayns, Mar 28 '19 at 08:10
You don't. See this: https://stackoverflow.com/questions/49450829/darknet-yolo-image-size — gameon67, Mar 28 '19 at 08:13
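
Per the comment above, darknet takes care of resizing internally, but a from-scratch TF-Keras pipeline has to do it in its own data loader, and the bounding boxes have to be transformed along with the image. A minimal sketch of one common option, assuming a "letterbox" resize (scale to fit while preserving aspect ratio, then pad to a square) followed by dividing pixel values by 255; all function names here are hypothetical. `tf.image.resize_with_pad` performs a similar centered resize-and-pad on image tensors. Plain stretching to the target size is another option, and this sketch ignores the data augmentation darknet applies during training.

```python
def letterbox_params(img_w, img_h, target=416):
    """Scale factor, resized (w, h) and (pad_x, pad_y) for fitting an image
    into a target x target square without distorting its aspect ratio."""
    scale = min(target / img_w, target / img_h)
    new_w, new_h = int(round(img_w * scale)), int(round(img_h * scale))
    # Center the resized image; the remaining border is filled with padding.
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def box_to_letterbox(box, scale, pad):
    """Map a pixel box (x1, y1, x2, y2) from the original image into the letterboxed one."""
    x1, y1, x2, y2 = box
    pad_x, pad_y = pad
    return (x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y)


def normalize_pixels(pixels):
    """Map 8-bit pixel values in [0, 255] to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]


scale, size, pad = letterbox_params(640, 480)
print(scale, size, pad)  # prints 0.65 (416, 312) (0, 52)
print(box_to_letterbox((100, 50, 300, 250), scale, pad))
```

Letterboxing avoids distorting object shapes; the cost is some padded border pixels, which carry no objects.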
