I'm currently reading the YOLOv2 paper, and I couldn't understand why YOLO and YOLOv2 downsample the input by a factor of 32 in multi-scale training.
Can someone explain why the width and height must be a multiple of 32?
I know that YOLO takes images of size 320×320, 352×352, … up to 608×608 (with a step of 32), but that alone doesn't answer the question for me.
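To make concrete what I mean, here is a small sketch (my own illustration, not from the paper) of the multi-scale training sizes and the output grid each one would produce, assuming the network downsamples the input by an overall factor of 32:

```python
# My own illustration: the multi-scale sizes used in YOLOv2 training
# and the final feature-map grid each would give, assuming an overall
# downsampling factor of 32.
DOWNSAMPLE = 32
sizes = range(320, 608 + 1, DOWNSAMPLE)  # 320, 352, ..., 608

for s in sizes:
    grid = s // DOWNSAMPLE
    print(f"input {s}x{s} -> {grid}x{grid} grid")
```

So I can see that each size divides evenly by 32 (giving 10×10 up to 19×19 grids), but I don't understand why the factor is 32 in the first place, or what would break with a size that isn't a multiple of 32.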