What do generate_anchor_base()'s arguments mean?

Question

I am looking at the generate_anchor_base method, which is a Faster R-CNN utility in ChainerCV.

What is base_size = 16? The documentation describes it as:

The width and the height of the reference window.

But what does "reference window" mean?

Also, it says that anchor_scales=[8, 16, 32] are the areas of the anchors, but I thought that the areas are (128, 256, 512).

Another question:
If the base size is 16 and h = 128 and w = 128, does that mean anchor_base[index, 0] = py - h / 2 is a negative value, since py = 8 and h / 2 = 128 / 2 = 64?

score 1 · Accepted Answer · answered Nov 18 '18 at 13:39

The method is a utility function for Faster R-CNN, so I assume you already understand what an "anchor" is, as proposed in Faster R-CNN.

"Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" https://arxiv.org/abs/1506.01497

base_size and anchor_scales together determine the size of the anchor. For example, when base_size=16 and anchor_scales=[8, 16, 32] (and ratio=1.0), the height and width of the anchors will be 16 * [8, 16, 32] = (128, 256, 512), as you expected. ratio determines the aspect ratio between height and width.
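To make the arithmetic concrete, here is a minimal NumPy sketch of how such an anchor base could be computed. It is not ChainerCV's code: the function name sketch_anchor_base is hypothetical, and splitting the ratio as sqrt(ratio) for the height and sqrt(1 / ratio) for the width (so the area stays the same across ratios) is an assumption, as is the (y_min, x_min, y_max, x_max) column order.

```python
import numpy as np


def sketch_anchor_base(base_size=16, ratios=(0.5, 1, 2), anchor_scales=(8, 16, 32)):
    """Hypothetical sketch: anchors centered on one base_size x base_size window.

    Returns an array of shape (len(ratios) * len(anchor_scales), 4) whose
    columns are assumed to be (y_min, x_min, y_max, x_max).
    """
    # Center of the reference window, e.g. (8, 8) for base_size=16.
    py = px = base_size / 2.0
    anchors = []
    for ratio in ratios:
        for scale in anchor_scales:
            # Assumption: the ratio is split so that h * w stays constant.
            h = base_size * scale * np.sqrt(ratio)
            w = base_size * scale * np.sqrt(1.0 / ratio)
            anchors.append([py - h / 2, px - w / 2, py + h / 2, px + w / 2])
    return np.array(anchors, dtype=np.float32)


# With ratio 1 the side lengths are base_size * anchor_scales.
square = sketch_anchor_base(ratios=(1,))
sides = square[:, 2] - square[:, 0]
print(sides)      # side lengths for scales 8, 16, 32
print(square[0])  # the 128 x 128 anchor; note the negative corner
```

For the 128 x 128 anchor the top-left corner is 8 - 64 = -56, which is the negative value asked about in the question.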

(I might be wrong in the paragraph below; please correct me if I am.)

I think base_size needs to be set to the downsampling factor of the feature map at the current hidden layer. In the ChainerCV Faster R-CNN implementation, the extractor's feature map is fed into the RPN (region proposal network), and generate_anchor_base is used in the RPN. So you need to pay attention to what the extractor outputs. ChainerCV uses VGG16 as the feature extractor and takes the conv5_3 layer as the extracted feature (see here). By that layer max_pooling_2d has been applied 4 times, so the feature map is 2^4 = 16 times smaller than the input image.
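The stride arithmetic in that paragraph can be sketched in a few lines. This assumes, as the answer does, four poolings before conv5_3, and additionally assumes each pooling exactly halves the height and width (rounding at odd sizes is ignored). The helper cell_to_image_offset is hypothetical and only illustrates how a feature-map cell relates to the input image.

```python
# Assumption from the answer: four max poolings precede conv5_3.
n_poolings = 4
# Assumption: each pooling halves the spatial size (stride 2).
pool_stride = 2

# Overall downsampling factor between the input image and the feature map.
feat_stride = pool_stride ** n_poolings


def cell_to_image_offset(i, j, stride=feat_stride):
    """Hypothetical helper: top-left image coordinate (y, x) of feature cell (i, j)."""
    return i * stride, j * stride


print(feat_stride)
print(cell_to_image_offset(2, 3))
```

Under these assumptions each feature-map cell corresponds to a 16 x 16 window of the input image, which is one way to read "reference window" with base_size = 16.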

As for the other question, I think your understanding is correct: py - h / 2 will be a negative value. But anchor_base only holds relative values. anchor_base is prepared once when the model is initialized (here), and the actual anchors (in absolute coordinates) are created in each forward call (here) by the _enumerate_shifted_anchor method.
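The relative-to-absolute step can be illustrated with a small NumPy sketch. It is a simplified stand-in and not ChainerCV's _enumerate_shifted_anchor: the function name sketch_shifted_anchors, its signature, and the (y_min, x_min, y_max, x_max) column order are assumptions made for illustration.

```python
import numpy as np


def sketch_shifted_anchors(anchor_base, feat_stride, height, width):
    """Hypothetical sketch: copy the relative anchors to every feature-map cell.

    anchor_base: (A, 4) array of relative boxes.
    height, width: size of the feature map in cells.
    Returns a (height * width * A, 4) array of absolute boxes.
    """
    # Image-space offset of every feature-map cell.
    shift_y = np.arange(0, height * feat_stride, feat_stride)
    shift_x = np.arange(0, width * feat_stride, feat_stride)
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shift = np.stack(
        (shift_y.ravel(), shift_x.ravel(), shift_y.ravel(), shift_x.ravel()),
        axis=1,
    )
    # Broadcast (1, A, 4) + (K, 1, 4) -> (K, A, 4), with K = height * width.
    anchors = anchor_base[None, :, :] + shift[:, None, :]
    return anchors.reshape(-1, 4).astype(np.float32)


# One relative 128 x 128 anchor centered on a 16 x 16 reference window.
anchor_base = np.array([[-56.0, -56.0, 72.0, 72.0]], dtype=np.float32)
anchors = sketch_shifted_anchors(anchor_base, feat_stride=16, height=2, width=3)
print(anchors.shape)
print(anchors)
```

Anchors near the image border still have negative coordinates after shifting; in this sketch nothing clips them, so any clipping or filtering would have to happen in a later step.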

Yes, thank you. 16 is the receptive field (in the original image) for every spatial location in "conv5_3". But I still don't get it: why do we need a "base_size" variable at all? We could just say anchor_scales=[128, 256, 512], right? What does the operation "base_size * anchor_scales" mean? What does "reference window" mean? Sorry for all these questions :( - floyd, Nov 18 '18 at 14:04

