
I'm following the TensorFlow Object Detection API instructions and trying to train an existing object-detection model ("faster_rcnn_resnet101_coco") on my own dataset, which has 50 classes.

So, based on my dataset, I created:

  1. TFRecord files (separate ones for training, evaluation, and testing)
  2. labelmap.pbtxt

Next, I edited model.config, changing only the following fields:

  • model -> faster_rcnn -> num_classes: 90 -> 50 (the number of classes in my own dataset)
  • train_config -> batch_size: 1 -> 10
  • train_config -> num_steps: 200000 -> 100
  • train_input_reader -> tf_record_input_reader -> input_path: set to the path where the TFRecord files reside
  • train_input_reader -> label_map_path: set to the path where labelmap.pbtxt resides
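
For reference, the relevant parts of my edited config look roughly like this (the paths are placeholders, not my real ones):

    model {
      faster_rcnn {
        num_classes: 50
      }
    }
    train_config {
      batch_size: 10  # changed from 1; this turns out to matter, see below
      num_steps: 100
    }
    train_input_reader {
      tf_record_input_reader {
        input_path: "/path/to/train.record"
      }
      label_map_path: "/path/to/labelmap.pbtxt"
    }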

Finally, I ran the command:

python train.py \
--logtostderr \
--pipeline_config_path="PATH WHERE CONFIG FILE RESIDES" \
--train_dir="PATH WHERE MODEL DIRECTORY RESIDES"

And I got the error below:

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [1,890,600,3] vs. shape[1] = [1,766,600,3] [[Node: concat_1 = ConcatV2[N=10, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](Preprocessor/sub, Preprocessor_1/sub, Preprocessor_2/sub, Preprocessor_3/sub, Preprocessor_4/sub, Preprocessor_5/sub, Preprocessor_6/sub, Preprocessor_7/sub, Preprocessor_8/sub, Preprocessor_9/sub, concat_1/axis)]]

It seems to be about the dimensions of the input images, so it may be caused by the raw image data not being resized.

But as far as I know, the model automatically resizes the input images for training (doesn't it?)

So I'm stuck on this issue.

If there is a solution, I'd appreciate your answer. Thanks.

UPDATE

When I changed the batch_size field from 10 back to 1 (the original value), it seems to train without any problem... but I don't understand why...

LKM
  • See the config files in that repo: the batch size is 1, following the Faster R-CNN paper. A bigger batch size will consume too much memory. – Jie.Zhou Sep 08 '17 at 06:09
  • @Jie.Zhou Here is my "model.config" file: https://pastebin.com/4An9HsPK (as I stated above, a few things have been changed). – LKM Sep 08 '17 at 06:38
  • I think the code is probably written for a single image as input, so if you change the batch size to an int bigger than one, the error will be raised because of some internal assumption. – Jie.Zhou Sep 08 '17 at 06:44
  • Do you mean that "the code" from TensorFlow, not from myself, is written for a single image, because the Faster R-CNN paper processes the batch as a single image? – LKM Sep 08 '17 at 06:53
  • That's exactly what I mean. – Jie.Zhou Sep 08 '17 at 06:56

3 Answers


TaeWoo is right: you have to set batch_size to 1 in order to train Faster R-CNN.

This is because FRCNN uses a keep_aspect_ratio_resizer, which in turn means that if you have images of different sizes, they will also be different sizes after the preprocessing. This practically makes batching impossible, since a batch tensor has a shape [num_batch, height, width, channels]. You can see this is a problem when (height, width) differ from one example to the next.
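
To see why, here is a toy reproduction of the failing concat (TF refuses it because the height dimensions differ, just like 890 vs. 766 in the error above):

    import tensorflow as tf

    a = tf.zeros([1, 890, 600, 3])  # first preprocessed image
    b = tf.zeros([1, 766, 600, 3])  # second image, different height
    c = tf.concat([a, b], axis=0)   # fails: 890 vs. 766 don't match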

This is in contrast to the SSD model, which uses a "normal" resizer, i.e. regardless of the input image, all preprocessed examples end up having the same size, which allows them to be batched together.
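
Concretely, the two resizers look like this in the pipeline configs (the values are taken from the sample configs shipped with the API, so treat them as illustrative):

    # Faster R-CNN: output size depends on the input's aspect ratio
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }

    # SSD: every image is resized to the same fixed shape
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }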

Now, if you have images of different sizes, you practically have two ways of using batching:

  • use Faster R-CNN and pad your images beforehand, either once before training or continuously as a preprocessing step (a sketch of the one-time approach follows this list). I'd suggest the former, since this type of preprocessing seems to slow down learning a lot
  • use SSD, but make sure that your objects are not affected too much by distortion. This shouldn't be a very big problem, since it also acts as a form of data augmentation.
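
Here is a minimal sketch of the one-time padding option, assuming Pillow and that every image fits into a chosen canvas (the 1024x1024 target and the folder names are made up for illustration; note that boxes annotated in absolute pixels stay valid when you pad only the bottom and right, whereas normalized coordinates would need rescaling):

    import os
    from PIL import Image

    SRC_DIR = "images/raw"           # hypothetical input folder
    DST_DIR = "images/padded"        # hypothetical output folder
    TARGET_W, TARGET_H = 1024, 1024  # illustrative canvas size

    os.makedirs(DST_DIR, exist_ok=True)

    for name in os.listdir(SRC_DIR):
        img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
        if img.width > TARGET_W or img.height > TARGET_H:
            raise ValueError(f"{name} exceeds the target canvas")
        # Paste into the top-left corner of a black canvas so that
        # absolute-pixel box coordinates remain unchanged.
        canvas = Image.new("RGB", (TARGET_W, TARGET_H))
        canvas.paste(img, (0, 0))
        canvas.save(os.path.join(DST_DIR, name))
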
Ciprian Tomoiagă

I had the same problem. Setting batch_size=1 does indeed seem to solve it, but I am not sure if this will have any effect on the accuracy of the model. Would love to get the TF team's answer to this.

TaeWoo

I had a similar problem that I want to share; maybe it will help others in similar situations. I changed the SSD object-detection net to predict bboxes with a fifth variable, which is an angle. The problem was that we inserted an empty list for the angle variable in the bounding box, and then I got an error in the tf.concat operation:

Dimensions of inputs should match: shape[0] = [1,43] vs. shape[4] = [1,0]

(shape[0] changed whenever I reran the session, but shape[4] stayed the same, [1,0].)

I fixed the problem by fixing my TFRecord so that the list of angles has the same length as the other bbox variables (xmin, xmax, ymin, ymax); there's a sketch below.
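
To make it concrete, here is a rough sketch of the fixed feature layout when writing the record (the angle key name is my own choice for this custom net; the point is only that all five per-box lists have the same length):

    import tensorflow as tf

    def float_list(values):
        return tf.train.Feature(float_list=tf.train.FloatList(value=values))

    # One entry per bounding box; all five lists must have the same length.
    xmins, xmaxs = [0.1, 0.5], [0.3, 0.9]
    ymins, ymaxs = [0.2, 0.4], [0.6, 0.8]
    angles = [0.0, 1.57]  # must NOT be an empty list

    example = tf.train.Example(features=tf.train.Features(feature={
        'image/object/bbox/xmin': float_list(xmins),
        'image/object/bbox/xmax': float_list(xmaxs),
        'image/object/bbox/ymin': float_list(ymins),
        'image/object/bbox/ymax': float_list(ymaxs),
        'image/object/bbox/angle': float_list(angles),  # custom, made-up key
    }))

    with tf.python_io.TFRecordWriter('train.record') as writer:
        writer.write(example.SerializeToString())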

Hope it helps someone; it took me a whole day to find the problem.

Regards, Alon

Alon Samuel