
I know that Caffe has the so-called spatial pyramid pooling layer, which enables networks to use arbitrary image sizes. The problem I have is that the network seems to refuse to use arbitrary image sizes within a single batch. Am I missing something, or is this a real limitation?

My train_val.prototxt:

name: "digits"
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "input"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/test_lmdb"
    batch_size: 10
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "pool1"
  top: "bn1"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "bn1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "spatial_pyramid_pooling"
  type: "SPP"
  bottom: "conv2"
  top: "pool2"
  spp_param {
    pyramid_height: 2
  }
} 
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: false
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "pool2"
  top: "bn2"
  batch_norm_param {
    use_global_stats: true
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  param {
    lr_mult: 0
  }
  include {
    phase: TEST
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "bn2"
  top: "ip1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

Link to another question regarding a subsequent problem.

TruckerCat
  • I think it would be better if you resize them before training; arbitrary sizes will create confusion if you have padding and filters defined. You can do that with OpenCV: cv2.resize(img, (width, height)) – Eliethesaiyan Jul 19 '17 at 09:33
  • If I resize them, what benefit do I gain from using a spatial pyramid pooling layer? That does not make sense to me. – TruckerCat Jul 19 '17 at 09:36

1 Answer


You are mixing several concepts here.

Can a net accept arbitrary input shapes?
Well, not all nets can work with any input shape. In many cases a net is restricted to the input shape for which it was trained.
In most cases, when fully-connected layers ("InnerProduct") are used, these layers expect an exact input dimension, so changing the input shape "breaks" them and restricts the net to a specific, pre-defined input shape.
On the other hand "fully convolutional nets" are more flexible with regard to input shape and can usually process any input shape.
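Note that in your net the SPP layer is what makes the first "InnerProduct" layer shape-agnostic: if I read Caffe's "SPP" layer correctly, pyramid_height: 2 pools "conv2" into 1 + 4 = 5 bins per channel, so "ip1" always receives 5 × 50 = 250 inputs regardless of the original image size.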

Can one change input shape during batch training?
Even if your net architecture allows for arbitrary input shape, you cannot use whatever shape you want during batch training because the input shape of all samples in a single batch must be the same: How can you concatenate a 27x27 image with another of shape 17x17?

It seems like the error you are getting is from the "Data" layer that is struggling with concatenating samples of different shapes into a single batch.

You can resolve this issue by setting batch_size: 1, processing one sample at a time, and setting iter_size: 32 in your solver.prototxt to average the gradients over 32 samples, getting the SGD effect of batch_size: 32.
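As a rough sketch of what that change looks like (the batch_size line goes into your existing "Data" layers; the solver values other than iter_size are placeholders for whatever you already use):

# train_val.prototxt -- in both "Data" layers
data_param {
  source: "/Users/rvaldez/Documents/Datasets/Digits/SeperatedProviderV3_1020_batchnormalizedV2AndSPP/1/caffe/train_lmdb"
  batch_size: 1    # one sample per forward pass, so shapes may differ between iterations
  backend: LMDB
}

# solver.prototxt
net: "train_val.prototxt"
iter_size: 32      # accumulate gradients over 32 iterations before each weight update
base_lr: 0.01      # placeholder; keep your own hyper-parameters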

Shai
  • Thank you for this clear answer. I did as you suggested and it solved the problem. I also ran into a subsequent problem; could you look into my edited question? That would be great! I will mark your answer as the correct one anyway :-) – TruckerCat Jul 19 '17 at 10:30
  • @R_Valdez if you have a new question, you should *ask* a new question (consider linking them for context). – Shai Jul 19 '17 at 10:32
  • Ok, I will ask a new question. – TruckerCat Jul 19 '17 at 10:33
  • I asked another question. You can find the link in my edited question. – TruckerCat Jul 19 '17 at 10:43