
I am trying out Google's DeepDream code, which makes use of Caffe. It uses the GoogLeNet model pre-trained on ImageNet, as provided by the Model Zoo. That means the network was trained on images cropped to 224x224 pixels. From the train_val.prototxt:

layer {            
  name: "data"     
  type: "Data"     
  ...

  transform_param {
     mirror: true   
     crop_size: 224
  ... 

The deploy.prototxt used for processing also defines an input blob of shape 10x3x224x224 (a batch of 10 RGB images of 224x224 pixels):

name: "GoogleNet"
input: "data"
input_shape {
  dim: 10
  dim: 3
  dim: 224
  dim: 224
}

However, I can use this net to process images of any size (the example above used an image of 1024x574 pixels).

  1. The deploy.prototxt does not configure Caffe to do any cropping.
  2. The preprocessing in the deepdream code only subtracts the mean; there is no cropping here either (see the sketch below).
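
For reference, the mean-only preprocessing looks roughly like this (paraphrased from the deepdream notebook as I remember it; it reorders RGB to Caffe's BGR layout and subtracts the mean, but never crops or resizes):

import numpy as np

def preprocess(net, img):
    # H x W x 3 RGB -> 3 x H x W BGR, minus the mean; no cropping, no resizing
    return np.float32(np.rollaxis(img, 2)[::-1]) - net.transformer.mean['data']

def deprocess(net, img):
    # inverse transform, back to an H x W x 3 RGB image for display
    return np.dstack((img + net.transformer.mean['data'])[::-1])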

How is it possible that I can run the net on images that are too big for the input layer?


The complete code can be found here.


1 Answer


DeepDream does not crop the input image at all.
If you pay close attention you'll notice that it operates on a mid-level layer: its end= argument is set to 'inception_4c/output' or end='inception_3b/5x5_reduce', but NEVER end='loss3/classifier'. The reason for this is that GoogLeNet up to these layers is a fully convolutional net, that is, it can take an input image of any size and produce outputs whose sizes are proportional to the input size (the exact output size is also affected by conv padding and pooling).
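
To make the "proportional output" point concrete, here is a small sketch (not part of the deepdream code; the file paths are placeholders and the printed shapes are approximate) that reshapes the data blob to different sizes and runs the net up to a mid-level layer:

import caffe

# placeholder paths -- point these at your local GoogLeNet deploy.prototxt / caffemodel
net = caffe.Net('deploy.prototxt', 'bvlc_googlenet.caffemodel', caffe.TEST)

for h, w in [(224, 224), (574, 1024)]:
    net.blobs['data'].reshape(1, 3, h, w)  # set a new input shape
    net.reshape()                          # propagate the shape through all layers
    net.forward(end='inception_4c/output')
    print((h, w), net.blobs['inception_4c/output'].data.shape)

With a 224x224 input, inception_4c/output has shape (1, 512, 14, 14); with 574x1024 it comes out around (1, 512, 35, 64) (the exact numbers depend on padding and pooling rounding). The total stride up to that layer is 16, so the spatial dimensions stay proportional to the input.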

To adjust the net to different input sizes, the deepdream function has the line

src.reshape(1,3,h,w) # resize the network's input image size

This line adjusts the net's layers to accommodate input of shape (1,3,h,w).

  • Ahh, that makes a lot of sense of course! Does this actually mean that if I remove all fully-connected layers I could also *train* the resulting network architecture on arbitrarily sized images? Provided I implement some clever batching? – Zakum Jan 10 '16 at 16:40
  • @Zakum How would you relate a single-label loss to an arbitrarily sized prediction? – Shai Jan 10 '16 at 19:37
  • Well I'm just starting to plough my way through this as you might have noticed. ^^ Wouldn't a Softmax layer at the end do the job? – Zakum Jan 10 '16 at 23:42
  • @Zakum "Softmax" will do the trick for loss, but how would you train a classifier with fixed sized output? – Shai Jan 11 '16 at 06:27
  • I replace the InnerProduct layers with Convolutional layers and set the output_size accordingly. In fact I am planning to try out some Regression with the GoogLeNet architecture, so I basically would set the output size to 1. – Zakum Jan 12 '16 at 17:05
  • @Zakum I'm not familiar with `output_size` parameter. How exactly are you going to set it? are you going to use `"Pool"` layer with `global_pooling` parameter? – Shai Jan 12 '16 at 17:07
  • Beg pardon, the actual name is `num_output` and it's a parameter of the `Convolution` layer (https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L513) – Zakum Jan 12 '16 at 22:19
  • @Zakum `num_output` (for `"Convolution"` or `"InnerProduct"`) determines the output feature dimension. However, in a convolutional layer, your output size might be bigger because of its spatial dimensions. How are you going to output a **single** prediction from a spatially varying input size? – Shai Jan 13 '16 at 06:15
  • Darn, that makes a lot of sense again. Will have to mull this over! – Zakum Jan 13 '16 at 14:29
  • 3
    For the sake of documentation: `global_pooling=1` in a Pooling layer followed by a Convolutional layer with `num_output=1` is what I ended up with for regression. Caffe also supports arbitrary sized inputs during training out of the box. One just needs to either use batches of same sized input or just bite the bullet and use a batch size of 1. – Zakum Feb 10 '16 at 02:39
  • @Zakum were you able to train with a batch size of one? I suppose it results in very noisy gradients, doesn't it? – Shai Feb 10 '16 at 05:56
  • The model is also able to converge with batch size one -- although I do see some serious oscillation in my loss function along the way. – Zakum Feb 10 '16 at 17:54
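
For reference, here is a minimal sketch of the head described in the last comments, written with Caffe's Python net spec. Names like feat and regression_head are made up, and average pooling plus EuclideanLoss are my assumptions; the comment only specifies global_pooling and num_output=1. Global pooling collapses the spatial dimensions to 1x1 for any input size, and a 1x1 convolution with num_output=1 then yields a single value per image.

from caffe import layers as L, params as P

def regression_head(feat, label):
    # global average pooling (pooling type assumed): N x C x H x W -> N x C x 1 x 1, for any H, W
    pool = L.Pooling(feat, pool=P.Pooling.AVE, global_pooling=True)
    # 1x1 convolution with a single output channel -> one prediction per image
    pred = L.Convolution(pool, num_output=1, kernel_size=1,
                         weight_filler=dict(type='xavier'))
    # EuclideanLoss is just one possible choice of regression loss
    loss = L.EuclideanLoss(pred, label)
    return pool, pred, loss

# usage inside a caffe.NetSpec() n, with n.feat being the last convolutional
# feature map (e.g. inception_5b/output) and n.label the regression target:
#   n.pool, n.pred, n.loss = regression_head(n.feat, n.label)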