
I have, say, n images, and for each of them I have two additional artificial (made-up) features; the image labels are one-dimensional integers.

I want to fine-tune an ImageNet-trained model on my dataset, but I do not know how to handle these two additional features as input. How should I feed the data to Caffe? Please help!

EDIT: The two features can be any two one-dimensional numbers, say one number representing which class an image falls into and another for how many images fall into that class.

Say I have 'cat.jpg'; then the features might be 5 and 2000, where 5 (feature 1) represents the class and 2000 is the total number of images in that class. In short, the two features can be any two integers.

Shai
Deven

1 Answer


I think the most straightforward way for you is to use an "HDF5Data" input layer, where you can store the input images, the two additional "features", and the expected output value (for regression).

You can see an example here for creating HDF5 data in python. A Matlab example can be found here.

Your HDF5 file should have four "datasets": one for the input images (or image descriptors of dim 4096), stored as an n-dimensional array of images/descriptors.
Another dataset is "feat_1", an n-by-1 array, and "feat_2", another n-by-1 array.
Finally, you should have a dataset "target", an n-by-1 array of the expected outputs you wish to learn.
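A minimal sketch of building such a file in Python with h5py (file names, shapes, and the toy random data are assumptions for illustration; Caffe expects image blobs in N x C x H x W order and data as float32):

```python
# Sketch: write the four datasets that Caffe's "HDF5Data" layer will read.
# 'train_data.h5' and 'file.txt' are assumed names; pick your own paths.
import h5py
import numpy as np

n = 10  # number of samples (toy value)
images = np.random.rand(n, 3, 227, 227).astype(np.float32)  # or n x 4096 descriptors
feat_1 = np.random.rand(n, 1).astype(np.float32)
feat_2 = np.random.rand(n, 1).astype(np.float32)
target = np.random.rand(n, 1).astype(np.float32)

with h5py.File('train_data.h5', 'w') as f:
    f.create_dataset('data', data=images)      # dataset names must match the "top"s
    f.create_dataset('feat_1', data=feat_1)
    f.create_dataset('feat_2', data=feat_2)
    f.create_dataset('target', data=target)

# The layer's "source" parameter points to a text file listing .h5 paths, one per line
with open('file.txt', 'w') as f:
    f.write('train_data.h5\n')
```

Note that the dataset names inside the HDF5 file must match the "top" names of the "HDF5Data" layer exactly.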

Once you have an HDF5 file ready with these datasets in it, your net should have an input layer like

layer {
  name: "input"
  type: "HDF5Data"
  top: "data" # name of dataset with images/imagenet features
  top: "feat_1"
  top: "feat_2"
  top: "target"
  hdf5_data_param {
    source: "/path/to/list/file.txt" # text file listing the .h5 file paths
    batch_size: 32 # required parameter
  }
}

As you can see a single "HDF5Data" layer can produce several "top"s.
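As discussed in the comments below, to feed the image features together with "feat_1" and "feat_2" into the same "InnerProduct" layer, you merge them first with a "Concat" layer. A hedged sketch (the bottom name "fc7" is an assumed feature blob from a preceding layer; "EuclideanLoss" follows the regression setup mentioned in the comments):

```protobuf
layer {
  name: "concat_feats"
  type: "Concat"
  bottom: "fc7"     # assumed: flat feature blob from the image branch
  bottom: "feat_1"
  bottom: "feat_2"
  top: "merged"
  concat_param { axis: 1 } # concatenate along the channel dimension
}
layer {
  name: "pred"
  type: "InnerProduct"
  bottom: "merged"
  top: "pred"
  inner_product_param { num_output: 1 }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "target"
  top: "loss"
}
```

Note that all bottoms of "Concat" must agree in every dimension except the concatenation axis, so the scalar features should be shaped n-by-1 as described above.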

Shai
  • I read somewhere that I would have to use the Concatenation Layer, isn't that the case? – Deven Jan 03 '16 at 19:13
  • @Deven Yes, if you want to feed the imagenet feature, feat_1 and feat_2 to the same InnerProduct layer, you will need to use a Concatenation layer. – Shai Jan 03 '16 at 19:22
  • I am a bit out of my depth here, I do not know whether I should feed them to the same InnerProduct layer or not. Please explain. – Deven Jan 03 '16 at 19:27
  • @Deven I can't make this call for you. Eventually, in order to produce a single output you should merge all your inputs into a single classification/regression layer (usually an InnerProduct layer). However, you might want to pass your inputs through one or more hidden layers before you make your prediction, though... – Shai Jan 03 '16 at 19:29
  • To give you some more info, I am just modifying the flickr style finetuning example, making small modifications (like Euclidean loss for regression). What I will be doing now is modifying the data layer (as told by you) and adding a concatenation layer (before the IP layer). Would I be doing anything wrong here? – Deven Jan 03 '16 at 19:38
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/99616/discussion-between-deven-and-shai). – Deven Jan 03 '16 at 19:41
  • From the matlab example you linked, I have understood that if X is an n-dimensional array of images, then I would use: hdf5write('my_data.h5', '/X', single( permute(X, [4:-1:1]) )). Am I right in doing so? – Deven Jan 09 '16 at 22:10
  • @Deven Yes. You can use the command-line util `h5ls` to make sure that the shape of your inputs is correct. – Shai Jan 10 '16 at 05:48