
I need to write a python script that prepares data to feed to a caffe solver. My input is images (X) and vectors of ints (Y) (I have a multi-output regression problem, not a single Y for each X), and I am trying to modify LeNet for my task.

Here I found that hdf5 can be a good option: it can be used from python, but the drawbacks are that we can't do data augmentation on-the-fly and that input images must be float32/float64.

Also here I found an example, but in the example there is only 1D data, so I'm curious what shape the images should have.

Also here I found info about the EUCLIDEAN_LOSS and HINGE_LOSS layers. What layer type should I use for multi-output regression?

mrgloom
  • please ask a **single** question at each post: one for data, and maybe a new one for what loss to use. – Shai Feb 10 '16 at 06:33
    BTW using enum (e.g., `EUCLIDEAN_LOSS`, `HINGE_LOSS`) for the layer type indicates an old caffe version. Newer versions work with strings: `type: "EuclideanLoss"` or `type: "HingeLoss"`. Make sure your caffe branch is up to date. – Shai Feb 10 '16 at 06:35
    How python-ic are you? you can write a `type: "Python"` layer as an input layer to do the augmentations you want on the fly. – Shai Feb 10 '16 at 06:37

1 Answer


Caffe expects its input images to be 4-D B-by-C-by-H-by-W:

  • B is the "batch size", the number of images you process simultaneously
  • C is the number of channels, usually 3 for BGR (most nets conform to OpenCV's BGR format, rather than RGB; go figure...)
  • H and W are the height and width of the image, respectively.

Therefore, you need a python script that reads the images (you can use `caffe.io.load_image`), then transposes, resizes and rescales them, and finally stacks them into B-by-C-by-H-by-W numpy arrays of type float32.
You can do the augmentations in python and save all the data into hdf5 files.
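A minimal sketch of such a script, assuming `h5py` is installed. The actual image loading (e.g., via `caffe.io.load_image`) is replaced here by random data so the example is self-contained; the dataset names `data` and `label` and the shapes are illustrative and must match your prototxt:

```python
# Prepare an HDF5 file that Caffe's HDF5Data layer can read.
import numpy as np
import h5py

N, C, H, W = 10, 3, 28, 28  # total samples, channels, height, width
K = 4                       # length of each multi-output label vector

# In practice: load each image (H-by-W-by-C, RGB), convert to BGR,
# resize/rescale, then transpose to C-by-H-by-W before stacking.
X = np.random.rand(N, C, H, W).astype(np.float32)
Y = np.random.rand(N, K).astype(np.float32)  # one K-vector per image

with h5py.File('train.h5', 'w') as f:
    f.create_dataset('data', data=X)    # names must match the layer "top"s
    f.create_dataset('label', data=Y)

# The HDF5Data layer takes a text file listing the .h5 file paths:
with open('train_h5_list.txt', 'w') as f:
    f.write('train.h5\n')
```

Note that the file stores all `N` samples; the batch size `B` is chosen later in the prototxt, not here.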

Shai
  • I don't understand: do I need to feed data to caffe batch by batch manually, or do I need to save it in hdf5 as `N`-`C`-`H`-`W`, where `N` is the number of samples/examples? – mrgloom Feb 10 '16 at 09:03
    @mrgloom the data should be 4D in your HDF5 file. Each sample is of size `C`-`H`-`W` and you have `n1` of them stored in the file. When training, caffe reads only `B` of them at a time according to the `batch_size` you define. Thus, you can store many samples (much more than a single batch) in each `HDF5` file. However, the rest of the dimensions (`C`, `H` and `W`) should remain fixed throughout. – Shai Feb 10 '16 at 09:06
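To make the `batch_size` mechanics concrete, here is a sketch of the corresponding input and loss layers in the prototxt. The layer/blob names, the `batch_size` value, and the `score` bottom (the final InnerProduct output with `num_output` equal to the label length `K`) are illustrative assumptions, not taken from the thread:

```
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "train_h5_list.txt"  # text file listing the .h5 paths
    batch_size: 32               # B: caffe reads this many samples at a time
  }
  include { phase: TRAIN }
}
# ... convolution / pooling / InnerProduct layers of the net ...
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "score"  # predictions, shape B-by-K
  bottom: "label"  # ground truth from the HDF5 file
  top: "loss"
}
```

For multi-output regression, `EuclideanLoss` sums the squared differences over all K outputs per sample.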