How to feed multi label data as HDF5 input in a multi task setup?

Question

I have a dataset in which each image has around 101 labels. I know I have to use the HDF5 data layer to feed my data into the network. The problem is that I have a multi-task setup: my network shares parameters for the first 5 layers and then branches off. Out of the 101 labels, I want to send 100 labels to one task and 1 label to the second task.

How do I do this? Can I do something like the following:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label1"   # a scalar label
  top: "label2"   # a vector of size 100
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "path/to/the/text/file/test.txt"
    batch_size: 10
  }
}

Besides "data", there are two label tops in the above setup: one for the 100-dimensional vector (label2) and one for the remaining scalar label (label1).
Is this kind of setup possible?

I also read somewhere that one can split a multi-dimensional vector by specifying the split in the prototxt file itself. In that case I would use a single top blob for the label (101-dimensional) and then split the 101-d vector into a 100-d vector and a 1-d scalar. How can this be done?
In that case the layer would look like:

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"   # a vector of size 101
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "path/to/the/text/file/test.txt"
    batch_size: 10
  }
}
# Some layer to split the label blob into two vectors of 100-d and 1-d respectively

Any idea how this split would work?

Yes, it is possible. Have a look at this: http://stackoverflow.com/questions/33140000/how-to-feed-caffe-multi-label-data-in-hdf5-format — Lemm Ras, Aug 09 '16 at 17:12

score 2 · Accepted Answer · answered Aug 10 '16 at 06:27

The original setup you proposed (an "HDF5Data" layer with three tops) is possible and perfectly fine in Caffe. In fact, Caffe supports any directed acyclic graph (DAG) of layers: you can have layers with several bottoms, and multiple loss layers.
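For the three-top layer to work, the HDF5 file has to provide one dataset per top, named exactly like the top ("data", "label1", "label2"), all with the same number of rows; the `source` text file then lists the .h5 file paths, one per line. Below is a minimal sketch of preparing such a file, assuming NumPy and `h5py` are available and that column 0 of the 101 labels is the scalar task (to match the "Slice" example that follows). The helper names `split_labels` and `write_multitask_hdf5` are hypothetical.

```python
import numpy as np


def split_labels(labels):
    """Split an (N, 101) label matrix into a scalar label and a 100-d label.

    Assumption: column 0 is the scalar task ("label1") and columns 1..100
    form the 100-d task ("label2").
    """
    labels = np.asarray(labels, dtype=np.float32)
    if labels.ndim != 2 or labels.shape[1] != 101:
        raise ValueError("expected labels of shape (N, 101)")
    label1 = labels[:, :1]   # shape (N, 1)
    label2 = labels[:, 1:]   # shape (N, 100)
    return label1, label2


def write_multitask_hdf5(h5_path, list_path, data, labels):
    """Write data/label1/label2 datasets and the list file used as `source`.

    `data` is expected in Caffe's N x C x H x W layout.
    """
    import h5py  # imported here so split_labels can be used without h5py

    label1, label2 = split_labels(labels)
    with h5py.File(h5_path, "w") as f:
        # Dataset names must match the "top" names of the HDF5Data layer.
        f.create_dataset("data", data=np.asarray(data, dtype=np.float32))
        f.create_dataset("label1", data=label1)
        f.create_dataset("label2", data=label2)
    # hdf5_data_param.source points to a text file with one .h5 path per line.
    with open(list_path, "w") as f:
        f.write(h5_path + "\n")
```

For example, `write_multitask_hdf5("train.h5", "path/to/the/text/file/test.txt", images, labels)` would produce a file the layer in the question could read (here "train.h5", `images` and `labels` are placeholders). Each branch then gets its own loss layer, with "label1" as the bottom of one and "label2" as the bottom of the other.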

If you insist on having a single label input of 101 dimensions, you can split it using a "Slice" layer:

layer {
  type: "Slice"
  name: "slice/label"
  bottom: "label" # assuming shape batch_size-101-1-1
  top: "label1"   # first 1D label
  top: "label2"   # second 100D label
  slice_param {
    axis: 1  # along "channels" dimension
    slice_point: 1 # slice after the first element
  }
}

For more information about the "Slice" layer parameters, see caffe.proto.
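To see what the "Slice" layer above does to the label blob, here is a NumPy sketch that mimics it with `np.split`. It illustrates the semantics only and is not Caffe code; the helper name `slice_like_caffe` is hypothetical. A `slice_point` of 1 along axis 1 puts channel 0 into the first top and channels 1 to 100 into the second.

```python
import numpy as np


def slice_like_caffe(blob, axis, slice_points):
    """Mimic Caffe's "Slice" layer: cut `blob` along `axis` at each slice point.

    With k slice points the result has k + 1 pieces, in order.
    """
    return np.split(blob, slice_points, axis=axis)


# A batch of 10 labels shaped like the "label" blob: batch_size x 101 x 1 x 1.
label = np.arange(10 * 101, dtype=np.float32).reshape(10, 101, 1, 1)
label1, label2 = slice_like_caffe(label, axis=1, slice_points=[1])
print(label1.shape)  # (10, 1, 1, 1)
print(label2.shape)  # (10, 100, 1, 1)
```

Adding more `slice_point` entries would cut the blob into more tops in the same way, e.g. `slice_points=[1, 51]` would yield pieces of 1, 50 and 50 channels.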

@Shai I have a question with something somewhat related, on preprocessing hdf5 data, can you look into it? https://stackoverflow.com/questions/47799416/hdf5data-processing-with-caffes-transformer-for-training Thanks — dusa, Dec 13 '17 at 17:58

