20

I have a dataset of images that have multiple labels. There are 100 classes in the dataset, and each image has 1 to 5 labels associated with it.

I'm following the instructions at this URL:

https://github.com/BVLC/caffe/issues/550

It says that I need to generate a text file listing the images and their labels, as in:

/home/my_test_dir/picture-foo.jpg 0
/home/my_test_dir/picture-foo1.jpg 1

In my case, since I have multi-label images, does it work to simply add labels as in the following?

/home/my_test_dir/picture-foo.jpg 0 2 5
/home/my_test_dir/picture-foo1.jpg 1 4

I have a feeling that it's probably not going to be that simple. If I'm right, at what step and how should I integrate the multi-label nature of the dataset into the process of setting up Caffe?

Shai
ytrewq

3 Answers

21

I believe Shai's answer is no longer up to date. Caffe supports multi-label/matrix ground truth in the HDF5 and LMDB formats. The Python snippet in this GitHub comment demonstrates how to construct multi-label LMDB ground truth (see Shai's answer for the HDF5 format). Unlike single-label image datasets, one LMDB is constructed for the images and a second, separate LMDB is constructed for the multi-label ground truth. The snippet deals with spatial multi-label ground truth, which is useful for pixel-wise labeling of images.

The order in which data is written to the lmdb is crucial. The order of the ground truth must match the order of the images.
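One way to keep the two databases aligned is to derive both sets of keys from the same running index and zero-pad them, since LMDB iterates keys in lexicographic byte order by default. The following is only a sketch of the key generation, not of the actual LMDB writes; `make_keys`, the file names and the label lists are hypothetical and not taken from the linked snippet.

```python
# Sketch only: builds matching keys for an image DB and a label DB.
# The actual LMDB writes (e.g. via the `lmdb` Python package) are omitted.

def make_keys(num_items, width=8):
    """Return zero-padded string keys: '00000000', '00000001', ..."""
    return ["{:0{w}d}".format(i, w=width) for i in range(num_items)]

# Hypothetical data: one entry per image, in the same order in both lists.
image_paths = ["picture-foo.jpg", "picture-foo1.jpg", "picture-foo2.jpg"]
label_lists = [[0, 2, 5], [1, 4], [3]]

keys = make_keys(len(image_paths))
image_records = list(zip(keys, image_paths))  # would go into the image LMDB
label_records = list(zip(keys, label_lists))  # would go into the label LMDB

# Zero-padded keys sort in insertion order; unpadded keys do not
# ('10' sorts before '2'), which would silently misalign the two DBs.
unpadded = [str(i) for i in range(12)]
print(sorted(make_keys(12)) == make_keys(12))
print(sorted(unpadded) == unpadded)
```

Because record `i` uses the same key in both databases, reading them sequentially should yield each image together with its own ground truth.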

Loss layers such as SOFTMAX_LOSS, EUCLIDEAN_LOSS and SIGMOID_CROSS_ENTROPY_LOSS also support multi-label data. However, the Accuracy layer is still limited to single-label data. You may want to follow this GitHub issue to track when this feature is added to Caffe.

ypx
  • 5
    Since I have 100 classes, and each image is labeled 1 to 5 classes from those 100, I would probably need a 1x100 matrix with entry 1 if the image has that class as its label and 0 otherwise, for example. Python snippet in your code deals with pixel-wise labeling, but what if you want multiple labels on each image as a whole? – ytrewq Sep 22 '15 at 10:12
  • Any update for the Accuracy layer? they closed the issue. – Rafael Ruiz Muñoz Apr 26 '18 at 11:06
5

Caffe supports multi-label data. You can encode the labels as n-hot vectors, e.g. [0,1,1,0,0,1,...]. You need to reshape the labels to n*k*1*1 tensors and use a sigmoid cross-entropy or Euclidean loss, not softmax (which forces sum(outputs)=1).
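As a minimal sketch of the n-hot encoding, assuming the listing format from the question (an image path followed by space-separated class indices, with no spaces inside the path) and its 100 classes: the helper names `parse_line` and `to_n_hot` are made up for illustration and are not part of Caffe.

```python
NUM_CLASSES = 100  # from the question: 100 classes in the dataset

def parse_line(line):
    """Split 'path c1 c2 ...' into (path, [c1, c2, ...]).

    Assumes the path itself contains no whitespace.
    """
    parts = line.split()
    return parts[0], [int(p) for p in parts[1:]]

def to_n_hot(labels, num_classes=NUM_CLASSES):
    """Return a float vector with 1.0 at each labeled class, 0.0 elsewhere."""
    vec = [0.0] * num_classes
    for c in labels:
        if not 0 <= c < num_classes:
            raise ValueError("class index out of range: {}".format(c))
        vec[c] = 1.0
    return vec

lines = [
    "/home/my_test_dir/picture-foo.jpg 0 2 5",
    "/home/my_test_dir/picture-foo1.jpg 1 4",
]

paths, targets = [], []
for line in lines:
    path, labels = parse_line(line)
    paths.append(path)
    targets.append(to_n_hot(labels))

# `targets` is now an n*k list of lists (n images, k = 100 classes).
print(len(targets), len(targets[0]))
```

The resulting n*k array of floats is what would then be stored as the label blob, so that its shape agrees with the network output.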

jeremy_rutman
  • Actually, the recast to n*k*1*1 is unnecessary, my bad. n*k is enough (the label and the net output should agree in dimension) – jeremy_rutman Jun 20 '16 at 21:10
  • I tried this, but when I try to create an LMDB from the data with the Caffe example script, using a text file that lists the path of each image with a vector as its label, the script cannot parse the text file correctly and raises an error saying it cannot find or open the files. Any suggestions? – Somayyeh Ataei May 21 '18 at 21:23
  • If you are giving relative paths, make sure they are relative to where Caffe is running from. Alternatively, give absolute paths. If you post an example error and some lines from the text file, it may be easier to see what has happened. If you are using LMDB, then all the data should be in those files and no text files will be necessary – jeremy_rutman May 22 '18 at 13:30
  • Thank you, but when I replace the label vectors with a single number everything works, so I am sure the problem is caused by using vectors as labels – Somayyeh Ataei May 23 '18 at 12:15
  • Did you recast the label dimensions to n*k? (where n is the batch size and k is the dimension of the vector) – jeremy_rutman May 23 '18 at 13:12
  • The dimension of the labels has to agree with the dimension that Caffe is expecting. Caffe prints a lot of output about the sizes and dimensions of the objects being manipulated. Make sure the dimension of the label you are using matches the dimension Caffe is expecting, and if need be recast the dimensions to agree. – jeremy_rutman May 23 '18 at 19:55
  • Thank you, but as I said, I am stuck at the convert-data-to-LMDB phase and there is no Caffe network yet. Anyway, I converted the data to HDF5 format. – Somayyeh Ataei May 24 '18 at 15:23
3

AFAIK, the current Caffe version does not support lmdb/leveldb datasets for images with multiple labels. However, you can (and probably should) prepare your inputs in HDF5 format. The Caffe HDF5 input layer is much more flexible and will allow you to have multiple labels per input.
This answer gives a brief description of how to create HDF5 input for Caffe.

Another issue you must address is that you are interested not only in multiple labels per image, but also in a varying number of labels per image. How do you define your loss per image and per label? It might be the case that you would have to write your own loss layer.
Some loss layers support an "ignore label": that is, if a specific input label is assigned to the image, no loss is computed for the respective image. See, e.g., AccuracyLayer and SoftmaxWithLossLayer.
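One possible way to handle a varying number of labels without a custom layer is to treat each class as an independent yes/no decision over a fixed-length 0/1 target vector, so the number of positive entries can differ from image to image. The sketch below illustrates that idea in plain Python with made-up scores. It is not Caffe's implementation (which, for instance, applies its own normalization), and it also shows why softmax outputs, which always sum to 1, fit this setting less naturally.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_cross_entropy(scores, targets):
    """Sum of independent binary cross-entropy terms, one per class."""
    loss = 0.0
    for x, t in zip(scores, targets):
        p = sigmoid(x)
        loss -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return loss

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network outputs for 4 classes; classes 0 and 2 are present.
scores = [4.0, -3.0, 5.0, -4.0]
targets = [1.0, 0.0, 1.0, 0.0]

probs_sigmoid = [sigmoid(s) for s in scores]  # each class judged on its own
probs_softmax = softmax(scores)               # classes compete; sums to 1

print(probs_sigmoid)
print(probs_softmax)
print(sigmoid_cross_entropy(scores, targets))
```

With sigmoid outputs, both present classes can receive a probability close to 1 at the same time, whereas softmax has to split a total of 1 between them.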

Shai
  • 2
    Possibly obsolete answer. [Caffe supports multi-label data](https://github.com/BVLC/caffe/issues/1698#issue-53768814) for multiple formats. Loss layers also support multi-label data. However, Accuracy is still limited to single-label data. – ypx Sep 21 '15 at 14:32
  • 1
    @ypx `convert_imageset` does not support floating-point labels. See [here](https://github.com/BVLC/caffe/blob/master/tools/convert_imageset.cpp#L76). – Shai Sep 21 '15 at 15:11
  • 1
    True, it doesn't. Caffe supports loading float labels from lmdb, leveldb and hdf5 files generated via Python, so there is no need for convert_imageset. – ypx Sep 21 '15 at 17:21
  • 2
    Just so there's a reference here, Evan Shelhamer put up [this tutorial](http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/pascal-multilabel-with-datalayer.ipynb) on multilabel outputs. Sure it's in python which is a pain if you've avoided pycaffe for the command line, but it at least gives the layer structure and solvers for an example – marcman Mar 11 '16 at 05:51