Caffe - multi-class and multi-label image classification

Question

I'm trying to create a single multi-class and multi-label net configuration in caffe.

Take classification of dogs as an example: Is the dog small or large? (class) What color is it? (class) Does it have a collar? (label)

Is this thing possible using caffe? What is the proper way to do so? What is the right way to build the lmdb file?

All the publications about multi-label classification are from around 2015. Has anything on this subject changed since then?

Thanks.

score 0 · Accepted Answer · answered Oct 21 '18 at 06:31

The problem with Caffe's LMDB interface is that it only allows for a single int label per image.
If you want multiple labels per image you'll have to use a different input layer.
I suggest using the "HDF5Data" layer instead.
It gives you more flexibility in setting up the input data: the layer can have as many "top"s as you want, so you can have multiple labels per input image and multiple losses for your net to train on.

See this post on how to create hdf5 data for caffe.
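As a rough illustration of "as many tops as you want", here is a minimal sketch that stores one HDF5 dataset per annotation. The dataset names `size`, `color` and `collar` and the helper functions are hypothetical, chosen to match the dog example in the question. Each dataset name could then be listed as a separate "top" of the "HDF5Data" layer.

```python
def build_label_columns(annotations, fields):
    """Turn per-image annotation dicts into one N-by-1 column of floats per field."""
    return {f: [[float(a[f])] for a in annotations] for f in fields}


def write_hdf5(path, images, label_columns):
    """Write the images as dataset "X" plus one dataset per label column."""
    # Imported lazily so build_label_columns can be used without h5py/numpy.
    import h5py
    import numpy as np

    with h5py.File(path, "w") as h5:
        h5.create_dataset("X", data=np.asarray(images, dtype="f4"))
        for name, column in label_columns.items():
            # Each dataset name can be used as a "top" of the HDF5Data layer.
            h5.create_dataset(name, data=np.asarray(column, dtype="f4"))


# Hypothetical annotations for two images of the dog example.
annotations = [
    {"size": 0, "color": 4, "collar": 1},
    {"size": 1, "color": 7, "collar": 0},
]
columns = build_label_columns(annotations, ["size", "color", "collar"])
```

With the labels split this way, each task can feed its own loss layer, and `write_hdf5("train.h5", X, columns)` would produce the file once `X` holds the image array.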

score 0 · Answer 2 · edited Oct 29 '18 at 06:20

Thanks Shai,

I'm just trying to understand the practical way to do this. I first create 2 .txt files (one for training and one for validation) containing all the tags of the images, for example:

/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17

Then I run this Python script:

import h5py, os
import caffe
import numpy as np

SIZE = 227 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use

This creates train.h5 and val.h5. (Does the X dataset contain the images and the y dataset the labels?)
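Note that the script above stores only `sp[1]`, the first tag on each line. A minimal sketch of keeping every tag is to turn each line into a multi-hot vector. This assumes the tags are integer ids in the range 0 to `NUM_TAGS - 1` and that image paths contain no spaces. `NUM_TAGS` and `parse_line` are hypothetical names, and the value 40 is only a placeholder.

```python
NUM_TAGS = 40  # hypothetical: total number of distinct tag ids


def parse_line(line, num_tags=NUM_TAGS):
    """Split 'path tag tag ...' into the path and a multi-hot list of floats."""
    parts = line.split()  # split() also drops the trailing newline
    path, tags = parts[0], [int(t) for t in parts[1:]]
    vector = [0.0] * num_tags
    for tag in tags:
        if not 0 <= tag < num_tags:
            raise ValueError("tag %d out of range for %s" % (tag, path))
        vector[tag] = 1.0  # mark this tag as present
    return path, vector


path, vector = parse_line("/train/img/2.png 1 7 17 33\n")
```

In the script, `y` would then be allocated as `np.zeros((len(lines), NUM_TAGS), dtype='f4')` and filled with `y[i] = vector`.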

Then I replace my network's input layers, changing them from:

layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db" 
   backend: LMDB 
   batch_size: 64 
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TRAIN } 
} 
layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"  
   backend: LMDB 
   batch_size: 64
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TEST } 
}

to

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TEST }
}

I guess HDF5 doesn't need a mean.binaryproto?
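Since the "HDF5Data" layers above carry no `transform_param`, any mean subtraction has to be applied to the arrays before they are written. A minimal sketch, assuming a per-channel mean is acceptable (a `mean.binaryproto` typically holds a full mean image, so this is a simplification) and using a hypothetical helper `subtract_channel_mean`:

```python
import numpy as np


def subtract_channel_mean(images, channel_mean=None):
    """Subtract a per-channel mean from an N x C x H x W image array.

    If channel_mean is None it is computed from the images themselves;
    pass the training mean explicitly when processing validation data.
    """
    images = np.asarray(images, dtype="f4")
    if channel_mean is None:
        channel_mean = images.mean(axis=(0, 2, 3))
    channel_mean = np.asarray(channel_mean, dtype="f4")
    # Reshape to 1 x C x 1 x 1 so the mean broadcasts over batch and pixels.
    return images - channel_mean.reshape(1, -1, 1, 1), channel_mean


# Tiny synthetic batch: 2 images, 3 channels, 2 x 2 pixels.
X = np.arange(24, dtype="f4").reshape(2, 3, 2, 2)
X_centered, train_mean = subtract_channel_mean(X)
```

The returned `train_mean` would be reused for the validation set, so that both files are centered with the same values.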

Next, how should the output layer change in order to output multiple label probabilities? I guess I need a cross-entropy layer instead of softmax? These are the current output layers:

layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
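For context on the softmax versus cross-entropy question: a softmax loss assumes exactly one correct class per image, while a sigmoid cross-entropy loss (Caffe provides a "SigmoidCrossEntropyLoss" layer) scores every label independently against a 0/1 target. The sketch below only illustrates that arithmetic in plain Python; it is not Caffe's implementation, and the scores and targets are made-up values.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sigmoid_cross_entropy(scores, targets):
    """Sum over labels of -[t*log(p) + (1-t)*log(1-p)], with p = sigmoid(score).

    Uses the equivalent stable form max(x, 0) - x*t + log(1 + exp(-|x|)).
    """
    loss = 0.0
    for x, t in zip(scores, targets):
        loss += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return loss


scores = [2.0, -1.5, 0.0]   # made-up raw network outputs, one per label
targets = [1.0, 0.0, 1.0]   # multi-hot ground truth for the same labels
loss = sigmoid_cross_entropy(scores, targets)
predicted = [sigmoid(x) > 0.5 for x in scores]  # independent decision per label
```

Because each label gets its own probability, several labels can be predicted for the same image, which a single softmax over all labels cannot express.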

You do not need `mean.binaryproto` in your prototxt, but if you want to subtract the mean (and you do) you need to do it when you create the hdf5 files. — Shai, Oct 29 '18 at 06:23
you only save one of the labels in your text files (the first one) and ignore all the rest. — Shai, Oct 29 '18 at 06:23
It's a follow-up question, since I'm still trying to understand your original answer. I didn't understand your last reply: I'm trying to train a multi-label network, so each image has several related labels. What do you mean by "ignore all the rest"? — Gal Dalali, Oct 29 '18 at 13:32
then posting it as an answer is not exactly a good idea: I have no way of knowing that you posted it, nor do I understand what you are asking here. If you have questions, post them as such (you can link one question to the other for context). — Shai, Oct 29 '18 at 13:38
I opened up a new question - https://stackoverflow.com/questions/53047003/multi-class-and-multi-label-image-classification-using-caffe — Gal Dalali, Oct 29 '18 at 13:54
