
I am doing regression using caffe, and my test.txt and train.txt files are like this:

/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0  
/home/foo/caffe/data/finetune/flickr/4559004485.jpg 3.6  
/home/foo/caffe/data/finetune/flickr/3208038920.jpg 3.2  
/home/foo/caffe/data/finetune/flickr/6170430622.jpg 4.0  
/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272
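
For reference, the list format itself is easy to read with float labels outside caffe. The sketch below is a minimal Python example; `parse_list` is a hypothetical helper, not a caffe function. It splits each line on its last run of whitespace, so trailing spaces such as the ones after `2.0` are tolerated.

```python
def parse_list(text):
    """Parse 'path label' lines into (path, float_label) pairs (hypothetical helper)."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on the last whitespace run so the path keeps any inner spaces
        path, label = line.rsplit(None, 1)
        pairs.append((path, float(label)))
    return pairs


sample = ("/home/foo/caffe/data/finetune/flickr/3860781056.jpg 2.0  \n"
          "/home/foo/caffe/data/finetune/flickr/7508671542.jpg 2.7272\n")
print(parse_list(sample))
```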

My problem is that caffe does not seem to accept float labels like 2.0. When I use float labels, caffe reads, for example, the 'test.txt' file and only recognizes

a total of 1 images

which is wrong.

But when I change, for example, the 2.0 to 2 in the file and leave the following lines unchanged, caffe now gives

a total of 2 images

implying that the float labels are responsible for the problem.

Can anyone help me solve this problem? I definitely need float labels for regression, so does anyone know of a workaround or solution? Thanks in advance.

EDIT: For anyone facing a similar issue, the question "use caffe to train Lenet with CSV data" might be of help. Thanks to @Shai.

Edited by Community; asked by Deven.

4 Answers


When using the image dataset input layer (with either lmdb or leveldb backend) caffe only supports one integer label per input image.

If you want to do regression with floating point labels, you should use the HDF5 data layer instead. See for example this question.

In Python you can use the h5py package to create HDF5 files.

import h5py
import caffe
import numpy as np

SIZE = 224 # fixed size for all images
with open( 'train.txt', 'r' ) as T :
    lines = [l for l in T.readlines() if l.strip()] # skip blank lines
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split() # split on any whitespace; trailing spaces/newline are ignored
    img = caffe.io.load_image( sp[0] ) # size-by-size-by-3, RGB, floats in [0,1]
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example, transpose to 3-by-size-by-size and swap RGB->BGR:
    transposed_img = img.transpose((2,0,1))[::-1,:,:]
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5\n' ) # list all h5 files you are going to use, one per line
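
The comment about splitting data into multiple h5 files can be planned as follows. This is a minimal sketch: `plan_h5_chunks`, `list_file_text` and the `train_000.h5` naming are hypothetical. To actually save memory, you would allocate and fill `X` and `y` only for the rows of one chunk before writing that file, rather than slicing one large array.

```python
def plan_h5_chunks(n_samples, chunk_size, prefix="train"):
    """Return (filename, start, stop) for each HDF5 file to write (hypothetical naming)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    plan = []
    for k, start in enumerate(range(0, n_samples, chunk_size)):
        stop = min(start + chunk_size, n_samples)
        plan.append(("{}_{:03d}.h5".format(prefix, k), start, stop))
    return plan


def list_file_text(plan):
    """One filename per line, the content of the file given as hdf5_data_param.source."""
    return "".join(name + "\n" for name, _, _ in plan)


plan = plan_h5_chunks(n_samples=2500, chunk_size=1000)
for name, start, stop in plan:
    # here you would fill X and y for samples start..stop-1 only,
    # then write them to `name` with h5py as in the code above
    print(name, start, stop)
print(list_file_text(plan), end="")
```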

Once you have all h5 files and the corresponding text files listing them, you can add an HDF5 input layer to your train_val.prototxt:

 layer {
   type: "HDF5Data"
   top: "X" # same name as given in create_dataset!
   top: "y"
   hdf5_data_param {
     source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
     batch_size: 32
   }
   include { phase:TRAIN }
 }

Clarification:
When I say "caffe only supports one integer label per input image", I do not mean that the leveldb/lmdb containers are limited. I mean the tools of caffe, specifically the convert_imageset tool.
On closer inspection, it seems that caffe stores data of type Datum in leveldb/lmdb, and the "label" property of this type is defined as an integer (see caffe.proto). Thus, when using caffe's interface to leveldb/lmdb, you are restricted to a single int32 label per image.
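
The following numpy illustration shows why an int32 label field is a problem for regression. It does not call caffe. It only mimics casting the question's float labels to int32, which drops the fractional part.

```python
import numpy as np

# the regression targets from the question
labels = np.array([2.0, 3.6, 3.2, 4.0, 2.7272], dtype=np.float32)
# casting to int32 (the type of Datum.label) truncates toward zero
as_int = labels.astype(np.int32)
print(as_int)  # [2 3 3 4 2]
```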

Edited by Przemek D; answered by Shai.
  • Thanks @Shai for this elaborate answer. I will try this and report back. Thanks again. – Deven Aug 05 '15 at 14:17
  • @Deven thanks for the upvote. Regarding "accepting" the answer - you better check that it works for you before "accepting"... don't you think? – Shai Aug 05 '15 at 14:21
  • Actually... I had posted this question on the caffe users forum and on their GitHub issue tracker too (which I shouldn't have, as that is for development only), and I got the same answer there, so I am pretty sure it will work. Anyway, you are right about not accepting it yet. – Deven Aug 05 '15 at 14:30
  • @Deven please link these questions, so people looking at each source can find all the relevant answers quickly and efficiently. – Shai Aug 05 '15 at 14:32
  • I have linked that question in the edit, I am sort of new here, so it might not be what you had in mind. Please point out if something else needs to be done. – Deven Aug 05 '15 at 14:43
  • Hi @Shai now I want to know is there any easy way of creating hdf5 files from matlab only? – Deven Aug 09 '15 at 09:28
  • @Deven should be quite easy. Matlab has full support of hdf5. See [here](http://www.mathworks.com/help/matlab/high-level-functions.html) for more details. – Shai Aug 09 '15 at 09:36
  • @Deven you should be careful, though: python and caffe store matrices in a row-major fashion, while Matlab is column-major. You might need to "transpose" your arrays in Matlab. – Shai Aug 09 '15 at 09:37
  • Shouldn't the line `y = np.zeros( (1,len(lines)), dtype='f4' )` be `y = np.zeros( (len(lines)), dtype='f4' )`? – unicorn_poet Nov 12 '15 at 15:36
  • @angela I'm not sure, I think both options are valid. – Shai Nov 12 '15 at 15:39
  • I was saying that because you index the y vector later as so: y[i] = float(sp[1]). This gives you an exception because the first dimension is 1, and i may be different from 1. Did you mean y[1, i] = float(sp[1])? – unicorn_poet Nov 12 '15 at 15:41
  • I am asking this because I'm having trouble with the HDF5 data layer, and I'm hoping this is my issue :) – unicorn_poet Nov 12 '15 at 15:41
  • @angela `y[1,i]` is an error because the first entry is `y[0,i]` - python index starts with 0 not 1. If you have a specific problem please ask a new question. It is difficult to guess your problem from your comments and help you solve it. – Shai Nov 12 '15 at 15:46
  • Yep, I meant `y[0,i]`. What I was asking is whether you intentionally gave the labels array 2 dimensions (1, len(lines)) as you did here: `y = np.zeros( (1,len(lines)), dtype='f4' )`, or whether it was a mistake since you later index that same `y` vector with a single index `i` – unicorn_poet Nov 12 '15 at 15:51
  • @angela I did it on purpose(I'm used to Matlab...) but I'm not certain this is crucial – Shai Nov 12 '15 at 16:26
  • You get an exception if you do it the way you say in your answer, since you cannot index `y[i]` if `i>0`. In any case, I posted my actual question here, in case you want to have a look: http://stackoverflow.com/questions/33676656/caffe-hdf5-not-learning – unicorn_poet Nov 12 '15 at 17:27
  • How should I specify the shape of `y`? – Meta Fan Sep 22 '16 at 12:58
  • @GuWang the shape of `y` is `number of samples`-by-`dimension of y` – Shai Sep 22 '16 at 13:02
  • @Shai I mean should I provide just the true label like `0` or `1`, or the `one hot coding of y`? – Meta Fan Sep 22 '16 at 13:05
  • @GuWang if you are doing classification with a `"SoftmaxWithLoss"` layer, then `y` can be a scalar per sample image. If you use other loss layers, then you might need to change the way you feed `y` to caffe. – Shai Sep 22 '16 at 13:13
  • @Shai: in the case of `img = caffe.io.resize( img, (SIZE, SIZE, 3) )`, where do you add the H and W values? I don't have a square image and I'd like to use this functionality to verify if my own function works properly. – Cassie Jan 12 '17 at 11:03
  • @Cassie `img = caffe.io.resize( img, (H, W, 3) )` see [`io.py`](https://github.com/BVLC/caffe/blob/master/python/caffe/io.py#L306-L319). – Shai Jan 12 '17 at 11:09
  • Can we use `ImageData` layer for regression? as you said we can not use multiple label for `lmdb` what about ImageData? – Saeed Masoomi Feb 21 '18 at 17:56
  • @saeedmasoomi as far as I know, the ImageData layer also supports only a single integer label per input image. Why not use the HDF5Data layer? – Shai Feb 21 '18 at 19:15
  • @Shai could you share code about how to handle pixel level labels? Let's say in this case your data=X is the image of size h x w x 3 (three channels) and your label (data=Y) is of the size h x w x 1 – simplename May 31 '18 at 07:01
  • @simplename it's the same code as in this answer, only `y[i]` should be 2D (in fact, 3D with channel dim=1) instead of a float scalar. – Shai May 31 '18 at 07:12
  • @Shai I see - how does caffe generalize where in the case of classification the label is 1D where in the case of semantic segmentation the label is 3D. How does caffe know how to do the right thing with the labels in both cases? – simplename May 31 '18 at 07:21
  • @Shai moreover, lets say in another case, the each label is represented by a color so lets say the label (data=Y) is also h x w x 3. How again does caffe understand this and do the right training? – simplename May 31 '18 at 07:24
  • @simplename caffe does not "understand" anything; it just processes whatever inputs you feed it. If your hdf5 files contain `X` as 3D and `Y` as 3D, caffe will process them accordingly. During "forward", caffe loads `x` and `y` from file (in the `"HDF5Data"` layer) and reshapes the rest of the net according to the shapes of `x` and `y` it read. The rest of the processing/"understanding" is up to you and the net you design. – Shai May 31 '18 at 07:47
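
You can check the `y` shapes discussed in these comments with plain numpy. This is a small sketch with made-up sizes (`N`, `H`, `W` are arbitrary). It shows that an `N`-by-1 label array accepts `y[i] = value`, that a 1-by-`N` array needs `y[0, i]`, and that a pixel-level label with one channel is `N`-by-1-by-`H`-by-`W`.

```python
import numpy as np

N, H, W = 4, 8, 6  # arbitrary sizes for illustration

# scalar regression target per image: N-by-1, so y[i] assigns row i
y_scalar = np.zeros((N, 1), dtype='f4')
for i in range(N):
    y_scalar[i] = float(i) / 2

# the 1-by-N layout: y[i] only works for i == 0, you need y[0, i]
y_row = np.zeros((1, N), dtype='f4')
row_index_failed = False
try:
    y_row[1] = 0.5
except IndexError:
    row_index_failed = True

# pixel-level label with a single channel: N-by-1-by-H-by-W
y_pixels = np.zeros((N, 1, H, W), dtype='f4')
print(y_scalar.shape, y_row.shape, y_pixels.shape)
```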

Shai's answer already covers saving float labels to HDF5 format. In case LMDB is required/preferred, here's a snippet on how to create an LMDB from float data (adapted from this github comment):

import lmdb
import numpy as np
import caffe

def scalars_to_lmdb(scalars, path_dst):

    db = lmdb.open(path_dst, map_size=int(1e12))

    with db.begin(write=True) as in_txn:
        for idx, x in enumerate(scalars):
            # array_to_datum expects a 3D array: get shape (1,1,1)
            content_field = np.array([x], dtype=float).reshape(1, 1, 1)
            # non-uint8 arrays are stored in the datum's float_data field
            dat = caffe.io.array_to_datum(content_field)
            # zero-padded string key, encoded to bytes for lmdb
            in_txn.put('{:0>10d}'.format(idx).encode('ascii'), dat.SerializeToString())
    db.close()
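
This explains why the keys are zero-padded. LMDB keeps keys in bytewise lexicographic order, and caffe's data layer reads records by advancing a cursor in that order. Input and ground truth end up in separate LMDBs, as noted in the comments below. Both databases must therefore iterate in the same order for sample `i` to line up. The small standard-library check below shows the effect of the padding.

```python
# zero-padded keys: bytewise order matches numeric order
keys = ['{:0>10d}'.format(i).encode('ascii') for i in (2, 10, 1)]
print(sorted(keys))  # [b'0000000001', b'0000000002', b'0000000010']

# without padding, b'10' sorts before b'2'
unpadded = sorted(str(i).encode('ascii') for i in (2, 10, 1))
print(unpadded)  # [b'1', b'10', b'2']
```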
Edited by Community; answered by ypx.
  • I'm afraid using `caffe.io.array_to_datum` is problematic, as `label` field in `datum` is [defined as integer](https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L36) – Shai May 02 '16 at 05:01
  • @Shai, true, the ground truth is saved to the data field of the datum. This requires generating separate lmdb for the input and ground truth respectively. – ypx May 02 '16 at 08:36

I ended up transposing, switching the channel order, and using unsigned ints rather than floats to get results. I suggest reading an image back from your HDF5 file to make sure it displays correctly.

First read the image as unsigned ints (assuming `from PIL import Image` and `import numpy as np`):

img = np.array(Image.open('images/' + image_name))

Then change the channel order from RGB to BGR:

img = img[:, :, ::-1]

Finally, switch from Height x Width x Channels to Channels x Height x Width:

img = img.transpose((2, 0, 1))

Merely changing the shape will scramble your image and ruin your data!
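
To see the difference between transposing and merely reshaping, here is a small numpy sketch. It uses a tiny made-up 2-by-3 image with distinct values so individual pixels can be tracked. The transposed array keeps each channel plane intact, while the reshaped one has the same shape but mixes values from different pixels and channels. The last step shows how to undo both operations when reading an image back.

```python
import numpy as np

H, W = 2, 3
# tiny H-by-W-by-3 "RGB image" with distinct values per entry
img = np.arange(H * W * 3, dtype=np.uint8).reshape(H, W, 3)

chw = img[:, :, ::-1].transpose((2, 0, 1))  # RGB->BGR, then HxWxC -> CxHxW
wrong = img.reshape(3, H, W)                # same shape, scrambled content

# channel 0 of chw is the blue plane of the original image
assert (chw[0] == img[:, :, 2]).all()
# the reshaped array does not contain that plane
assert not (wrong[0] == img[:, :, 2]).all()

# reading back: undo the transpose and the channel swap
restored = chw.transpose((1, 2, 0))[:, :, ::-1]
assert (restored == img).all()
```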

To read back the image:

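import h5py
# note: skimage.viewer is deprecated in newer scikit-image releases
from skimage.viewer import ImageViewer

with h5py.File(h5_filename, 'r') as hf:
    images_test = hf.get('images')
    targets_test = hf.get('targets')
    for i, img in enumerate(images_test):
        print(targets_test[i])
        # undo CxHxW -> HxWxC and BGR -> RGB; a plain reshape would scramble the image
        viewer = ImageViewer(img.transpose((1, 2, 0))[:, :, ::-1])
        viewer.show()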

Here's a script I wrote which deals with two labels (steer and speed) for a self-driving car task: https://gist.github.com/crizCraig/aa46105d34349543582b177ae79f32f0

Answered by crizCraig.

Besides @Shai's answer above, I wrote a MultiTaskData layer supporting float typed labels.

Its main idea is to store the labels in the float_data field of Datum. The MultiTaskDataLayer then parses them as labels for any number of tasks, according to the values of task_num and label_dimension set in net.prototxt. The related files are: caffe.proto, multitask_data_layer.hpp/cpp, io.hpp/cpp.
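
The sketch below shows how a flat float_data vector could be split into per-task labels. It is hypothetical: `split_task_labels` is not part of the layer. It assumes the labels are stored task after task with label_dimension floats per task, and the actual MultiTaskDataLayer may lay them out differently.

```python
def split_task_labels(float_data, task_num, label_dimension):
    """Hypothetical: split a flat float list into task_num labels of label_dimension floats."""
    expected = task_num * label_dimension
    if len(float_data) != expected:
        raise ValueError("expected {} floats, got {}".format(expected, len(float_data)))
    return [list(float_data[t * label_dimension:(t + 1) * label_dimension])
            for t in range(task_num)]


# two tasks with three-dimensional labels each
print(split_task_labels([0.1, 0.2, 0.7, 1.0, 2.0, 3.0], task_num=2, label_dimension=3))
```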

You can add this layer to your own caffe and use it like this. The example below is for a facial expression label distribution learning task. In it, "exp_label" can be a float vector such as [0.1, 0.1, 0.5, 0.2, 0.1], representing the probability distribution over 5 expression classes. The prototxt below uses 8 classes, hence label_dimension: 8 and num_output: 8.

    name: "xxxNet"
    layer {
        name: "xxx"
        type: "MultiTaskData"
        top: "data"
        top: "exp_label"
        data_param { 
            source: "expression_ld_train_leveldb"   
            batch_size: 60 
            task_num: 1
            label_dimension: 8
        }
        transform_param {
            scale: 0.00390625
            crop_size: 60
            mirror: true
        }
        include:{ phase: TRAIN }
    }
    layer { 
        name: "exp_prob" 
        type: "InnerProduct"
        bottom: "data"  
        top: "exp_prob" 
        param {
            lr_mult: 1
            decay_mult: 1
        }
        param {
            lr_mult: 2
            decay_mult: 0
        }
        inner_product_param {
            num_output: 8
            weight_filler {
                type: "xavier"
            }
            bias_filler {
                type: "constant"
            }
        }
    }
    layer {  
        name: "exp_loss"  
        type: "EuclideanLoss"  
        bottom: "exp_prob" 
        bottom: "exp_label"
        top: "exp_loss"
        include:{ phase: TRAIN }
    }
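
The EuclideanLoss layer above compares the InnerProduct output with the float label vector. Caffe documents this loss as E = 1/(2N) Σ_n ||ŷ_n − y_n||², where N is the batch size. The numpy version below is a minimal sketch of that formula for checking values by hand; it is not caffe's implementation. It uses the 5-class distribution from the text and a made-up prediction.

```python
import numpy as np

def euclidean_loss(pred, target):
    """Sum of squared differences divided by 2 * batch size, as documented for caffe's EuclideanLoss."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    n = pred.shape[0]  # batch size N
    return float(np.sum((pred - target) ** 2) / (2.0 * n))


target = np.array([[0.1, 0.1, 0.5, 0.2, 0.1]])  # label distribution from the text
pred = np.array([[0.2, 0.1, 0.4, 0.2, 0.1]])    # made-up network output
print(euclidean_loss(pred, target))  # about 0.01
```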
Edited by Community; answered by Dale.