Creating multilabel HDF5 file for caffe

Question

I created HDF5 data using the following Python script and added an HDF5 data layer. However, when I try to train Caffe on this data, it keeps complaining:

Check failed: num_spatial_axes_ == 2 (0 vs. 2) kernel_h & kernel_w can only be used for 2D convolution

Here is what my data looks like:

Data (1x3253), label (1x128) binary. I sliced the 128 bits into 16 bytes and converted each byte to decimal to use it as a multilabel. So a typical key would look like (20, 38, 123, 345, ...), 1x16, and I have 1,000,000 samples like this. For now I am just using the first byte, so I will have one integer as a label.

    import os
    import numpy as np
    import h5py

    DIR ="/x/"
    h5_fn= os.path.join('/x/h5Data_train.h5')

    dim=64000 
    InputData=np.arange(3253)
    data=np.arange(dim*3253)
    data.shape=(dim,3253)

    fileList=[os.path.join(i) for folder, subdir,files in os.walk(DIR) for i in files]
    for i in range(0,len(fileList)):
         InputData=np.genfromtxt(DIR+fileList[i], delimiter=',',skip_header=24)
         data[i]=InputData

    label=np.arange(dim)
    labelData=np.genfromtxt(DIR+'label_file',comments='\t',dtype=None)

    for i in range(0,dim):
        label[i]=int(labelData[i][0:2],16)

    print "Creating HDF5..."

    with h5py.File(h5_fn,'w') as f:
       f['InputData']=data
       f['label']=label

    text_fn=os.path.join('/x/hdf5.txt')
    with open(text_fn,'w') as f:
       f.write('h5_fn')

This script creates the HDF5 file, but I suspect that the error from Caffe is related to how I created it. Can someone tell me if there is anything wrong with how I created the HDF5 file? Also, is there any way to check that the created HDF5 file is what I intended? Thanks!

Your indentation is so wrong it is difficult to guess what is nested where. Please fix that. Indentation really matters with Python. — zvone, Oct 06 '16 at 19:02
BTW, you can read the contents of your HDF with [HDFView](https://support.hdfgroup.org/products/java/release/download.html) — zvone, Oct 06 '16 at 19:12
@zvone now should be better. My actual code is correctly indented, here the code block is not that flexible. — user2413711, Oct 06 '16 at 22:47

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

The problem:

Caffe, by default, expects its data to be 4D: batch_size-by-channel-by-height-by-width.
In your model you assume each sample is of shape 1-by-1-by-3253, that is, your data is 1D and width is its only non-singleton dimension. This is an important detail, since you apply convolution along the width dimension.
On the other hand, your HDF5 data is only 2D, and Caffe interprets it as dim examples with 3253 channels each and no separate height and width axes (at best, height and width of 1).
Now you can understand the error message you get: you have a convolution layer with kernel_h and kernel_w params, but the data (as far as Caffe understands it) has no spatial axes to convolve over, hence the "0 vs. 2".
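
To make the "0 vs. 2" in the error message concrete, here is a rough sketch (not Caffe's actual code) of how the number of spatial axes follows from the blob shape, assuming the channel axis is 1 as in Caffe's default layout. The helper name `num_spatial_axes` is made up for this illustration.

```python
def num_spatial_axes(shape, channel_axis=1):
    """Count the axes that come after the channel axis (illustrative helper)."""
    return max(len(shape) - (channel_axis + 1), 0)


dim = 64000
flat_shape = (dim, 3253)        # what the script in the question writes
blob_shape = (dim, 1, 1, 3253)  # batch, channel, height, width

print(num_spatial_axes(flat_shape))  # 0
print(num_spatial_axes(blob_shape))  # 2
```

A 2D convolution needs the second case, where two axes remain after the channel axis.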

A solution:

You simply need to reshape your data:

    data.shape = (dim, 1, 1, 3253)

Now data has 1 channel and height 1 for each sample and width of 3253.
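
As for checking that the file is what you want, one option is to read it back with h5py and look at the shapes. Below is a minimal sketch under some assumptions: it uses a tiny array of zeros in place of your real data, writes to a temporary directory instead of '/x/', and stores float32 values (my choice here; the script in the question writes integer arrays).

```python
import os
import tempfile

import h5py
import numpy as np

n_samples, n_features = 8, 3253  # tiny stand-in for dim = 64000

# Placeholder inputs; the real script fills these from the CSV and label files.
data = np.zeros((n_samples, n_features), dtype=np.float32)
label = np.arange(n_samples, dtype=np.float32)

# Give every sample 1 channel, height 1 and width 3253.
data = data.reshape(n_samples, 1, 1, n_features)

h5_path = os.path.join(tempfile.mkdtemp(), "h5Data_train.h5")
with h5py.File(h5_path, "w") as f:
    f["InputData"] = data
    f["label"] = label

# Read the file back and collect the shape and dtype of every dataset.
with h5py.File(h5_path, "r") as f:
    shapes = {name: f[name].shape for name in f}
    dtypes = {name: str(f[name].dtype) for name in f}

print(shapes)
print(dtypes)
```

If 'InputData' does not show up with four axes here, the convolution layer will hit the same check again.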

PS,
You are writing the literal string 'h5_fn' to '/x/hdf5.txt' instead of the string stored in the variable h5_fn...
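
A minimal sketch of the fix, assuming the text file is the list that the HDF5 data layer's source points to (one .h5 path per line). The temporary directory is only a stand-in for '/x/' so the snippet runs anywhere.

```python
import os
import tempfile

out_dir = tempfile.mkdtemp()  # stand-in for '/x/'
h5_fn = os.path.join(out_dir, "h5Data_train.h5")
text_fn = os.path.join(out_dir, "hdf5.txt")

with open(text_fn, "w") as f:
    # Write the value of the variable, not the literal string 'h5_fn'.
    f.write(h5_fn + "\n")

# Read the list back to confirm it holds the real path.
with open(text_fn) as f:
    listed = f.read().splitlines()

print(listed)
```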
