I have a bunch of 2D data matrices in Matlab (no image data, but some single precision data).

Does anyone know how to convert 2D Matlab matrices to the leveldb format that Caffe requires to train a custom neural network?

I already did the tutorials on training with images (using the imagenet architecture) and with mnist (digit recognition dataset). However, the mnist example does not show how to create the database; the tutorial provides it ready-made.

mcExchange
  • Do you know https://github.com/kyamagu/matlab-leveldb ? – fuesika Jun 05 '15 at 10:35
  • Not yet. Did you try it out yourself? I just tried to load a leveldb database with it. Loading seems to work fine, but the database seems to be empty. (I cannot display any keys and the matlab variable is only 100 bytes big while the real database is 2Gb). My database incorporates the files "data.mdb" and "lock.mdb". Maybe caffe uses some modified version of leveldb? – mcExchange Jun 05 '15 at 13:22
  • Why not use the HDF5_DATA layer instead? It is more flexible... – Shai Jun 06 '15 at 17:49
  • @Shai: Could you elaborate on that or give an example? – mcExchange Jun 08 '15 at 08:29

1 Answer

I still don't know how to create a leveldb database from my 2D data matrices for use in Caffe, but I finally solved my problem:
I ended up following Shai's suggestion and converted the data to HDF5 format. Reading and writing HDF5 files is easy in Matlab: the functions hdf5info(), h5read(), h5create() and h5write() are built in.

Example:
- Set the layer type in your Caffe prototxt file to "HDF5Data", like this:

name: "LeNet"
layer {
  name: "mnist"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  hdf5_data_param {
    source: "/path/to/your/database/myMnist_train.txt"
    batch_size: 64
  }
}
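
The `source` entry above does not point at an HDF5 file directly. It points at a plain text file that lists the HDF5 files, one path per line. A minimal Matlab sketch for writing such a list file could look like the following. The file names are taken from this example and the relative path is an assumption, so adjust both to your setup.

```matlab
% Write the list file that hdf5_data_param.source points to.
% Assumption: 'train.h5' is a path Caffe can resolve as written;
% an absolute path is often the safer choice.
listFile = 'myMnist_train.txt';   % name taken from the prototxt above
h5Files  = {'train.h5'};          % one entry per HDF5 file

fid = fopen(listFile, 'w');
assert(fid ~= -1, 'Could not open %s for writing', listFile);
for k = 1:numel(h5Files)
    fprintf(fid, '%s\n', h5Files{k});   % one file path per line
end
fclose(fid);
```

If the data is split across several HDF5 files, add each of them to `h5Files`.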

Use Matlab to create HDF5 databases:
- Caffe: Your input training data has to be a 4-D array. Caffe sees it as N x channels x height x width, so its last two dimensions hold your 2D input matrix. Matlab stores arrays column-major, so the dimension order appears reversed there: width x height x channels x N.
- Example: Take a 2D matrix (image or single precision data) of size 54x24 (rows x cols)
- -> transpose it and stack it into a 24x54x1xN array, where N is the number of 2D matrices (training samples)
- The labels go into a 1xN row vector in Matlab.
- Now create your HDF5 database:

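% Caffe reads the datasets named /data and /label from each HDF5 file.
N = length(trainLabels);                    % number of training samples
h5create('train.h5','/data',[24 54 1 N]);
h5create('train.h5','/label',[1 N]);
h5write('train.h5','/data',trainData);      % trainData is 24x54x1xN
h5write('train.h5','/label',trainLabels);   % trainLabels is 1xN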
  • As you can see, Caffe expects an HDF5 database with the datasets "data" and "label".
  • Reading a database:
    Use hdf5info(filename) to get the dataset names inside an HDF5 database. Then use data = h5read(filename,dataset) to read a dataset.
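
The transpose-and-stack step and the read-back check can be sketched end to end as follows. This is only an illustration under stated assumptions: the cell array `samples`, the random placeholder data, the sample count and the file name `train_sketch.h5` are all hypothetical, and the datasets are created as single precision explicitly, which the commands above do not do. It uses `h5info`, the newer counterpart of `hdf5info`.

```matlab
% Hypothetical input: N matrices of size 54x24 held in a cell array.
N = 100;
samples = cell(1, N);
for k = 1:N
    samples{k} = rand(54, 24, 'single');    % placeholder data
end
trainLabels = single(randi([0 9], 1, N));   % placeholder labels, 1xN

% Transpose each 54x24 matrix to 24x54 and stack along the 4th dimension.
trainData = zeros(24, 54, 1, N, 'single');
for k = 1:N
    trainData(:, :, 1, k) = samples{k}.';
end

% Start from a clean file: h5create errors if the dataset already exists.
h5File = 'train_sketch.h5';
if exist(h5File, 'file'), delete(h5File); end

% Explicit sizes are used so the code also works for N = 1.
h5create(h5File, '/data',  [24 54 1 N], 'Datatype', 'single');
h5create(h5File, '/label', [1 N],       'Datatype', 'single');
h5write(h5File, '/data',  trainData);
h5write(h5File, '/label', trainLabels);

% Read back: list the dataset names and confirm the data round-trips.
info = h5info(h5File);
datasetNames = {info.Datasets.Name};
readBack = h5read(h5File, '/data');
assert(isequal(size(readBack), [24 54 1 N]));
assert(isequal(readBack, trainData));
```

Deleting an existing file first is a design choice for repeatable runs; drop that line if you want to add datasets to an existing file.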
mcExchange
  • Excellent! BTW, you need `/path/to/your/database/myMnist_train.txt` file to contain the `h5` file names, e.g., to contain line: train.h5. – Shai Jun 16 '15 at 14:43
  • Do you have to subtract the mean matrix from the matrices before you store them? – mad Jun 19 '15 at 07:37
  • It's your data, so you should know how you would like to transform it ;). Generally speaking, mean subtraction and division by the standard deviation should improve numerical optimization. – mcExchange Jun 19 '15 at 10:53
  • This is a good alternative, but isn't the correct answer because [HDF5 data layers](https://github.com/BVLC/caffe/issues/2225) [are not](https://github.com/BVLC/caffe/pull/569) [fully featured](https://github.com/BVLC/caffe/pull/1070). I'm going to try and figure out the answer myself, but if you have any insights, it would be nice. Otherwise, I would appreciate if the question could be updated to what you actually wanted (any format Caffe takes) so I can ask for the more specific answer you originally proposed. :\ Otherwise, this is very useful. Thank you. – Poik Oct 02 '15 at 05:28