
I have a very big dataset, and converting it into a single LMDB file for Caffe is not a good idea. Thus, I am trying to split it into small parts and specify a TXT file containing the paths to the corresponding LMDB files. Here is an example of my data layer:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "path/to/lmdb.txt"
    batch_size: 256
    backend: LMDB
  }
}

And this is my lmdb.txt file:

/path/to/train1lmdb
/path/to/train2lmdb
/path/to/train3lmdb

However, I got the following error:

I0828 10:30:40.639502 26950 layer_factory.hpp:77] Creating layer data
F0828 10:30:40.639549 26950 db_lmdb.hpp:15] Check failed: mdb_status == 0 
(20 vs. 0) Not a directory
*** Check failure stack trace: ***
@     0x7f678e4a3daa  (unknown)
@     0x7f678e4a3ce4  (unknown)
@     0x7f678e4a36e6  (unknown)
@     0x7f678e4a6687  (unknown)
@     0x7f678ebee5e1  caffe::db::LMDB::Open()
@     0x7f678eb2b7d4  caffe::DataLayer<>::DataLayer()
@     0x7f678eb2b982  caffe::Creator_DataLayer<>()
@     0x7f678ec1a1a9  caffe::Net<>::Init()
@     0x7f678ec1c382  caffe::Net<>::Net()
@     0x7f678ec2e200  caffe::Solver<>::InitTrainNet()
@     0x7f678ec2f153  caffe::Solver<>::Init()
@     0x7f678ec2f42f  caffe::Solver<>::Solver()
@     0x7f678eabcc71  caffe::Creator_SGDSolver<>()
@           0x40f18e  caffe::SolverRegistry<>::CreateSolver()
@           0x40827d  train()
@           0x405bec  main
@     0x7f678ccfaf45  (unknown)
@           0x4064f3  (unknown)
@              (nil)  (unknown)
Aborted (core dumped)

So, how can I make this work? Is this kind of approach feasible? Thanks in advance.


1 Answer


The problem:
You are confusing the "Data" layer with the "HDF5Data" layer:
With a "Data" layer you can specify only one lmdb/leveldb dataset, and your source: entry should point to that single database.
With an "HDF5Data" layer, on the other hand, you can have multiple binary hdf5 files, and the source: parameter points to a text file listing all the binary files you are about to use.
That is why the check fails with "Not a directory": Caffe passes your lmdb.txt path straight to LMDB::Open(), which expects an lmdb directory, not a text file.

Solutions
0. (Following PrzemekD's comment) Add a separate "Data" layer for each lmdb you have (each with a smaller batch_size), then use a "Concat" layer to "merge" the different inputs into a single minibatch; see the first sketch after this list.
1. As you can already guess, one solution is to convert your data to the hdf5 binary format and use an "HDF5Data" layer; see the second sketch.
2. Alternatively, you can write your own "Python" input layer that reads from all the lmdb files (using the python lmdb interface) and feeds the data, batch by batch, to your net; see the third sketch.
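
For option 0, a minimal sketch of the prototxt, assuming just two of the lmdbs from the question and splitting the original batch_size of 256 between them (all layer/blob names here are illustrative):

layer {
  name: "data1"
  type: "Data"
  top: "data1"
  top: "label1"
  include { phase: TRAIN }
  data_param {
    source: "/path/to/train1lmdb"
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "data2"
  type: "Data"
  top: "data2"
  top: "label2"
  include { phase: TRAIN }
  data_param {
    source: "/path/to/train2lmdb"
    batch_size: 128
    backend: LMDB
  }
}
# concatenate along axis 0 (the batch axis) to form one 256-sample minibatch
layer {
  name: "concat_data"
  type: "Concat"
  bottom: "data1"
  bottom: "data2"
  top: "data"
  concat_param { axis: 0 }
}
layer {
  name: "concat_label"
  type: "Concat"
  bottom: "label1"
  bottom: "label2"
  top: "label"
  concat_param { axis: 0 }
}

Note that each minibatch then contains a fixed quota of samples from every database, and the per-layer batch_size values must add up to the overall minibatch size you want.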
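For option 1, the layer would look roughly like this, where path/to/hdf5_list.txt stands for a text file listing one .h5 file per line (the role the asker intended lmdb.txt to play):

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  hdf5_data_param {
    source: "path/to/hdf5_list.txt"
    batch_size: 256
  }
}

Note that there is no backend parameter here, and the top names must match the dataset names stored inside the hdf5 files (here "data" and "label").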
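For option 2, here is a rough sketch of such a "Python" layer (this requires Caffe built with WITH_PYTHON_LAYER=1). The module name, the round-robin reading order, and the param_str format are my own illustrative choices, not a fixed API:

import caffe
import lmdb
from caffe.proto import caffe_pb2

class MultiLMDBLayer(caffe.Layer):
    """Input layer cycling round-robin over several lmdb databases.

    Declared in the prototxt roughly as:
      layer {
        name: "data"  type: "Python"
        top: "data"   top: "label"
        python_param {
          module: "multi_lmdb"               # this file, on PYTHONPATH
          layer: "MultiLMDBLayer"
          param_str: "path/to/lmdb.txt 256"  # list file and batch size
        }
      }
    """

    def setup(self, bottom, top):
        list_file, batch = self.param_str.split()
        self.batch_size = int(batch)
        # Open one read-only cursor per database listed in the text file.
        paths = [line.strip() for line in open(list_file) if line.strip()]
        self.cursors = []
        for path in paths:
            env = lmdb.open(path, readonly=True, lock=False)
            cur = env.begin().cursor()
            cur.first()
            self.cursors.append(cur)
        self.db_idx = 0
        # Peek at one datum to find out the data shape.
        datum = caffe_pb2.Datum()
        datum.ParseFromString(self.cursors[0].value())
        top[0].reshape(self.batch_size,
                       datum.channels, datum.height, datum.width)
        top[1].reshape(self.batch_size)

    def reshape(self, bottom, top):
        pass  # tops were already shaped in setup

    def forward(self, bottom, top):
        datum = caffe_pb2.Datum()
        for i in range(self.batch_size):
            cur = self.cursors[self.db_idx]
            datum.ParseFromString(cur.value())
            top[0].data[i, ...] = caffe.io.datum_to_array(datum)
            top[1].data[i] = datum.label
            if not cur.next():   # wrap around at the end of this lmdb
                cur.first()
            # Take the next sample from the next database.
            self.db_idx = (self.db_idx + 1) % len(self.cursors)

    def backward(self, top, propagate_down, bottom):
        pass  # an input layer has nothing to back-propagate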
