
I have a very big dataset, and converting it into a single LMDB file for Caffe is not a good idea. Thus, I am trying to split it into small parts and specify a TXT file containing the paths to the corresponding LMDB files. Here is an example of my data layer:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "path/to/lmdb.txt"
    batch_size: 256
    backend: LMDB
  }
}

And this is my lmdb.txt file:

/path/to/train1lmdb
/path/to/train2lmdb
/path/to/train3lmdb

However, I got the following error:

I0828 10:30:40.639502 26950 layer_factory.hpp:77] Creating layer data
F0828 10:30:40.639549 26950 db_lmdb.hpp:15] Check failed: mdb_status == 0 
(20 vs. 0) Not a directory
*** Check failure stack trace: ***
@     0x7f678e4a3daa  (unknown)
@     0x7f678e4a3ce4  (unknown)
@     0x7f678e4a36e6  (unknown)
@     0x7f678e4a6687  (unknown)
@     0x7f678ebee5e1  caffe::db::LMDB::Open()
@     0x7f678eb2b7d4  caffe::DataLayer<>::DataLayer()
@     0x7f678eb2b982  caffe::Creator_DataLayer<>()
@     0x7f678ec1a1a9  caffe::Net<>::Init()
@     0x7f678ec1c382  caffe::Net<>::Net()
@     0x7f678ec2e200  caffe::Solver<>::InitTrainNet()
@     0x7f678ec2f153  caffe::Solver<>::Init()
@     0x7f678ec2f42f  caffe::Solver<>::Solver()
@     0x7f678eabcc71  caffe::Creator_SGDSolver<>()
@           0x40f18e  caffe::SolverRegistry<>::CreateSolver()
@           0x40827d  train()
@           0x405bec  main
@     0x7f678ccfaf45  (unknown)
@           0x4064f3  (unknown)
@              (nil)  (unknown)
Aborted (core dumped)

So, how can I make this work? Is this kind of approach feasible? Thanks in advance.


1 Answer


The problem:
You are confusing the "Data" layer with the "HDF5Data" layer:
With a "Data" layer you can specify only one lmdb/leveldb dataset, and your source: entry should point to that single database.
With an "HDF5Data" layer, on the other hand, you can have multiple binary hdf5 files, and the source: parameter points to a text file listing all the binary files you are about to use.
That is why the check fails with "Not a directory": Caffe passes your lmdb.txt path straight to LMDB::Open(), which expects an lmdb directory, not a text file.

Solutions
0. (Following PrzemekD's comment) Add a separate "Data" layer for each lmdb you have (each with a smaller batch_size), then use a "Concat" layer to "merge" the different inputs into a single minibatch; see the first sketch after this list.
1. As you can already guess, one solution is to convert your data to the hdf5 binary format and use an "HDF5Data" layer; see the second sketch.
2. Alternatively, you can write your own "Python" input layer that reads from all the lmdb files (using the python lmdb interface) and feeds the data, batch by batch, to your net; see the third sketch.
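
For option 0, a minimal sketch of the prototxt, assuming just two of the lmdbs from the question and splitting the original batch_size of 256 between them (all layer/blob names here are illustrative):

layer {
  name: "data1"
  type: "Data"
  top: "data1"
  top: "label1"
  include { phase: TRAIN }
  data_param {
    source: "/path/to/train1lmdb"
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "data2"
  type: "Data"
  top: "data2"
  top: "label2"
  include { phase: TRAIN }
  data_param {
    source: "/path/to/train2lmdb"
    batch_size: 128
    backend: LMDB
  }
}
# concatenate along axis 0 (the batch axis) to form one 256-sample minibatch
layer {
  name: "concat_data"
  type: "Concat"
  bottom: "data1"
  bottom: "data2"
  top: "data"
  concat_param { axis: 0 }
}
layer {
  name: "concat_label"
  type: "Concat"
  bottom: "label1"
  bottom: "label2"
  top: "label"
  concat_param { axis: 0 }
}

Note that each minibatch then contains a fixed quota of samples from every database, and the per-layer batch_size values must add up to the overall minibatch size you want.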
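For option 1, the layer would look roughly like this, where path/to/hdf5_list.txt stands for a text file listing one .h5 file per line (the role the asker intended lmdb.txt to play):

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  hdf5_data_param {
    source: "path/to/hdf5_list.txt"
    batch_size: 256
  }
}

Note that there is no backend parameter here, and the top names must match the dataset names stored inside the hdf5 files (here "data" and "label").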
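For option 2, here is a rough sketch of such a "Python" layer (this requires Caffe built with WITH_PYTHON_LAYER=1). The module name, the round-robin reading order, and the param_str format are my own illustrative choices, not a fixed API:

import caffe
import lmdb
from caffe.proto import caffe_pb2

class MultiLMDBLayer(caffe.Layer):
    """Input layer cycling round-robin over several lmdb databases.

    Declared in the prototxt roughly as:
      layer {
        name: "data"  type: "Python"
        top: "data"   top: "label"
        python_param {
          module: "multi_lmdb"               # this file, on PYTHONPATH
          layer: "MultiLMDBLayer"
          param_str: "path/to/lmdb.txt 256"  # list file and batch size
        }
      }
    """

    def setup(self, bottom, top):
        list_file, batch = self.param_str.split()
        self.batch_size = int(batch)
        # Open one read-only cursor per database listed in the text file.
        paths = [line.strip() for line in open(list_file) if line.strip()]
        self.cursors = []
        for path in paths:
            env = lmdb.open(path, readonly=True, lock=False)
            cur = env.begin().cursor()
            cur.first()
            self.cursors.append(cur)
        self.db_idx = 0
        # Peek at one datum to find out the data shape.
        datum = caffe_pb2.Datum()
        datum.ParseFromString(self.cursors[0].value())
        top[0].reshape(self.batch_size,
                       datum.channels, datum.height, datum.width)
        top[1].reshape(self.batch_size)

    def reshape(self, bottom, top):
        pass  # tops were already shaped in setup

    def forward(self, bottom, top):
        datum = caffe_pb2.Datum()
        for i in range(self.batch_size):
            cur = self.cursors[self.db_idx]
            datum.ParseFromString(cur.value())
            top[0].data[i, ...] = caffe.io.datum_to_array(datum)
            top[1].data[i] = datum.label
            if not cur.next():   # wrap around at the end of this lmdb
                cur.first()
            # Take the next sample from the next database.
            self.db_idx = (self.db_idx + 1) % len(self.cursors)

    def backward(self, top, propagate_down, bottom):
        pass  # an input layer has nothing to back-propagate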
