
I need to create a fast, efficient, low-overhead routine for storing key / value pairs in LMDB for subsequent consumption by Caffe's data layer (i.e., no linking to a bunch of external libraries).

I've reviewed the caffe.proto, caffe.pb.h and caffe.pb.cc files and a handful of others pertaining to Google's protocol buffers to gain an understanding of the Datum class, which is the 'value' in LMDB records.

The best bet for me appears to be an audit of the datum.SerializeToString() method, which takes all the data structures and nested structures comprising Datum and converts them to some sort of string value. However, after plumbing the depths of Google's protobuf, I haven't been able to find where this method is defined.

Can someone point me in the right direction? And obviously if there's a faster / better / cheaper way of understanding how the serialized Datum value should be structured, then I'd definitely be open to it. Thanks.

Shai
Pete Janda

1 Answer


I think you are looking for the caffe.io.array_to_datum method: a Python wrapper around Caffe's protobuf interface that converts a numpy array (and an optional integer label) into a Datum object.
There is a more comprehensive example of how to read and write LMDB for Caffe using the Python interface here.
If you only need to convert a list of labeled images into LMDB, you can use the convert_imageset tool that ships with Caffe.
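If you want to see what the serialized Datum bytes look like without pulling in protobuf at all, here is a minimal sketch that hand-encodes a Datum in the protobuf wire format. It assumes the field numbers from the stock caffe.proto (channels = 1, height = 2, width = 3, data = 4, label = 5); please check them against your own caffe.proto. The helper names (encode_varint, serialize_datum, and so on) are made up for this illustration and are not part of Caffe or protobuf. The optional float_data and encoded fields are left out.

```python
def encode_varint(value: int) -> bytes:
    """Encode an integer as a protobuf base-128 varint."""
    # Negative int32 values are sign-extended to 64 bits on the wire.
    value &= 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Tag with wire type 0 (varint), followed by the value."""
    return encode_varint(field_number << 3) + encode_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Tag with wire type 2 (length-delimited), then length, then payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def serialize_datum(channels: int, height: int, width: int,
                    data: bytes, label: int) -> bytes:
    """Build the bytes that would be stored as the LMDB value.

    Field numbers are an assumption taken from the stock caffe.proto.
    """
    if len(data) != channels * height * width:
        raise ValueError("data length must equal channels * height * width")
    return b"".join([
        encode_int32_field(1, channels),
        encode_int32_field(2, height),
        encode_int32_field(3, width),
        encode_bytes_field(4, data),
        encode_int32_field(5, label),
    ])


# A tiny 1x1x2 "image" with label 3.
value = serialize_datum(1, 1, 2, b"\x07\x08", 3)
print(value.hex())
```

The fields are written in ascending field-number order, which is also what protobuf's own serializer does, so the result should be readable by datum.ParseFromString. The same few functions are straightforward to port to C or C++ if you want to avoid linking against protobuf.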

Shai
  • I followed Caffe's architecture upstream, to where data layers are generated. In ~/caffe/src/caffe/layers you see data_layer.cpp, which includes routines such as DataLayer::DataLayerSetUp that contain "datum.ParseFromString(cursor_->value());". I haven't been able to locate the methods for parsing and serializing strings. Unfortunately, convert_imageset is inapplicable to my particular situation. – Pete Janda Oct 17 '18 at 22:42
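  • @PeteJanda `datum.ParseFromString` comes from Google protobuf itself: the `Datum` class is auto-generated from caffe.proto and inherits `ParseFromString` and `SerializeToString` from protobuf's `MessageLite` base class, which is why you won't find them in the Caffe sources. If `convert_imageset` is not good for you, you can use the Python interface to write the LMDB, as in [this example](http://deepdish.io/2015/04/28/creating-lmdb-in-python/) – Shai Oct 18 '18 at 05:42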