I am able to read the binary format (cifar10 data_batch1.bin) into a NumPy matrix in Python, but I am struggling to write it to an LMDB file. Could you please point me in the right direction?

  • See the answers in [this thread](http://stackoverflow.com/q/31649216/1714410); they show how to write an LMDB for Caffe in Python. – Shai Sep 06 '16 at 06:00

1 Answer

I ran into the same problem some months ago. The following resources helped me a lot:

If I remember correctly, the following code worked for me (with 8-bit unsigned integer, i.e. uint8, data):

import lmdb
import caffe

# Let images be an N x 3 x H x W uint8 matrix, i.e. N samples,
# 3 color channels (in BGR), height H and width W;
# you will need to get your images into the above
# blob shape (i.e. samples x channels x height x width).
# Let labels be an N x 1 matrix containing the labels.

# map_size is the maximum size of the database in bytes;
# 10x the raw data size leaves generous headroom.
env = lmdb.open('lmdb_path', map_size=images.nbytes * 10)

with env.begin(write=True) as txn:
    for i in range(images.shape[0]):
        datum = caffe.proto.caffe_pb2.Datum()
        datum.channels = images.shape[1]
        datum.height = images.shape[2]
        datum.width = images.shape[3]
        # tobytes() replaces the deprecated tostring(); same raw bytes.
        datum.data = images[i].tobytes()

        label = int(labels[i])
        datum.label = label

        # Alternatively, use:
        # datum = caffe.io.array_to_datum(images[i], label)

        str_id = '{:08}'.format(i)

        # LMDB keys must be bytes. The encode is required in Python 3 (which I used);
        # in Python 2.7 it should be harmless for ASCII keys.
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
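
A note on the zero-padded key: LMDB keeps keys in lexicographic byte order, so padding is what makes a cursor visit the samples in the order they were written. The sketch below only illustrates that ordering and does not touch LMDB itself; `make_key` is a hypothetical helper name, not part of lmdb or Caffe.

```python
def make_key(index, width=8):
    """Zero-padded ASCII key; equals '{:08}'.format(index).encode('ascii') for width 8."""
    return '{:0{w}d}'.format(index, w=width).encode('ascii')

indices = [0, 2, 10, 100]

padded = [make_key(i) for i in indices]
unpadded = [str(i).encode('ascii') for i in indices]

# Lexicographic byte order is how LMDB arranges keys.
padded_sorted = sorted(padded)      # same order as the numeric indices
unpadded_sorted = sorted(unpadded)  # '10' and '100' sort before '2'
```

With unpadded keys the samples would still all be stored, but they would be read back in a different order than they were inserted.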

Make sure that you use BGR color space for your images: https://github.com/BVLC/caffe/wiki/Image-Format:-BGR-not-RGB.
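
For the CIFAR-10 binary batches from the question, here is a minimal sketch of getting the data into the N x 3 x H x W BGR shape assumed above. It relies on the published CIFAR-10 binary layout (each record is 1 label byte followed by 3072 pixel bytes: the 32 x 32 red plane, then green, then blue). The function name `parse_cifar10_batch` is made up for illustration.

```python
import numpy as np

RECORD_BYTES = 1 + 3 * 32 * 32  # 1 label byte + 3072 pixel bytes

def parse_cifar10_batch(raw):
    """Split raw CIFAR-10 batch bytes into (images_bgr, labels).

    images_bgr: uint8 array of shape (N, 3, 32, 32), channels ordered B, G, R.
    labels:     int64 array of shape (N,).
    """
    records = np.frombuffer(raw, dtype=np.uint8).reshape(-1, RECORD_BYTES)
    labels = records[:, 0].astype(np.int64)
    images_rgb = records[:, 1:].reshape(-1, 3, 32, 32)
    # Reverse the channel axis (RGB -> BGR) and copy into contiguous memory
    # so that images[i].tobytes() yields the bytes in blob order.
    images_bgr = np.ascontiguousarray(images_rgb[:, ::-1, :, :])
    return images_bgr, labels

# Tiny synthetic batch: two records with constant color planes.
def _fake_record(label, r, g, b):
    plane = lambda v: bytes([v]) * (32 * 32)
    return bytes([label]) + plane(r) + plane(g) + plane(b)

raw = _fake_record(3, 10, 20, 30) + _fake_record(7, 40, 50, 60)
images, labels = parse_cifar10_batch(raw)
```

For a real file, `raw` would come from something like `open('data_batch_1.bin', 'rb').read()` (the path is an assumption), and `images` and `labels` can then be fed to the writing loop above.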

David Stutz