17

I extracted features using Caffe, which produced an LMDB database (a .mdb file). I'm now trying to read it in Python and display the stored values as readable numbers.

import lmdb

lmdb_env = lmdb.open('caffefeat')
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()

for key, value in lmdb_cursor:
    print str(value)

This prints out a very long line of unreadable, broken characters.

Then I tried printing int(value), which returns the following:

ValueError: invalid literal for int() with base 10: '\x08\x80 \x10\x01\x18\x015\x8d\x80\xad?5'

float(value) gives the following:

ValueError: could not convert string to float:? 5????5

Is this a problem with the lmdb file itself, or does it have to do with conversion of data type?

ytrewq

2 Answers

34

Here's the working code I figured out:

import caffe
import lmdb

lmdb_env = lmdb.open('directory_containing_mdb')
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
datum = caffe.proto.caffe_pb2.Datum()

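for key, value in lmdb_cursor:
    # Each value is a serialized Datum protobuf message, not plain text
    datum.ParseFromString(value)
    label = datum.label  # a single integer label, not a sequence
    data = caffe.io.datum_to_array(datum)  # array of shape (channels, height, width)
    print(label, data)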
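
To see why str(), int() and float() fail on the raw value, it may help to look at the bytes themselves: each value is a serialized Datum protobuf message, not text. The following is only a minimal sketch, written for Python 3 and using nothing but the standard library. The helper names read_varint and peek_datum are hypothetical, the field numbers are taken from the Datum message in Caffe's caffe.proto, and the input is the (truncated) byte string from the ValueError in the question.

```python
import struct

# Field numbers of the Datum message as declared in Caffe's caffe.proto.
DATUM_FIELDS = {1: "channels", 2: "height", 3: "width", 4: "data",
                5: "label", 6: "float_data", 7: "encoded"}

def read_varint(buf, pos):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def peek_datum(buf):
    """Hypothetical helper: decode as many Datum fields as the bytes allow."""
    fields = {}
    pos = 0
    try:
        while pos < len(buf):
            tag, pos = read_varint(buf, pos)
            number, wire_type = tag >> 3, tag & 0x07
            name = DATUM_FIELDS.get(number, "field_%d" % number)
            if wire_type == 0:    # varint: channels, height, width, label, encoded
                fields[name], pos = read_varint(buf, pos)
            elif wire_type == 5:  # fixed 32-bit: one unpacked float_data entry
                (value,) = struct.unpack_from("<f", buf, pos)
                pos += 4
                fields.setdefault(name, []).append(value)
            elif wire_type == 2:  # length-delimited: kept as raw bytes here
                length, pos = read_varint(buf, pos)
                fields[name] = buf[pos:pos + length]
                pos += length
            else:
                break             # wire types this sketch does not handle
    except (IndexError, struct.error):
        pass                      # input was cut off mid-field; keep what we have
    return fields

# Bytes copied from the ValueError message in the question (truncated there).
raw = b'\x08\x80 \x10\x01\x18\x015\x8d\x80\xad?5'
print(peek_datum(raw))
# channels=4096, height=1, width=1, plus one float_data value of about 1.36
```

So the LMDB file itself looks fine: the value starts with a 4096 x 1 x 1 feature vector header followed by floats, which is what ParseFromString and datum_to_array unpack in the code above.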
Ghilas BELHADJ
ytrewq
  • 1
    I got error `ValueError: cannot reshape array of size 29367 into shape (0,0,0)`. I am using python2 under anaconda2, and installed caffe using `conda install caffe` – skyuuka Dec 21 '18 at 08:48
  • Can I provide only the path to the mdb file instead of its folder? – alper Jan 01 '21 at 20:13
17

If you have encoded images in lmdb, you'll probably see this error when using @ytrewq's code

ValueError: total size of new array must be unchanged

Use this function instead:

import caffe
import lmdb
import PIL.Image
from io import BytesIO
import numpy as np

def read_lmdb(lmdb_file):
    """Yield (image array, label) pairs from an LMDB of encoded images."""
    cursor = lmdb.open(lmdb_file, readonly=True).begin().cursor()
    datum = caffe.proto.caffe_pb2.Datum()
    for _, value in cursor:
        datum.ParseFromString(value)
        # datum.data holds the encoded image bytes, so it needs a binary buffer
        s = BytesIO(datum.data)

        yield np.array(PIL.Image.open(s)), datum.label

Example:

lmdb_dir = '/save/jobs/20160613-125532-958f/train_db/'
for im, label in read_lmdb(lmdb_dir):
    print(label, im)
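
If you are not sure which of the two readers a given database needs, one option is to inspect a single record before decoding it. The sketch below is an assumption-laden illustration, not part of Caffe: FakeDatum is a stand-in for caffe_pb2.Datum so the snippet runs without Caffe installed, choose_reader is a hypothetical helper, and the check relies on the `encoded` field of Datum plus the standard JPEG and PNG file signatures. Whether your tool actually sets `encoded` is something to verify on your own data.

```python
from collections import namedtuple

# Stand-in for caffe_pb2.Datum (assumption: only these fields are needed here).
FakeDatum = namedtuple("FakeDatum", "channels height width data encoded")

JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def choose_reader(datum):
    """Return 'encoded' if datum.data looks like an image file, else 'raw'."""
    if datum.encoded:
        return "encoded"
    # Fall back to the file signature in case the flag was not set.
    if datum.data.startswith((JPEG_MAGIC, PNG_MAGIC)):
        return "encoded"
    return "raw"

raw_pixels = FakeDatum(3, 2, 2, bytes(12), False)
jpeg_bytes = FakeDatum(0, 0, 0, JPEG_MAGIC + b"\xe0rest-of-file", False)
print(choose_reader(raw_pixels))  # raw
print(choose_reader(jpeg_bytes))  # encoded
```

For 'raw' records, caffe.io.datum_to_array should work; for 'encoded' records, use read_lmdb above. A shape of (0,0,0) in the reshape error is consistent with the second case, since there the dimensions may not be stored in the Datum.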
alper
Ghilas BELHADJ
  • Does this error you are solving here stem from lmdb created with encoded images? – Shai Jun 14 '16 at 09:58
  • 1
    @Shai Yes, see the [discussion here](https://groups.google.com/d/msg/digits-users/CzHG1aHizsw/QYE3qWpxBgAJ) – Ghilas BELHADJ Jun 14 '16 at 10:24
  • Thank you for linking to the relevant thread; it adds proper context here. Can you please edit your answer to reflect its relevance to encoded `lmdb`s? It is very good to state both the error message and the root cause: encoded images in lmdb. Thanks! – Shai Jun 14 '16 at 10:26
  • Done ! Thank you for the advice – Ghilas BELHADJ Jun 14 '16 at 10:33
  • 1
    Tried running and got the error `google.protobuf.message.DecodeError: Unexpected end-group tag.` Any idea how to fix this? – Austin Mar 02 '18 at 20:38
  • This answer saved me, and I got error `ValueError: cannot reshape array of size 29367 into shape (0,0,0)` – skyuuka Dec 21 '18 at 08:52