import lmdb
env = lmdb.open(path_to_lmdb)

Now I seem to need to create a transaction and a cursor, but how do I get a list of keys that I can iterate over?

Doug

3 Answers


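A way to get the total number of keys without enumerating them individually is to read the entry count from the database statistics. If the environment has named sub-databases, each one is stored as a key in the main database and is included in this count: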

with env.begin() as txn:
    length = txn.stat()['entries']

Test result with a hand-made database of 1,000,000 entries on my laptop:

  • the method above is instantaneous (0.0 s)
  • the iteration method takes about 1 second.
sytrus

Are you looking for something like this:

with env.begin() as txn:
    with txn.cursor() as curs:
        # do stuff
        print('key is:', curs.get(b'key'))  # Python 3: print() and bytes keys

Update:

Iterating over the cursor gives you every key, though this may not be the fastest approach:

with env.begin() as txn:
    my_list = [key for key, _ in txn.cursor()]
    print(my_list)

Disclaimer: I don't know anything about the library; I just searched its docs for "key".

Achal Dave
Sait
  • No. I'm aware of the documentation page. I want to know how to get the total number of keys without enumerating them individually. I would also like to know the best (fastest) way to enumerate all the key value pairs. The method you mentioned seems to take quite a while for me, but it could have something to do with the size of my db (about 1 million entries). – Doug Sep 09 '15 at 22:16
  • @Doug I updated my answer to get the list of keys, by iterating the cursor. There might be a faster way though. – Sait Sep 09 '15 at 22:30
  • Apart from the fact that it would take a long time to iterate through the keys, are there any other disadvantages to reading a list of keys? – Rakshit Kothari Sep 12 '20 at 16:02

As Sait pointed out, you can iterate over a cursor to collect all keys. However, this may be a bit inefficient, because it also fetches every value. This can be avoided by calling the cursor's iternext() method with values=False.

with env.begin() as txn:
    keys = list(txn.cursor().iternext(values=False))

I ran a short benchmark comparing both methods on a DB with 2^20 entries, each with a 16 B key and a 1024 B value.

Retrieving keys by iterating over the cursor (including values) took 874 ms on average over 7 runs, while the second method, which returns only the keys, took 517 ms. These results may differ depending on the size of keys and values.

randhash