4

What is the best way to store between a million to 450,000 Boolean values in a dictionary like collection indexed by a long number? I need to use the least amount of memory possible. True and Int both take up more than 22 bytes per entry. Is there a lower memory per Boolean possible?

Martlark
  • 14,208
  • 13
  • 83
  • 99
  • 2
    How will this be "dictionary like"? What will be the keys, what will be the values? – Marcin Jul 12 '11 at 11:30
  • He probably meant "array like" – Aaron Digulla Jul 12 '11 at 11:35
  • 1
    If it really is a dict, just the keys will take a significant amount of memory – John La Rooy Jul 12 '11 at 12:12
  • Could you clarify in your question what this collection looks like? Are there really 200,000,000,000 bools and the same number of ints as keys? That many ints will take up over 740GB on their own even if stored in just 4 bytes each. And that many bools will take up another 23GB at one bit each... – Scott Griffiths Jul 12 '11 at 15:39

3 Answers3

4

Check this question. Bitarray seems to be the preferred choice.

Community
  • 1
  • 1
Senthess
  • 17,020
  • 5
  • 23
  • 28
1

The two main modules for this are bitarray and bitstring (I wrote the latter). Each will do what you need, but some plus and minus points for each:

bitarray

  • Written as a C extension so very quick.
  • Python 2 only.

bitstring

  • Pure Python.
  • Python 2.6+ and Python 3.x
  • Richer array of methods for reading and interpreting data.

So it depends on what you need to do with your data. If it's just storage and retrieval then both will be fine, but for performance critical stuff it's better to use bitarray if you can. Take a look at the docs (bitstring, bitarray) to see which you prefer.

Scott Griffiths
  • 21,438
  • 8
  • 55
  • 85
  • I wrote some quick tests to compare the performance of dictionaries, lists, bitstring and the array module. Bitstring and array, whilst memory efficient, are hundreds to thousands times slower. A task which can take 0.4 seconds with a dictionary can take 500 seconds with array. Altering 100000 items in a dict takes 0.05 seconds, whilst using bitstring takes 4 seconds. So unusable due to performance issues. Sorry. – Martlark Jul 16 '11 at 11:12
0

Have you thought about using a hybrid list/bitstring?

Use your list to store one dimension of your bits. Each list item would hold a bitstring of fixed length. You would use your list to focus your search to the bitstring of interest, then use the bitstring to find/modify your bit of interest.

The list should allow the most efficent recall of the bitstrings, the bitstrings should allow you to pack all your data as efficiently as possible, and the hybrid list/bitstring should allow a compromise between speed (slightly slower accessing the bit string in the list) and storage (bit packed data plus list overhead.)