I am trying to reduce the memory consumption of a Python dict, which in my case serves as a word --> document_id "inverted index". Each word is hashed to an integer, which takes up 24 bytes.
I was wondering if I can convert each key in the dict, and each element of the dict's values, to a bitarray instead. I've noticed that the max value of any encountered int is less than 2^22, so maybe I can just allocate a bit array of "size 22" for each one.
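
To make the "size 22" idea concrete, here is a minimal sketch of the layout I have in mind, using only plain Python ints and bytes. The names pack_ids, unpack_ids and BITS are hypothetical, and it assumes Python 3 and that every value is a non-negative int below 2^22.

```python
BITS = 22  # assumption: every id is a non-negative int below 2**22


def pack_ids(ids, bits=BITS):
    """Pack a list of small non-negative ints into bytes, `bits` bits each."""
    acc = 0
    for i, value in enumerate(ids):
        if not 0 <= value < (1 << bits):
            raise ValueError("value does not fit in %d bits: %r" % (bits, value))
        # Place this value in its own `bits`-wide slot of one big integer.
        acc |= value << (i * bits)
    # Round the total bit count up to whole bytes.
    n_bytes = (len(ids) * bits + 7) // 8
    return acc.to_bytes(n_bytes, "little")


def unpack_ids(packed, count, bits=BITS):
    """Recover `count` values from bytes produced by pack_ids."""
    acc = int.from_bytes(packed, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


ids = [0, 1, 12345, 2**22 - 1]
packed = pack_ids(ids)
restored = unpack_ids(packed, len(ids))
```

With this layout the four ids occupy 11 bytes of payload in total, but the count has to be stored separately and every lookup has to unpack. The repeated shifting is also quadratic in the list length as written, so this only illustrates the layout and is not something I would use as-is.
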
How can this be done? So far I've seen the gmpy2 and bitarray libraries, as well as std::bitset in the C++ stdlib, which I can use with Cython. I've read from this post that bitarray is not as fast as gmpy. In gmpy, I am not sure how to set the size. Finally, I wonder if the memory overhead of gmpy or bitarray objects in Python is worth it, when I can just use std::bitset, which probably uses the least memory of all.
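
As a baseline to compare those options against, I could also keep the document ids in the standard-library array module instead of a list of int objects. This is a different technique from a true 22-bit bit array: each id is stored as a fixed-width C unsigned int (typically 4 bytes) rather than 22 bits. The name build_index and the sample pairs below are hypothetical.

```python
from array import array
from collections import defaultdict


def build_index(pairs):
    """Build word_hash -> array of document ids from (word_hash, doc_id) pairs."""
    # Typecode "I" is a C unsigned int: at least 2 bytes, typically 4.
    index = defaultdict(lambda: array("I"))
    for word_hash, doc_id in pairs:
        index[word_hash].append(doc_id)
    return index


# Hypothetical sample data: (hashed word, document id), all ids below 2**22.
pairs = [(5, 1), (5, 2), (7, 2**22 - 1)]
index = build_index(pairs)

# Check that the platform's unsigned int is wide enough for 22-bit ids.
wide_enough = array("I").itemsize * 8 >= 22
```

The keys are still ordinary Python ints here, so this only shrinks the values of the dict, not the keys.
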