I'm looking for the most compact way to store booleans.
NumPy internally needs 8 bits to store one boolean, but np.packbits lets you pack them, which is pretty cool.
The problem is that to pack a 32e6-byte boolean array into a 4e6-byte array, we first need to spend 256e6 bytes converting the boolean array to an int array!
In [1]: db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)
In [2]: db_int = np.asarray(db_bool, dtype=int)
In [3]: db_packed = np.packbits(db_int, axis=0)
In [4]: db_bool.nbytes, db_int.nbytes, db_packed.nbytes
Out[4]: (32000000, 256000000, 4000000)
There is a year-old issue open in the NumPy tracker about this (cf. https://github.com/numpy/numpy/issues/5377).
Does anyone have a solution or a better workaround?
Here is the traceback when we try to do it the right way:
In [28]: db_pb = np.packbits(db_bool)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-28-3715e167166b> in <module>()
----> 1 db_pb = np.packbits(db_bool)
TypeError: Expected an input array of integer data type
In [29]:
PS: I will give bitarray a try, but I would have preferred a pure NumPy solution.
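
For reference, here is a minimal sketch of one possible pure NumPy workaround, not a definitive fix. It assumes that a NumPy bool occupies exactly one byte holding 0 or 1, so the boolean buffer can be reinterpreted as uint8 with .view() instead of being copied into an 8-bytes-per-element int array. The array is smaller than the one above so it runs quickly, and the names db_u8 and db_restored are only illustrative. Recent NumPy releases also document np.packbits as accepting boolean input directly, so the plain call may simply work there.

```python
import numpy as np

# Smaller array than in the question so the sketch runs quickly.
db_bool = np.random.randint(2, size=(2000, 16)).astype(bool)

# Reinterpret the boolean buffer as uint8 without copying any data.
# This avoids the temporary int array that costs 8 bytes per element.
db_u8 = db_bool.view(np.uint8)

# Pack 8 booleans into each byte along axis 0.
db_packed = np.packbits(db_u8, axis=0)

# Round trip: unpack, trim any padding rows, and view the bytes as bool again.
db_restored = np.unpackbits(db_packed, axis=0)[:db_bool.shape[0]].view(bool)

print(db_bool.nbytes, db_packed.nbytes)
print(np.array_equal(db_restored, db_bool))
```

The only extra memory here is the packed result itself, since the uint8 view shares its buffer with db_bool.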