Packing a boolean array needs to go through int (numpy 1.8.2)

Question

I'm looking for the most compact way to store booleans. NumPy internally needs 8 bits to store one boolean, but np.packbits lets you pack them, which is pretty cool.

The problem is that, to pack a 32e6-byte boolean array into a 4e6-byte array, we first need to spend 256e6 bytes converting the boolean array to an int array!

In [1]: db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)
In [2]: db_int = np.asarray(db_bool, dtype=int)
In [3]: db_packed = np.packbits(db_int, axis=0)
In [4]: db_bool.nbytes, db_int.nbytes, db_packed.nbytes
Out[4]: (32000000, 256000000, 4000000)
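As a quick sanity check on those figures, the three sizes follow from the element count alone. This is only a sketch of the arithmetic, assuming one byte per NumPy boolean and an 8-byte native int (as on 64-bit Linux); the variable names are illustrative.

```python
# Element count for the (2e6, 16) array used in the session above.
n_elements = 2000000 * 16

bool_bytes = n_elements * 1     # NumPy bool: one byte per element
int64_bytes = n_elements * 8    # assumed 8-byte native int (64-bit Linux)
packed_bytes = n_elements // 8  # packbits stores eight booleans per byte

print(bool_bytes, int64_bytes, packed_bytes)
```

So the intermediate int array is 8 times larger than the boolean input and 64 times larger than the packed result.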

There is a year-old open issue in the numpy tracker about this (cf. https://github.com/numpy/numpy/issues/5377).

Does anyone have a solution or a better workaround?

Here is the traceback when we try to do it the right way:

In [28]: db_pb = np.packbits(db_bool)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-3715e167166b> in <module>()
----> 1 db_pb = np.packbits(db_bool)
TypeError: Expected an input array of integer data type
In [29]:

PS: I will give bitarray a try, but I would have liked to do this in pure numpy.

Have you thought about using a sparse matrix? What do you need to do with the result? — Will, Dec 29 '15 at 12:46
My use case is close to this one: http://stackoverflow.com/questions/34496409/use-numpy-frompyfunc-to-add-broadcasting-to-a-python-function-with-argument — user3313834, Dec 29 '15 at 12:56

ali_m · Accepted Answer · 2015-12-29T17:36:04.117

5

There's no need to convert your boolean array to the native int dtype (which will be 64 bit on x86_64). You can avoid copying your boolean array by viewing it as np.uint8, which also uses a single byte per element:

packed = np.packbits(db_bool.view(np.uint8))

# np.bool_ is used here because the plain np.bool alias is not available in every NumPy release
unpacked = np.unpackbits(packed)[:db_bool.size].reshape(db_bool.shape).view(np.bool_)

print(np.all(db_bool == unpacked))
# True

Also, np.packbits should now work directly on boolean arrays as of this commit from over a year ago (numpy v1.10.0 and newer).
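The snippet above packs the flattened array, whereas the question packs along axis=0. A minimal sketch of the same uint8-view trick with an explicit axis follows, assuming a small stand-in array of shape (2000, 16) and a NumPy recent enough to provide np.shares_memory; the variable names are illustrative only.

```python
import numpy as np

# Small stand-in for the question's (2e6, 16) boolean array.
rng = np.random.RandomState(0)
db_bool = rng.randint(2, size=(2000, 16)).astype(bool)

# Reinterpret the same buffer as uint8: both dtypes use one byte per element,
# so no copy is made.
as_u8 = db_bool.view(np.uint8)
print(np.shares_memory(as_u8, db_bool))

# Pack along axis 0, as in the question: every 8 rows collapse into one row.
db_packed = np.packbits(as_u8, axis=0)
print(db_bool.nbytes, db_packed.nbytes)

# Unpack along the same axis, trim any zero padding that packbits added,
# and view the 0/1 bytes as booleans again.
unpacked = np.unpackbits(db_packed, axis=0)[:db_bool.shape[0]].view(np.bool_)
print(np.array_equal(unpacked, db_bool))
```

The trim after unpackbits only matters when the length along the packed axis is not a multiple of 8, but keeping it makes the round trip safe for any shape.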

edited Dec 29 '15 at 17:36

answered Dec 29 '15 at 16:54


Thanks for your answer. And yes, my numpy version (from Ubuntu 15.10) is too old: np.__version__ gives '1.8.2'. The open issue on GitHub put me on the wrong path. – user3313834 Dec 29 '15 at 18:23

score 4 · Answer 2 · edited May 23 '17 at 11:52

Just yesterday, I answered a newcomer's question on how to deal with bits in Python, as compared to C++. After warning that there would be no speed gains, I sketched a naive "bitarray" that internally uses Python's bytearray objects.

This is in no way fast, but if you are no longer operating on the array's bits and just want the output, it may be good enough, since you have full control over the conversion in Python code. Otherwise, you can try adding static type hints and running the same code as Cython, in which case you will probably want to use an np array with dtype=int8 instead of a bytearray:

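class BitArray(object):
    def __init__(self, length):
        # One byte per 8 bits, rounded up.
        self.values = bytearray(b"\x00" * (length // 8 + (1 if length % 8 else 0)))
        self.length = length

    def __setitem__(self, index, value):
        if not 0 <= index < self.length:
            raise IndexError("bit index out of range")
        shift = 7 - index % 8  # bits are stored MSB-first within each byte
        # Clear only the target bit, then set it if value is truthy.
        self.values[index // 8] &= 0xff ^ (1 << shift)
        self.values[index // 8] |= int(bool(value)) << shift

    def __getitem__(self, index):
        # Raising IndexError past the end lets iteration stop at `length`
        # instead of running on into the padding bits of the last byte.
        if not 0 <= index < self.length:
            raise IndexError("bit index out of range")
        mask = 1 << (7 - index % 8)
        return bool(self.values[index // 8] & mask)

    def __len__(self):
        return self.length

    def __repr__(self):
        return "<{}>".format(", ".join("{:d}".format(value) for value in self))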

This code was originally posted here: Is there a builtin bitset in Python that's similar to the std::bitset from C++?
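If only the packed output is needed, rather than per-bit updates, a one-pass helper may be simpler than a class. The following is a minimal pure-Python sketch, assuming Python 3 and the same MSB-first layout used above; pack_bits and unpack_bits are hypothetical names and do not come from any library.

```python
def pack_bits(bits):
    """Pack an iterable of truthy/falsy values into a bytearray, MSB first.

    Returns the packed bytes together with the number of bits consumed,
    since the last byte may carry zero padding.
    """
    out = bytearray()
    byte = 0
    count = 0
    for bit in bits:
        byte = (byte << 1) | bool(bit)
        count += 1
        if count % 8 == 0:
            out.append(byte)
            byte = 0
    if count % 8:
        # Left-align the final partial byte, padding with zero bits.
        out.append(byte << (8 - count % 8))
    return out, count


def unpack_bits(data, length):
    """Inverse of pack_bits: return the first `length` bits as booleans."""
    return [bool(data[i // 8] & (1 << (7 - i % 8))) for i in range(length)]


bits = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
packed, n = pack_bits(bits)
restored = unpack_bits(packed, n)
print(len(packed), n, restored == [bool(b) for b in bits])
```

The bit count is returned alongside the bytes because the padding in the last byte cannot otherwise be told apart from real False values.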
