How to extract the bits of larger numeric Numpy data types

Question

NumPy has a library function, np.unpackbits, which unpacks a uint8 into a bit vector of length 8. Is there a correspondingly fast way to unpack larger numeric types, e.g. uint16 or uint32? I am working on a problem that involves frequent translation between numbers, used for array indexing, and their bit vector representations, and the bottleneck is our pack and unpack functions.

You can create a new `ndarray` with the old one as a buffer and a dtype of `uint8`. I'm not sure what the best way to handle byte order is, though. http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html and http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html might help. — user2357112, Aug 18 '13 at 05:50

Phillip Cloud · Accepted Answer · 2013-08-18T06:29:04.233

23

You can do this with `view` and `unpackbits` (the snippets below assume `from numpy import *`).

Input:

unpackbits(arange(2, dtype=uint16).view(uint8))

Output (on a little-endian machine, so the low byte of each value comes first):

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]

For a = arange(int(1e6), dtype=uint16) this is pretty fast, at around 7 ms on my machine:

%%timeit
unpackbits(a.view(uint8))

100 loops, best of 3: 7.03 ms per loop

As for endianness, you'll have to look at http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html and apply the suggestions there depending on your needs.
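As a rough sketch of one way to handle the byte order, the array can be converted to big-endian before taking the `uint8` view, so that each row of bits reads most significant bit first regardless of platform. The helper name `unpack_uint_bits` is hypothetical and not part of NumPy; it assumes an unsigned integer input.

```python
import numpy as np

def unpack_uint_bits(values):
    """Unpack unsigned integers into bits, most significant bit first.

    Hypothetical helper: returns an array of shape values.shape + (nbits,).
    """
    values = np.asarray(values)
    if values.dtype.kind != "u":
        raise TypeError("expected an unsigned integer array")
    nbits = values.dtype.itemsize * 8
    # Copy into a contiguous big-endian buffer so the most significant
    # byte of every element comes first in memory.
    big_endian = np.ascontiguousarray(values, dtype=values.dtype.newbyteorder(">"))
    # One row of raw bytes per element, then 8 bits per byte.
    as_bytes = big_endian.reshape(-1, 1).view(np.uint8)
    bits = np.unpackbits(as_bytes, axis=1)
    return bits.reshape(values.shape + (nbits,))

bits = unpack_uint_bits(np.array([1, 256], dtype=np.uint16))
print(bits)
# [[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]
#  [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]]
```

The conversion to big-endian costs one extra copy on little-endian machines; if least-significant-bit-first output is acceptable, that step may not be needed.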

edited Aug 18 '13 at 06:29

answered Aug 18 '13 at 06:16

Phillip Cloud


1

This only works for integers in the range of ``uint8``. E.g. ``np.unpackbits(np.array([2**17], dtype="uint16").view("uint8"))`` will return an array of zeros. – Pietro Battiston Jan 03 '18 at 00:02
2

@PietroBattiston it works just fine. What did you expect from stuffing 2**17 into uint16? – DeeY Apr 03 '18 at 22:09
1

Wait, `np.unpackbits(np.array([2**8], dtype="uint16").view("uint8"))` returns `[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]` and this is wrong. What was said before is true: it only works for integers in the range of `uint8`. – Marco Ancona Jul 31 '18 at 10:26
1

@MarcoAncona, I think this is what Philip meant in his Aug 18 edit: `np.unpackbits(np.array([2**8], dtype=">i2").view(np.uint8))` works just fine. – Zeke Arneodo Jun 02 '19 at 12:23

Ross · Answer 2 · 2020-12-01T18:35:44.720

20

This is the solution I use:

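import numpy as np

def unpackbits(x, num_bits):
    # Bitwise AND is not defined for floating-point arrays.
    if np.issubdtype(x.dtype, np.floating):
        raise ValueError("numpy data type needs to be int-like")
    xshape = list(x.shape)
    x = x.reshape([-1, 1])
    # One mask per bit position, least significant bit first.
    mask = 2**np.arange(num_bits, dtype=x.dtype).reshape([1, num_bits])
    # Broadcasting yields shape (N, num_bits); map nonzero results to 1.
    return (x & mask).astype(bool).astype(int).reshape(xshape + [num_bits])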

This is a completely vectorized solution that works with any dimension ndarray and can unpack however many bits you want.
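Since the question also asks about packing, here is a minimal sketch of a round trip in the same least-significant-bit-first layout. The names `to_bits` and `from_bits` are hypothetical, shifts are used in place of the mask, and the sketch assumes non-negative values that fit in at most 64 bits.

```python
import numpy as np

def to_bits(x, num_bits):
    """Unpack integers into num_bits bits each, least significant bit first."""
    x = np.asarray(x, dtype=np.uint64)
    shifts = np.arange(num_bits, dtype=np.uint64)
    # Broadcasting: (..., 1) >> (num_bits,) gives (..., num_bits).
    return ((x[..., None] >> shifts) & np.uint64(1)).astype(np.uint8)

def from_bits(bits):
    """Inverse of to_bits: combine the bits along the last axis into uint64."""
    bits = np.asarray(bits, dtype=np.uint64)
    shifts = np.arange(bits.shape[-1], dtype=np.uint64)
    # Move each bit back to its position and add the contributions up.
    return (bits << shifts).sum(axis=-1, dtype=np.uint64)

values = np.array([0, 1, 256, 65535], dtype=np.uint16)
bits = to_bits(values, 16)
restored = from_bits(bits)
print(np.array_equal(restored, values))  # True
```

Working in `uint64` throughout avoids mixed signed/unsigned promotion, at the cost of some extra memory for narrow input types.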

edited Dec 01 '20 at 18:35

answered Jul 25 '18 at 00:31

Ross


1

You are a true hero @Ross that is some slick numpy magic right there. Much faster than other methods we tried out. – dwagnerkc Feb 11 '20 at 23:54
This doesn't work if the num_bits is very large: ```a = unpackbits(np.array([0x999999999999999999999999999999]), num_bits=256); b = unpackbits(np.array([0x9999999999999999]), num_bits=256); print(np.all(a==b))``` – kory Nov 29 '20 at 16:37
You can see that it fails on the mask for large num_bits: ```num_bits=256; mask = 2**np.arange(num_bits).reshape([1, num_bits]); print(mask)``` – kory Nov 29 '20 at 16:43
Thanks for finding that @kory, I updated the answer to account for other numpy datatypes like in your example. – Ross Dec 01 '20 at 18:30
@Ross the issue is that there are too many bits, your example only works for 64 or fewer bits. My values I gave you are integers. – kory Dec 02 '20 at 21:46
@kory because your integer does not fit into a numpy datatype, it uses a generic python object type as the numpy dtype. See https://stackoverflow.com/questions/37271654/stocking-large-numbers-into-numpy-array. If you try out the code for your example, you will see that it is correct. – Ross Dec 26 '20 at 14:18

score 3 · Answer 3 · answered Aug 18 '13 at 06:13

I have not found a function for this either, but Python's built-in struct.unpack may help make a custom function faster than shifting and ANDing a longer uint (note that I am using uint64).

>>> import struct
>>> import numpy as np
>>> N = np.uint64(2 + 2**10 + 2**18 + 2**26)
>>> struct.unpack('>BBBBBBBB', N)
(2, 4, 4, 4, 0, 0, 0, 0)

The idea is to convert those bytes to uint8, apply unpackbits to them, and concatenate the results. Or, depending on your application, it may be more convenient to use structured arrays.
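A minimal sketch of that idea for a single value follows. Note that it swaps in `struct.pack` with an explicit big-endian format (`">Q"`) rather than `struct.unpack`, so the byte order does not depend on the machine, and it uses `np.frombuffer` to get the `uint8` view; treat it as an illustration rather than a tuned implementation.

```python
import struct
import numpy as np

N = 2 + 2**10 + 2**18 + 2**26

# Serialise as 8 bytes, most significant byte first.
raw = struct.pack(">Q", N)

# Reinterpret the bytes as uint8 and expand each byte into 8 bits.
as_bytes = np.frombuffer(raw, dtype=np.uint8)
bits = np.unpackbits(as_bytes)

print(as_bytes.tolist())  # [0, 0, 0, 0, 4, 4, 4, 2]
print(bits.size)          # 64
```

For whole arrays, going through `struct` one element at a time is likely to be slower than viewing the array as `uint8` directly.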

There is also the built-in bin() function, which produces a string of 0s and 1s, but I am not sure how fast it is, and it requires postprocessing too.

score 0 · Answer 4 · edited Mar 02 '18 at 16:46

This works for arbitrary arrays of arbitrary uint (i.e. also for multidimensional arrays and also for numbers larger than the uint8 max value).

It cycles over the number of bits, rather than over the number of array elements, so it is reasonably fast.

import numpy

def my_ManyParallel_uint2bits(in_intAr,Nbits):
    ''' convert (numpyarray of uint => array of Nbits bits) for many bits in parallel'''
    inSize_T= in_intAr.shape
    in_intAr_flat=in_intAr.flatten()
    out_NbitAr= numpy.zeros((len(in_intAr_flat),Nbits))
    # Loop over bit positions (least significant first), not over elements.
    for iBits in range(Nbits):
        out_NbitAr[:,iBits]= (in_intAr_flat>>iBits)&1
    out_NbitAr= out_NbitAr.reshape(inSize_T+(Nbits,))
    return out_NbitAr

A=numpy.arange(256,261).astype('uint16')
# array([256, 257, 258, 259, 260], dtype=uint16)
B=my_ManyParallel_uint2bits(A,16).astype('uint16')
# array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint16)
