25

NumPy has a library function, np.unpackbits, which unpacks a uint8 into a bit vector of length 8. Is there a correspondingly fast way to unpack larger numeric types, e.g. uint16 or uint32? I am working on a question that involves frequent translation between numbers (used for array indexing) and their bit-vector representations, and the bottleneck is our pack and unpack functions.

Cardano
  • You can create a new `ndarray` with the old one as a buffer and a dtype of `uint8`. I'm not sure what the best way to handle byte order is, though. http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html and http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html might help. – user2357112 Aug 18 '13 at 05:50

4 Answers

23

You can do this with view and unpackbits: view the array as uint8 bytes, then unpack those bytes.

Input:

import numpy as np
np.unpackbits(np.arange(2, dtype=np.uint16).view(np.uint8))

Output (on a little-endian machine):

[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]

For a = np.arange(int(1e6), dtype=np.uint16), this is pretty fast, at around 7 ms on my machine:

%%timeit
np.unpackbits(a.view(np.uint8))

100 loops, best of 3: 7.03 ms per loop

As for endianness, you'll have to look at http://docs.scipy.org/doc/numpy/user/basics.byteswapping.html and apply the suggestions there depending on your needs.
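To make the byte-order point concrete, here is a minimal sketch that forces a big-endian layout before viewing the data as bytes, so every element comes out as one row of bits with the most significant bit first, regardless of the machine's native order. The helper name `unpack_msb_first` and its `nbits` parameter are made up for this illustration and are not part of NumPy.

```python
import numpy as np

def unpack_msb_first(a, nbits=16):
    # Hypothetical helper (not a NumPy function); nbits must be 8, 16, 32 or 64.
    # Copy into a C-contiguous big-endian array so that, in memory, each
    # element's bytes run from most significant to least significant.
    be = np.ascontiguousarray(a, dtype=np.dtype(f">u{nbits // 8}"))
    # np.unpackbits emits each byte most-significant bit first by default.
    bits = np.unpackbits(be.view(np.uint8))
    # One row of nbits per input element.
    return bits.reshape(np.shape(a) + (nbits,))

bits = unpack_msb_first(np.array([256, 1], dtype=np.uint16))
print(bits)
```

With this layout, 256 (that is, 2**8) has its single set bit at index 7 of its row, which is what the commenters below were expecting.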

Phillip Cloud
  • 1
    This only works for integers in the range of ``uint8``. E.g. ``np.unpackbits(np.array([2**17], dtype="uint16").view("uint8"))`` will return an array of zeros. – Pietro Battiston Jan 03 '18 at 00:02
  • 2
    @PietroBattiston it works just fine. What did you expect from stuffing 2**17 into uint16? – DeeY Apr 03 '18 at 22:09
  • 1
    Wait, `np.unpackbits(np.array([2**8], dtype="uint16").view("uint8"))` returns `[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]`, and this is wrong. What was said before is true: it only works for integers in the range of `uint8` – Marco Ancona Jul 31 '18 at 10:26
  • 1
    @MarcoAncona, I think this is what Philip meant in his Aug 18 edit: `np.unpackbits(np.array([2**8], dtype=">i2").view(np.uint8))` works just fine. – Zeke Arneodo Jun 02 '19 at 12:23
20

This is the solution I use:

import numpy as np

def unpackbits(x, num_bits):
    if np.issubdtype(x.dtype, np.floating):
        raise ValueError("numpy data type needs to be int-like")
    xshape = list(x.shape)
    x = x.reshape([-1, 1])
    # one mask per bit position: 1, 2, 4, ..., 2**(num_bits - 1)
    mask = 2**np.arange(num_bits, dtype=x.dtype).reshape([1, num_bits])
    # broadcast (N, 1) & (1, num_bits), then map every nonzero result to 1
    return (x & mask).astype(bool).astype(int).reshape(xshape + [num_bits])

This is a completely vectorized solution that works with an ndarray of any dimension and can unpack however many bits you want. The bits are returned along a new last axis, least-significant bit first.
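Since the question also asks about packing, here is a rough sketch of the inverse operation for bit vectors stored least-significant bit first along the last axis. The name `packbits_lsb` is hypothetical, and the sketch assumes at most 64 bits per number so the result fits in a uint64.

```python
import numpy as np

def packbits_lsb(bits, dtype=np.uint64):
    # Hypothetical inverse of an LSB-first unpack; assumes bits.shape[-1] <= 64.
    bits = np.asarray(bits)
    num_bits = bits.shape[-1]
    # Weight of each bit position: 1, 2, 4, ..., 2**(num_bits - 1)
    weights = np.uint64(1) << np.arange(num_bits, dtype=np.uint64)
    # Multiply every bit by its weight and sum along the bit axis.
    return (bits.astype(np.uint64) * weights).sum(axis=-1).astype(dtype)

# Round trip on a small 2-D array, unpacking with the same mask idea as above.
x = np.array([[3, 256], [65535, 0]], dtype=np.uint16)
mask = np.uint16(1) << np.arange(16, dtype=np.uint16)
bits = ((x[..., None] & mask) != 0).astype(np.uint8)
restored = packbits_lsb(bits, dtype=np.uint16)
print(np.array_equal(restored, x))
```

The weighted sum is done in uint64 so that intermediate products cannot overflow a narrower input dtype.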

Ross
  • 1
    You are a true hero @Ross that is some slick numpy magic right there. Much faster than other methods we tried out. – dwagnerkc Feb 11 '20 at 23:54
  • This doesn't work if the num_bits is very large: ```a = unpackbits(np.array([0x999999999999999999999999999999]), num_bits=256) b = unpackbits(np.array([0x9999999999999999]), num_bits=256) print(np.all(a==b))``` – kory Nov 29 '20 at 16:37
  • You can see that it fails on the mask for large num_bits: ```num_bits=256 mask = 2**np.arange(num_bits).reshape([1, num_bits]) print(mask)``` – kory Nov 29 '20 at 16:43
  • Thanks for finding that @kory, I updated the answer to account for other numpy datatypes like in your example. – Ross Dec 01 '20 at 18:30
  • @Ross the issue is that there are too many bits, your example only works for 64 or fewer bits. My values I gave you are integers. – kory Dec 02 '20 at 21:46
  • @kory because your integer does not fit into a numpy datatype, it uses a generic python object type as the numpy dtype. See https://stackoverflow.com/questions/37271654/stocking-large-numbers-into-numpy-array. If you try out the code for your example, you will see that it is correct. – Ross Dec 26 '20 at 14:18
3

I have not found a function for this either, but Python's built-in struct.unpack may help make a custom function faster than shifting and ANDing a longer uint (note that I am using uint64 here).

>>> import numpy as np
>>> import struct
>>> N = np.uint64(2 + 2**10 + 2**18 + 2**26)
>>> struct.unpack('>BBBBBBBB', N)
(2, 4, 4, 4, 0, 0, 0, 0)

The idea is to convert those to uint8, use unpackbits, concatenate the result. Or, depending on your application, it may be more convenient to use structured arrays.
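One way this idea might look in practice, as a sketch for a single Python integer rather than a tuned implementation: `struct.pack` fixes the byte order explicitly, `np.frombuffer` exposes the bytes as uint8, and `np.unpackbits` does the rest. The variable names are chosen only for this example.

```python
import struct
import numpy as np

n = 2 + 2**10 + 2**18 + 2**26

# ">Q" packs an unsigned 64-bit integer with the most significant byte first.
raw = struct.pack(">Q", n)
# View the 8 bytes as uint8 without copying.
byte_arr = np.frombuffer(raw, dtype=np.uint8)
# 64 bits, most significant bit first.
bits = np.unpackbits(byte_arr)

print(byte_arr)
print(bits)
```

For whole arrays, viewing the array's own buffer as uint8 avoids the per-number call to `struct.pack`.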

There is also the built-in bin() function, which produces a string of 0s and 1s, but I am not sure how fast it is, and it requires postprocessing too.
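For completeness, a small sketch of the postprocessing the string route needs. It uses `format` with a fixed width instead of `bin()` so that the leading zeros are kept and there is no `0b` prefix to strip; the width of 16 is an assumption for this example.

```python
import numpy as np

n = 1029
# Fixed-width binary string, most significant bit first.
text = format(n, "016b")
# The ASCII codes of '0' and '1' are 48 and 49; subtracting ord("0") gives 0/1.
bits = np.frombuffer(text.encode("ascii"), dtype=np.uint8) - ord("0")

print(text)
print(bits)
```

This handles one number at a time, so it is unlikely to compete with the array-based approaches on large inputs.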

Roman Susi
0

This works for arbitrary arrays of arbitrary uint (i.e. also for multidimensional arrays and also for numbers larger than the uint8 max value).

It cycles over the number of bits, rather than over the number of array elements, so it is reasonably fast.

import numpy

def my_ManyParallel_uint2bits(in_intAr, Nbits):
    '''Convert a numpy array of uints to an array of Nbits bits (least-significant bit first).'''
    inSize_T = in_intAr.shape
    in_intAr_flat = in_intAr.flatten()
    out_NbitAr = numpy.zeros((len(in_intAr_flat), Nbits))
    for iBits in range(Nbits):  # range replaces Python 2's xrange
        # extract bit number iBits from every element at once
        out_NbitAr[:, iBits] = (in_intAr_flat >> iBits) & 1
    out_NbitAr = out_NbitAr.reshape(inSize_T + (Nbits,))
    return out_NbitAr

A=numpy.arange(256,261).astype('uint16')
# array([256, 257, 258, 259, 260], dtype=uint16)
B=my_ManyParallel_uint2bits(A,16).astype('uint16')
# array([[0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
#       [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]], dtype=uint16)
JonathanDavidArndt