  1. What is the fastest (or most "Pythonic") way to convert

    x = [False, False, True, True]
    

    into 12? (If there is such a way.)

  2. What if x were instead a numpy.array of bools? Is there a special command for that?

I have a large m-by-n array of booleans, where each n-element row represents a single low-dimensional hash of a high-dimensional feature vector. (In the example above, n = 4.) I would like to know the answer in order to compress my data as much as possible. Thank you.


Edit: Thank you for the responses! Using the following test code,

t = 0
for iter in range(500):
    B = scipy.signbit(scipy.randn(1000,20))
    for b in B:
        t0 = time.clock()
        # test code here
        t1 = time.clock()
        t += (t1-t0)
print t

...here were the runtimes on my Thinkpad laptop:

Of course, I welcome any independent tests that may confirm or refute my data!
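For reference, a less error-prone way to take these measurements is Python's timeit module (as liori suggests in the comments below). A sketch, where the statement under test and the vector size are just examples:

```python
import timeit

setup = "x = [False, False, True, True] * 5  # a 20-bit vector"
stmt = "sum(1 << i for i, b in enumerate(x) if b)"

# repeat=5 runs of 100000 executions each; take the best (lowest) time
print(min(timeit.repeat(stmt, setup, repeat=5, number=100000)))
```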


Edit: In my answer below, changing int(j) to simply j still works, but runs six times slower! Perhaps the other answers would also become faster if the bool were cast with int, but I'm too lazy to test everything again.


Edit: liori posted results of independent tests here.

Steve Tjoa
  • What's the rule to convert the [False, False, True, True] into 12? –  Oct 31 '10 at 23:34
  • `x[0]` is the LSB, and `x[-1]` is the MSB. – Steve Tjoa Oct 31 '10 at 23:35
  • 2
    Please use `timeit` for testing, it is much less prone to errors. My times: http://pastebin.com/x1FEP9gY – liori Nov 01 '10 at 02:16
  • Thanks for the tests! I don't doubt them at all. I have added them to the post. – Steve Tjoa Nov 01 '10 at 02:31
  • Just something to note - in liori's test, sven2() fails miserably because we are using 1000-bit numbers. Check the results (as in the numbers returned by each function) and you'll see that its result is wrong for that large of a number. – Justin Peel Nov 01 '10 at 05:14
  • @Justin Peel - While you're quite correct in general, unless I'm missing something, the example is using 20-bit numbers, _not_ 1000-bit! There shouldn't be any problem with 20 columns on the numpy-based methods.... – Joe Kington Nov 01 '10 at 14:39
  • I think he means in the tests that liori posted. The vector length is 1000 and is tested over 1000 trials. – Steve Tjoa Nov 01 '10 at 14:45
  • @Steve - Ah, that makes more sense! Thanks, and sorry for the noise! – Joe Kington Nov 01 '10 at 17:01

10 Answers


Taking various ideas from various other answers, here's another way to do it:

sum(1<<i for i, b in enumerate(x) if b)

It is quite fast in my tests, right up there with the numpy method for large numbers of bits even though it overflows like crazy. I used liori's testing module for testing. Steve's method, with the change I suggested, is just barely faster. However, if a lot of these conversions need to be done at a time (and with not too many bits), I'm betting that numpy will be faster.
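
Applied to the example from the question (with x[0] as the least significant bit):

```python
x = [False, False, True, True]

# shift a 1 into position i for every True bit; bits 2 and 3 give 4 + 8
print(sum(1 << i for i, b in enumerate(x) if b))  # 12
```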

Justin Peel

Most Pythonic might be this:

sum(2**i*b for i, b in enumerate(x))

It's hard to tell if it is also the fastest.

In numpy I would use

numpy.sum(2**numpy.arange(len(x))*x)

but this won't be faster for small arrays x, and it won't work for big arrays x, since machine-size integers are used instead of Python's arbitrary-precision ints.

Sven Marnach
  • Thanks! For some array sizes, the second solution worked quite well, but for others it didn't. – Steve Tjoa Nov 01 '10 at 00:38
  • @Steve - The other advantage of the numpy solution is that you can avoid iterating through each row. Using the "`B`" array from your test code above: `numpy.sum(2**numpy.arange(B.shape[1])*B, axis=1)`. This should give a large speedup compared with iterating over each row in the array... The full 500x loop executes in less than a second on my machine... – Joe Kington Nov 01 '10 at 02:19
  • 1
    Since numpy doesn't handle large integers the same as Python, you have to be careful with really large numbers. If there will larger numbers, you can get a little more out of this method by doing `dtype=numpy.longlong` in arange(). Also, there is a very, very small speed-up by using the sum method of the resultant numpy array rather than using numpy.sum. – Justin Peel Nov 01 '10 at 05:17
reduce(lambda a,b:2*a+b, reversed(x))

You could get rid of reversed() if you had the least significant bit at the end of the array. This works with numpy.array too, and doesn't need enumerate(). From my tests it seems to be faster too: no need to use exponentiation.
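
For reference, in Python 3 reduce has moved to functools, so the same approach would read:

```python
from functools import reduce

x = [False, False, True, True]

# Horner's rule: start from the most significant bit, doubling the
# accumulator and adding each bit in turn
value = reduce(lambda a, b: 2 * a + b, reversed(x), 0)
print(value)  # 12
```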

liori
  • Thank you for the elegant solution! I was blown away when I first saw it. Unfortunately, it seems to run the slowest, with or without `reversed`. Might anyone know why? – Steve Tjoa Nov 01 '10 at 00:33
  • @Steve: on my computer it is faster than sum+exponentiation. Funny thing... how long are the vectors you use? Do you test performance using `timeit`? – liori Nov 01 '10 at 01:27

An elegant, Pythonic way that always works:

def powers(x):
    """yield powers of x, starting from x**0 forever"""
    power = 1
    while True:
        yield power
        power *= x

def bools_to_int(bools):
    # in Python 2, use itertools.izip!
    return sum(int(place) * place_weight for place_weight, place in 
               zip(powers(2), bools))

Note that you can get rid of powers (by using enumerate and exponentiation in the comprehension, as other answers do), but maybe it's clearer this way.

  • Your answer doesn't give the same answer as the others. Substituting `bools` for `reversed(bools)` fixes it. – Justin Peel Nov 01 '10 at 05:07
  • @Justin Peel: Come again? I already noticed that shortly after answering and added `reversed`... –  Nov 01 '10 at 12:53
  • try the code that you have here with the example given by the OP. I get 3 as the answer when it should be 12. You didn't need to put the `reversed` in. – Justin Peel Nov 01 '10 at 15:55
  • *bashes head against wall* @Justin: Yes, you're right and now I realize why. –  Nov 01 '10 at 16:23

My initial attempt, just for reference:

def bool2int(x):
    y = 0
    for i,j in enumerate(x):
        if j: y += int(j)<<i
    return y
Steve Tjoa

I was trying IPython's %timeit, and it seems that the following is faster:

y = 0
for i,j in enumerate(x):
    if j: y += 1<<i

In addition, if your boolean vector is a numpy.ndarray, converting it to a Python list with x.tolist() and running the same code seems to be faster. It's all marginal but consistent, and at these speeds the margins add up.
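
A sketch of the tolist() variant described above:

```python
import numpy as np

x = np.array([False, False, True, True])

# iterate over plain Python bools instead of numpy scalars
y = 0
for i, j in enumerate(x.tolist()):
    if j:
        y += 1 << i
print(y)  # 12
```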

Atreya

numpy has the packbits function for this. It also supports operations along axes:

In [3]: B = scipy.signbit(scipy.randn(1000,8)).astype("i1")

In [3]: B[0]
Out[3]: array([0, 1, 0, 0, 0, 1, 0, 0], dtype=int8)

In [4]: np.packbits(B[0])
Out[4]: array([68], dtype=uint8)

In [5]: %timeit np.packbits(B, axis=1)
10000 loops, best of 3: 37 µs per loop

It works on 8-bit chunks; for larger sizes you have to shift and OR the resulting bytes together:

In [8]: x # multiple of 8
Out[8]: array([1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1], dtype=int8)

In [9]: r = np.packbits(x).astype(np.int32); r
Out[9]: array([129, 213], dtype=int32)

In [10]: r[0] << 8 | r[1] 
Out[10]: 33237

In [11]: sum(1<<i for i, b in enumerate(x[::-1]) if b)
Out[11]: 33237

If the length of x is not a multiple of 8, you have to pad it with zeros.
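
One way to handle that padding for an arbitrary width might be the following sketch (the helper name is made up here; note this answer's bit order is most-significant-first, the reverse of the question's convention):

```python
import numpy as np

def bits_to_int(bits):
    """Pack an MSB-first bit sequence into a single Python int."""
    bits = np.asarray(bits, dtype=np.uint8)
    packed = np.packbits(bits)  # zero-pads the final byte on the right
    value = 0
    for byte in packed:
        value = (value << 8) | int(byte)
    # drop the zero bits packbits appended to the last byte
    return value >> ((-len(bits)) % 8)

x = [1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1]
print(bits_to_int(x))  # 33237
```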

jtaylor

Something like this?

>>> x = [False, False, True, True]
>>> sum([int(y[1])*2**y[0] for y in enumerate(x)])
12

You can convert a numpy array to a regular list using a list() cast.

>>> a = numpy.array([1,2,3,4])
>>> a
array([1, 2, 3, 4])
>>> list(a)
[1, 2, 3, 4]
Emil H

If you have a matrix, you probably want to do it like this:

#precompute powers of two
vals = 2.**np.arange(20)

B = ....
compressed = np.dot(B, vals) # matrix multiplication.

np.dot should be faster than any loop in Python. Much faster.
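
A runnable version of that idea on the question's example (using integer weights here, since float powers of two lose exactness past 2**53):

```python
import numpy as np

B = np.array([[False, False, True, True],
              [True, False, False, False]])

# column i carries weight 2**i, so column 0 is the least significant bit
vals = 2 ** np.arange(B.shape[1], dtype=np.int64)
compressed = B.dot(vals)  # one integer per row
print(compressed)  # [12  1]
```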

luispedro

If you're willing to add another extension to the mix, I added pack() and unpack() to the development branch of gmpy. My tests show it may be 2x or 3x faster.

>>> import gmpy2
>>> gmpy2.pack([0,0,1,1],1)
mpz(12)
>>> gmpy2.unpack(12,1)
[mpz(0), mpz(0), mpz(1), mpz(1)]

Disclaimer: The development version is called gmpy2 and can co-exist with the stable version. It is still in alpha phase but will hopefully become beta in a few weeks. You need to have both GMP and MPFR libraries installed. The source is available at http://code.google.com/p/gmpy/source/checkout

casevh