12

I am working on a Python library that performs a lot of bitwise operations on long bit strings, and I want to find a bit string type that will maximize its speed. I have tried the built-in Python int type, numpy, bitstring, and bitarray, and surprisingly, the Python ints seem to win hands down when it comes to bitwise operations. Everything I have googled says numpy should be much faster for vectorized operations like this. Am I using numpy wrong somehow? Is there another Python library I can use that actually improves on Python's built-in int type?

from timeit import timeit
import random


size = 10000


def int_to_bits(i):
    # Extract the bits of i, least-significant bit first.
    result = []
    for _ in range(size):
        result.append(i % 2)
        i >>= 1
    return result



x = random.randrange(2**size)
y = random.randrange(2**size)

print(x.bit_length(), y.bit_length())

x_bits = int_to_bits(x)
y_bits = int_to_bits(y)

t = timeit(
    stmt='a & b',
    setup='a = %d; b = %d' % (x, y)
)
print("raw ints:", t)

t = timeit(
    stmt='a & b',
    setup=('import numpy;'
           'a = numpy.array(%r, dtype=int);'
           'b = numpy.array(%r, dtype=int)') % (x_bits, y_bits)
)
print('numpy int array:', t)

t = timeit(
    stmt='a & b',
    setup=('import numpy;'
           'a = numpy.array(%r, dtype=bool);'
           'b = numpy.array(%r, dtype=bool)') % (x_bits, y_bits)
)
print('numpy bool array:', t)

t = timeit(
    stmt='a & b',
    setup=('import numpy;'
           'a = numpy.packbits(%r);'
           'b = numpy.packbits(%r)') % (x_bits, y_bits)
)
print('numpy packed bits:', t)

t = timeit(
    stmt='a & b',
    setup=('import bitstring;'
           'a = bitstring.BitString(%r);'
           'b = bitstring.BitString(%r)') % (x_bits, y_bits)
)
print('bitstring:', t)

t = timeit(
    stmt='a & b',
    setup=('import bitarray;'
           'a = bitarray.bitarray(%r);'
           'b = bitarray.bitarray(%r)') % (x_bits, y_bits)
)
print('bitarray:', t)

Results:

10000 10000
raw ints: 0.29606562735373115
numpy int array: 7.400762747057885
numpy bool array: 1.1108355715984288
numpy packed bits: 1.3064737574273284
bitstring: 380.9796937642803
bitarray: 1.4451143449501842

EDIT:

There seems to be a lot of confusion about how single operations on Python ints/longs are comparable to vector operations on entire numpy bit arrays. A 10,000-bit Python int/long value, when treated as a bit mask (using the & operator just like we can do with ints or longs in C/C++) is directly comparable to a numpy bool array of length 10,000, because they both contain the same number of bits, albeit represented in 2 different ways. The same is true for the other ways of representing 10,000 bits that I tried, including using numpy packed bit arrays, numpy int arrays, and bit array/string types from other libraries. They are all comparable because they are all computing the same function on the same sequences of bits. All that matters here is that I can represent all 10,000 bits and that I can perform bitwise operations on them. If anyone can suggest a more efficient way to represent long, fixed-length sequences of bits that allows bitwise operators (&, |, and ~) to be used, that is what I am looking for.

If you're still confused as to how a Python int/long value can store the same information as a numpy bool array or a numpy binary-valued int array, please refer to the int_to_bits function in the code above; it demonstrates how to extract the bits out of a Python int/long, which shows that performing the & operation on two 10,000-bit ints is fundamentally the same as performing it element-by-element on a list or array of 10,000 Boolean values.
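To make that equivalence concrete, here is a minimal sketch (using a much smaller size than the benchmark, purely for illustration) checking that & on two ints yields exactly the same bit sequence as an element-by-element AND over the corresponding bit lists:

```python
import random

size = 100  # smaller than the benchmark's 10000, just for a quick check


def int_to_bits(i, size):
    # Same idea as int_to_bits above: least-significant bit first.
    bits = []
    for _ in range(size):
        bits.append(i % 2)
        i >>= 1
    return bits


x = random.randrange(2**size)
y = random.randrange(2**size)

# & on the ints, then decompose the result into bits...
int_result = int_to_bits(x & y, size)

# ...versus decomposing first and AND-ing element by element.
elementwise = [a & b for a, b in zip(int_to_bits(x, size), int_to_bits(y, size))]

assert int_result == elementwise
```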

hosford42
  • While it's no surprise Python ints do this fast, some of your timings don't seem right. For example, the bool array definitely shouldn't be faster than the packed array. – user2357112 Jun 06 '15 at 05:56
  • Indeed - these are not 'vector' comparisons - these are just comparisons of single integers of a very high `bit_length()`. – gabhijit Jun 06 '15 at 06:05
  • oh and one more thing (2 ** 10000) is not going to fit in uint64 !!! – gabhijit Jun 06 '15 at 06:15
  • @user2357112 When the array length is in the 100,000 ballpark, the packed array starts to be slightly faster than the bool array. I'm not sure why it's slower for 10,000; it seems strange to me too, but the numbers don't lie. – hosford42 Jun 06 '15 at 06:50
  • @gabhijit Think of a Python int/long i as a vector of bits of length i.bit_length(), minus an easy way to access the bit at a particular index. I'm not going to cram 2**10000 into a uint64; I'm going to cram it into a vector of Booleans of length 10000, where each Boolean value corresponds to a single digit in the binary representation of the integer value. – hosford42 Jun 06 '15 at 06:54
  • @hosford42: When I test it, the bool array is substantially slower. – user2357112 Jun 06 '15 at 07:14
  • I ended up going with Python ints, as they are universally supported and consistently fast, even though they may not always be the _fastest_. – hosford42 Jul 21 '15 at 19:00
  • for future viewers: a more convenient way of getting a list of the bits comprising an integer (i.e. what `int_to_bits` does) could be something like `list(bin(i)[2:].zfill(size))` – zeawoas Nov 04 '19 at 10:16

2 Answers

13

As far as I can tell, the built-in Python 3 int is the only one of the options you tested that computes the & in chunks of more than one byte at a time. (I haven't fully figured out what everything in the NumPy source for this operation does, but it doesn't look like it has an optimization to compute this in chunks bigger than the dtype.)

  • bitarray goes byte-by-byte,
  • the bool and 1-bit-per-int NumPy attempts go bit by bit,
  • the packed NumPy attempt goes byte-by-byte, and
  • the bitstring source goes byte-by-byte, as well as doing some things that screw up its attempts to gain speed through Cython, making it by far the slowest.

In contrast, the int operation goes by either 15-bit or 30-bit digits, depending on the value of the compile-time parameter PYLONG_BITS_IN_DIGIT. I don't know which setting is the default.
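As a quick sketch, CPython exposes the digit size at runtime through sys.int_info, so you can check which setting your own build was compiled with:

```python
import sys

# bits_per_digit reflects the PYLONG_BITS_IN_DIGIT compile-time setting:
# 15 or 30 depending on the build (30 on most modern 64-bit builds).
print(sys.int_info.bits_per_digit)
print(sys.int_info.sizeof_digit)  # bytes of storage per digit
```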

You can speed up the NumPy attempt by using a packed representation and a larger dtype. It looks like on my machine, a 32-bit dtype works fastest, beating Python ints; I don't know what it's like on your setup. Testing with 10240-bit values in each format, I get

>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160, dtype=numpy.uint64)')
1.3918750826524047
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*8, dtype=numpy.uint8)')
1.9460716604953632
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*2, dtype=numpy.uint32)')
1.1728465435917315
>>> timeit.timeit('a & b', 'a = b = 2**10240-1')
1.5999407862400403
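If you want to build such an array from a 0/1 bit list rather than constructing it directly, one plausible route (assuming the bit count is a multiple of 32; the example values here are illustrative) is packbits followed by a reinterpreting view:

```python
import numpy as np

bits = [1, 0, 1, 1] * 8  # 32 bits, purely illustrative

# packbits packs 8 bits into each uint8 (most-significant bit first).
packed = np.packbits(np.array(bits, dtype=np.uint8))

# view() reinterprets the same bytes as 32-bit words without copying;
# note the resulting word values depend on platform endianness.
as_u32 = packed.view(np.uint32)
```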
user2357112
  • Is there a function similar to packbits that allows me to convert sequences of bits to arrays of uint64s quickly/easily? I am new to numpy. Can I reshape the array that packbits returns somehow? – hosford42 Jun 06 '15 at 06:36
  • @hosford42: `packbits`, then something that converts the result to uint64. That might be something like copying it into an array of dtype uint8 and length a multiple of 8, then using a possibly platform-dependent call to [`view`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.view.html). I'm not sure what the best way to go about it would be. – user2357112 Jun 06 '15 at 06:44
  • @user2357112 As long as the original # of bits is a multiple of 64, using `a = numpy.packbits(bits); v = a.view(np.uint64)` works. However, when I time it, it still comes out about 3 times slower than the built-in Python int/long type. I used `numpy.packbits(%r).view(numpy.uint64)` to convert each bit sequence, with no other changes to the original code. – hosford42 Jun 06 '15 at 07:22
  • @hosford42: Maybe it's a version thing. When I try it on my laptop, NumPy wins. When I try it on ideone or pythonanywhere's "Try IPython" page, NumPy loses. My laptop is on NumPy 1.9.2, while ideone and pythonanywhere's "Try IPython" page are on 1.8.2 and 1.8.1. It might also have something to do with what libraries NumPy was linked with. – user2357112 Jun 06 '15 at 08:23
0

What are you actually trying to test - are these vector operations at all? You are simply comparing the speed of a single operation, and there plain Python is going to win because it doesn't have to set up numpy arrays or bitarrays.

How about trying out the following?

x = np.array([random.randrange(2**31)]*1000) 
y = np.array([random.randrange(2**31)]*1000) 

%timeit x & y # in ipython

%timeit [ a & b for (a,b) in zip(x,y)] # even though x and y are numpy arrays, we are iterating over them - and not doing any vector operations

Interestingly, if

xxx = [random.randrange(2**31)] * 1000
yyy = [random.randrange(2**31)] * 1000 

and then

%timeit [a & b for (a,b) in zip(xxx,yyy)]

With pure Python lists, iterating over them is faster than iterating over numpy arrays, which is a bit counterintuitive. I'm not sure why.
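One plausible (unverified) contributor to that gap: indexing a NumPy array boxes each element in a NumPy scalar object on the fly, whereas a list simply hands back the int objects it already holds:

```python
import numpy as np

lst = [7] * 4
arr = np.array(lst)

print(type(lst[0]))  # <class 'int'> - the existing object, no allocation
print(type(arr[0]))  # a numpy integer scalar, freshly boxed on each access
```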

You can try the same thing for bitstrings and bitarrays.

Is this what you are looking for?

gabhijit
  • The `timeit()` function counts only the *stmt*, not the *setup*. By the way, the OP's size is 10000, not 1000. – chrisaycock Jun 06 '15 at 05:35
  • That doesn't matter still - you are comparing - bitwise and of a 'single integer' in all cases. Also - running the same operation 10000 times is not same as running the operation on Vector of 10000 elements. (btw that 10000 is common for both pure python and numpy arrays) There's no 'vector' operation there. Also - & for simple integers might still be optimized than single element numpy arrays (need to look that up). – gabhijit Jun 06 '15 at 05:39
  • @gabhijit Please check out the additional information I added to the question. An N-bit integer value contains the same information as an N-bit array of bools. I don't care about the representation; I care about the speed at which I can perform bit-wise operation on N bits, in whatever form. That's why I'm comparing single ints to bool arrays. – hosford42 Jun 06 '15 at 06:44