
I would like to use a Boolean matrix instead of an integer matrix with the numpy module, because some of my matrices only contain 0 and 1. So I'm wondering why not use a Boolean matrix to accelerate some of the computations. But in fact, operations on a Boolean matrix take much longer to execute than on float matrices, for instance:

import numpy as np
import time

# random 1000x1000 matrix, binarized to 0.0 / 1.0 floats
RM = np.random.rand(1000, 1000)
RM = (RM >= .5) * 1.

start_time = time.time()
R = np.sum(RM)
print("--- %s seconds ---" % (time.time() - start_time))

# same matrix cast to Boolean
RM = RM.astype(np.bool)

start_time = time.time()
R = np.sum(RM)
print("--- %s seconds ---" % (time.time() - start_time))

gives this output:

--- 0.0010001659393310547 seconds ---
--- 0.002000093460083008 seconds ---

So the Boolean matrix takes twice as long! I'm wondering why this happens and whether a workaround exists.

Update

As mentioned in the comments, the way I measured execution time was not ideal. Using @Anis's method, here is the new version:

import numpy as np
import timeit

RMint = np.ones((1000, 1000), dtype='int64')
RMbool = np.ones((1000, 1000), dtype='bool')
RMfloat = np.ones((1000, 1000), dtype='float64')

def test():
    global RM
    R = np.sum(RM)

if __name__ == '__main__':
    print("int64")
    RM = RMint
    print(timeit.timeit("test()", number=1000, setup="from __main__ import test"))
    print("bool")
    RM = RMbool
    print(timeit.timeit("test()", number=1000, setup="from __main__ import test"))
    print("float64")
    RM = RMfloat
    print(timeit.timeit("test()", number=1000, setup="from __main__ import test"))

I just moved the matrix initialization out of the test function, because building the matrices is not the point here. With this method, I arrive at the same conclusion:

int64
0.7555235163780709
bool
1.9191522692976613
float64
0.935670545406214

So the Boolean operation takes considerably longer than the int or float one. But I don't understand why?

ymmx
  • You have to run experiments multiple times (and on large datasets) to draw conclusions... The total difference is only one millisecond. It is possible that, for instance, the process was suspended temporarily, etc. Furthermore you should use a tool like `timeit` that calculates the CPU time, not the *wall time*. – Willem Van Onsem Mar 13 '17 at 09:31
  • [This](http://stackoverflow.com/a/42753035/7207392) may or may not be related. – Paul Panzer Mar 13 '17 at 09:36
  • For summing boolean arrays, you should be using `np.count_nonzero`. [Some timings](http://stackoverflow.com/a/38687313/3293881). – Divakar Mar 13 '17 at 09:46
  • Okay, I admit I missed the spot with my previous answer, you are absolutely right. I didn't take into account the overhead caused by the allocation thinking it would be small compared to the operation. Wrong. Anyway, after a quick glimpse at the code, it seems that numpy defines two types for a reduce operation: operand_type (int64, bool,...) here and result_type, which in the case of bool is ```int64``` as shown by the type of the value returned. Therefore I suspect there might be some casting/copy going on when doing reduce operations on bool types. – Anis Mar 13 '17 at 11:33
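
Following up on the two comments above, a quick interactive check (my own sketch, not part of the original post) shows the dtype promotion that Anis describes and the `np.count_nonzero` alternative that Divakar suggests; the exact result dtype of the sum is platform-dependent:

>>> import numpy as np
>>> RM = np.ones((1000, 1000), dtype=bool)
>>> np.sum(RM).dtype        # the bool input is reduced into an integer result (typically int64 on 64-bit systems)
dtype('int64')
>>> np.count_nonzero(RM)    # the suggested alternative for counting True values
1000000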

2 Answers


From my experience, operations on ndarrays of bool are equal (speed-wise) only to operations on ndarrays of uint8. Generally, if other data types are used and the same operation is performed on them, the time taken is higher.

Of course, this also depends on the operation performed. For example, it may be that numpy.sum forces a cast to int64 for boolean arrays, so the time taken is comparable to (or even slightly higher than) the one obtained with int64.

In fact, summing two boolean arrays with non-modulo-2 arithmetic makes little sense; whenever possible one should use the logical operations (AND, OR, XOR, et cetera), which are pretty much the only operations possible on boolean arrays without forcing a cast to other (more cumbersome) types.
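
For illustration, a minimal interactive sketch of my own (not from the original answer) showing that these logical operations stay entirely within the bool dtype:

>>> import numpy as np
>>> a = np.array([True, False, True, False])
>>> b = np.array([True, True, False, False])
>>> a & b    # element-wise AND, result stays bool
array([ True, False, False, False])
>>> a | b    # element-wise OR
array([ True,  True,  True, False])
>>> a ^ b    # element-wise XOR, i.e. addition modulo 2
array([False,  True,  True, False])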

This has to do with the fact that ndarrays of bool and uint8 are basically the same at a lower level. In fact, they occupy the same amount of memory:

>>> import sys
>>> import numpy as np

>>> sys.getsizeof(np.array([True], dtype="bool"))
97
>>> sys.getsizeof(np.array([1], dtype="uint8"))
97
>>> sys.getsizeof(np.array([1], dtype="int64"))
104

uint8 and bool occupy the same amount of memory because the smallest addressable unit of memory on x86 systems (and also on most ARM and other embedded systems) is an 8-bit word (1 byte). See Why is a boolean 1 byte and not 1 bit of size?.
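
Note that sys.getsizeof also counts the ndarray object header (about 96 bytes in the example above); the per-element storage is easier to see with the nbytes attribute. A minimal check of my own, not from the original answer:

>>> import numpy as np
>>> np.ones(1000, dtype="bool").nbytes     # 1 byte per element
1000
>>> np.ones(1000, dtype="uint8").nbytes    # also 1 byte per element
1000
>>> np.ones(1000, dtype="int64").nbytes    # 8 bytes per element
8000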

Below is a benchmark to support my claims (numpy 1.13.3):

>>> import numpy as np
>>> import timeit

>>> a = np.random.randint(0, 2, 10000, dtype="bool")
>>> b = np.random.randint(0, 2, 10000, dtype="bool")
>>> %timeit np.add(a, b)
971 ns ± 3.75 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> a = a.astype("uint8")
>>> b = b.astype("uint8")
>>> %timeit np.add(a, b)
928 ns ± 1.21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> a = a.astype("int64")
>>> b = b.astype("int64")
>>> %timeit np.add(a, b)
4.86 µs ± 10.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

As you can see, there is pretty much no difference between uint8 and bool here, while int64 is much slower. Interestingly, I found that np.add is faster than np.logical_xor on bools. In fact, np.add actually performs an element-wise OR instead of an element-wise XOR (which would be addition modulo 2). When performed on bool arrays, np.add and np.logical_or are the same operation, and np.add returns a bool array.
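
A quick check of my own (not part of the original benchmark) makes that behaviour visible:

>>> a = np.array([True, False, True, False])
>>> b = np.array([True, True, False, False])
>>> np.add(a, b)    # on bool inputs this acts as an element-wise OR and returns bool
array([ True,  True,  True, False])
>>> np.array_equal(np.add(a, b), np.logical_or(a, b))
True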

Samuele Cornell

No, the Boolean operation is faster than the others. The average timings after 5 runs are listed below.

dtype     Average (s)
int64     4.66626188
bool      1.243509225
float64   5.220022027

My code is modified from @Anis's.

import numpy as np
import timeit

def test(dtype):
    # note: the timed work includes allocating the 1000x1000 array
    R = np.sum(np.ones((1000, 1000), dtype=dtype))

if __name__ == '__main__':
    dtype = 'int64'
    print(dtype)
    print(timeit.timeit("test('{}')".format(dtype), number=1000, setup="from __main__ import test"))

    dtype = 'bool'
    print(dtype)
    print(timeit.timeit("test('{}')".format(dtype), number=1000, setup="from __main__ import test"))

    dtype = 'float64'
    print(dtype)
    print(timeit.timeit("test('{}')".format(dtype), number=1000, setup="from __main__ import test"))

Modules used are:

Software   Version
Python     3.4.5 64bit
IPython    5.1.0
OS         Windows 10
numpy      1.11.3

Jesse