I would like to use Boolean matrices instead of integer matrices with NumPy, because some of my matrices only contain 0 and 1. My hope was that a Boolean matrix would speed up some of the computations. In fact, operations on a Boolean matrix take longer to execute than on a float matrix, for instance:
import numpy as np
import time
RM = np.random.rand(1000,1000)
RM = (RM >= .5 )*1.
start_time = time.time()
R = np.sum(RM)
print("--- %s seconds ---" % (time.time() - start_time))
RM = RM.astype(bool)  # np.bool was removed in NumPy 1.24; the builtin bool works in every version
start_time = time.time()
R = np.sum(RM)
print("--- %s seconds ---" % (time.time() - start_time))
This gives the following output:
--- 0.0010001659393310547 seconds ---
--- 0.002000093460083008 seconds ---
So the Boolean matrix takes twice as long! Why is this happening, and is there a workaround?
Update
As mentioned in some comments, the way I measured execution time was not the best. Using the method suggested by @Anis, here is the new version:
import numpy as np
import timeit
RMint = np.ones((1000,1000), dtype='int64')
RMbool = np.ones((1000,1000), dtype='bool')
RMfloat = np.ones((1000,1000), dtype='float64')
def test():
    global RM
    R = np.sum(RM)

if __name__ == '__main__':
    print("int64")
    RM = RMint
    print(timeit.timeit("test()", number=1000, setup="from __main__ import test"))
    print("bool")
    RM = RMbool
    print(timeit.timeit("test()", number=1000, setup="from __main__ import test"))
    print("float64")
    RM = RMfloat
    print(timeit.timeit("test()", number=1000, setup="from __main__ import test"))
I moved the matrix initialization out of the test function because building the matrices is not the point here. With this method, I reach the same conclusion:
int64
0.7555235163780709
bool
1.9191522692976613
float64
0.935670545406214
So summing a Boolean matrix takes noticeably longer than summing an int64 or float64 one (roughly 2 to 2.5 times longer in this run). But I don't understand why.
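As a possible workaround to compare against, here is a minimal sketch of two other ways to count the True entries of a Boolean matrix: `np.count_nonzero`, and summing a `uint8` view of the same data. The helper name `count_true_variants` is hypothetical (not from any library). I am not claiming that either variant is faster, since that likely depends on the NumPy version and the machine. The script only checks that all three give the same count and prints whatever timings it measures.

```python
import timeit

import numpy as np


def count_true_variants(mask):
    """Count the True entries of a Boolean array in three different ways."""
    return {
        "sum": int(np.sum(mask)),
        "count_nonzero": int(np.count_nonzero(mask)),
        # Reinterpret the same bytes as uint8 (no copy), then sum them.
        "uint8_view_sum": int(mask.view(np.uint8).sum()),
    }


RMbool = np.ones((1000, 1000), dtype=bool)
counts = count_true_variants(RMbool)

if __name__ == "__main__":
    print(counts)
    # Timings are only printed; no particular ordering is assumed.
    candidates = [
        ("np.sum", lambda: np.sum(RMbool)),
        ("np.count_nonzero", lambda: np.count_nonzero(RMbool)),
        ("uint8 view sum", lambda: RMbool.view(np.uint8).sum()),
    ]
    for label, func in candidates:
        print(label, timeit.timeit(func, number=100))
```

If one of the alternatives turns out to be faster on a given setup, it could replace `np.sum(RM)` inside `test()` whenever `RM` is Boolean.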