I timed row selection with both boolean-mask indexing and numpy.take as follows:
import numpy as np
import time

a = np.random.randn(4000000, 500)
b = np.arange(0, len(a))

# Boolean-mask indexing
t1 = time.time()
for i in range(10):
    a[b != 2]
t2 = time.time()
print(t2 - t1)

# numpy.take
t1 = time.time()
for i in range(10):
    a.take(b != 2, axis=0)
t2 = time.time()
print(t2 - t1)
I checked CPU usage while this ran: most of the cores were idle and only one CPU was in use. As a result, both versions are very slow. The two printed times, in seconds, were:
65.91494154930115
47.01117730140686
It seems to me that slicing is a parallelizable operation. Why is numpy not parallelizing it? Is it that numpy doesn't support parallel slicing, or do I need to use some special function in numpy?
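To make the question concrete, here is a minimal sketch of the kind of parallelism I have in mind. It assumes the selection can be split into independent chunks of rows, each copied into its own part of a preallocated output by a separate thread. The helper name `parallel_take_rows` and the `n_workers` parameter are made up for illustration and are not part of numpy. The mask is converted to integer indices with `np.flatnonzero` because `np.take` expects integer indices. I have not measured whether this is actually faster; that would depend on numpy releasing the GIL during the copy and on memory bandwidth.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_take_rows(a, mask, n_workers=4):
    """Hypothetical helper: select rows of `a` where `mask` is True, chunk by chunk."""
    idx = np.flatnonzero(mask)  # integer row indices of the selected rows
    out = np.empty((idx.size,) + a.shape[1:], dtype=a.dtype)
    # Split the output rows into roughly equal contiguous chunks.
    bounds = np.linspace(0, idx.size, n_workers + 1).astype(int)

    def copy_chunk(k):
        lo, hi = bounds[k], bounds[k + 1]
        if hi > lo:
            # Indices are known to be in range, so mode="clip" is safe here.
            np.take(a, idx[lo:hi], axis=0, out=out[lo:hi], mode="clip")

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(copy_chunk, range(n_workers)))
    return out


# Small array here; the real one is far larger.
a = np.random.randn(1000, 50)
b = np.arange(len(a))
result = parallel_take_rows(a, b != 2)
```

On the small array the result matches `a[b != 2]`, so the chunked copy at least produces the same rows as the single-threaded version.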