I always thought Python list comprehensions don't implicitly use multiprocessing, and questions on Stack Overflow (e.g. this one) gave me the same impression. However, here is my little experiment:
import numpy as np
import time
from sklearn import metrics  # needed for mean_squared_error below
# some arbitrary data
n = 1000
p = 5
X = np.block([[np.eye(p)], [np.zeros((n-p, p))]])
y = np.sum(X, axis=1) + np.random.normal(0, 1, (n, ))
n_loop = 100000
# run linear regression using direct matrix algebra
def in_sample_error_algebra(X, y):
    beta_hat = np.linalg.inv(X.transpose() @ X) @ (X.transpose() @ y)
    y_hat = X @ beta_hat
    error = metrics.mean_squared_error(y, y_hat)
    return error
start = time.time()
errors = [in_sample_error_algebra(X, y) for _ in range(n_loop)]
print('run time =', round(time.time() - start, 2), 'seconds')
run time = 19.68 seconds
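One check I can think of (a sketch on my part; which environment variable applies depends on the BLAS backend numpy was built against, e.g. OpenBLAS or MKL) is to cap the linear-algebra thread pools before importing numpy and see whether the core usage drops:

import os
# Cap the BLAS/OpenMP thread pools to a single thread. These environment
# variables must be set before numpy is first imported; which one takes
# effect depends on the BLAS backend this numpy build links against.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np  # import only after the variables above are set
# ...rerun the same timing loop; if only one core is busy now, the
# parallelism came from the matrix operations, not the comprehension.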
While this code was running, all 6 (physical) cores of my CPU shot up to 100%.
What's even more magical is that when I changed from the list comprehension to a for loop, the same thing happened. I thought that with .append it had to run sequentially. See below:
start = time.time()
errors = []
for _ in range(n_loop):
    errors.append(in_sample_error_algebra(X, y))
print('run time =', round(time.time() - start, 2), 'seconds')
run time = 21.29 seconds
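As a control, here is a sketch of a loop whose body is pure Python with no numpy calls (pure_python_work is a hypothetical helper I made up for illustration); as far as I understand CPython, this should keep the work on a single core if the parallelism is coming from the library rather than from the loop construct itself:

import time

def pure_python_work(k):
    # Arbitrary CPU-bound work that never touches numpy/BLAS.
    total = 0
    for i in range(k):
        total += i * i
    return total

n_loop = 100000
start = time.time()
results = [pure_python_work(1000) for _ in range(n_loop)]
print('run time =', round(time.time() - start, 2), 'seconds')
# Expectation: exactly one core pegged, since CPython's GIL serializes
# pure-Python bytecode within a single process.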
Any theories?
Python 3.7.2, numpy 1.15.4