I have measured the performance of parallel read_pickle() calls on a Linux machine with 12 cores and a Python 3.6 interpreter (the code is launched from JupyterLab). I simply open many pickled DataFrames:
import pandas as pd

def my_read(filename):
    df = pd.read_pickle(path + filename)   # `path` is the directory holding the pickled files
    print(filename, df.shape)
    return df.iloc[:1, :]                  # keep only the first row

files = ...  # list of about 130 file names, each a pickled 1 000 000 x 43 DataFrame
Since reading files from disk is an I/O-bound operation rather than a CPU-bound one, I would expect the thread-based solution to win over the process-based one: threads release the GIL while blocked on I/O and avoid the overhead of spawning worker processes and pickling arguments and results between them.
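To check how much of the per-file time is really spent on disk I/O as opposed to CPU-side deserialization, one could time the two steps separately along these lines (a rough sketch, assuming the same `path` and `files` as above and plain, uncompressed pickle files):

import pickle
import time

fname = files[0]
t0 = time.perf_counter()
with open(path + fname, "rb") as fh:
    raw = fh.read()            # pure disk I/O
t1 = time.perf_counter()
df = pickle.loads(raw)         # pure CPU work: deserialization holds the GIL
t2 = time.perf_counter()
print("read: %.2f s, unpickle: %.2f s" % (t1 - t0, t2 - t1))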
However, this cell:
%%time
from multiprocessing import Pool
with Pool(10) as pool:
    pool.map(my_read, files)
gave
CPU times: user 416 ms, sys: 267 ms, total: 683 ms
Wall time: 3min 37s
while this one:
%%time
from multiprocessing.pool import ThreadPool
with ThreadPool(10) as tpool:
    tpool.map(my_read, files)
ran in
CPU times: user 7min 28s, sys: 1min 58s, total: 9min 27s
Wall time: 10min 25s
Why is the process-based Pool so much faster than the ThreadPool here?
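For reference, the same comparison can also be expressed with concurrent.futures; I would expect the two variants below to behave like the Pool and ThreadPool runs above (a sketch only, reusing the `my_read` and `files` defined earlier):

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Process-based variant, analogous to multiprocessing.Pool(10)
with ProcessPoolExecutor(max_workers=10) as ex:
    process_results = list(ex.map(my_read, files))

# Thread-based variant, analogous to ThreadPool(10)
with ThreadPoolExecutor(max_workers=10) as ex:
    thread_results = list(ex.map(my_read, files))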