I am running a set of parallel computations on an Intel(R) Xeon(R) L5640 (6 cores, 12 siblings) on the following 2 platforms:
Ubuntu 18.04, Python 3.7.3, numpy 1.16.4, sklearn 0.21.2.
Windows 7 ultimate, Python 3.7.3, numpy 1.16.4, sklearn 0.21.2.
No other jobs or tasks are occupying the CPU cores.
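To confirm how many CPUs the Python process can actually use on each platform, a check along these lines could be run first. This is only a minimal sketch: `visible_cpus` is a hypothetical helper name, and `os.sched_getaffinity` exists on Linux but not on Windows, so the code falls back to the logical count there.

```python
import os

def visible_cpus():
    """Return the logical CPU count and the number of CPUs this process may run on."""
    # os.cpu_count() can return None on unusual systems; treat that as 1.
    logical = os.cpu_count() or 1
    # sched_getaffinity is Linux-only; elsewhere assume all logical CPUs are usable.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = logical
    return {"logical": logical, "usable": usable}

info = visible_cpus()
print(info)
```

On a 6-core machine with hyper-threading enabled, the logical count would be expected to be 12. A smaller usable count on Ubuntu would point to an affinity restriction rather than to joblib.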
I benchmarked this program and got some stats.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
import time
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
(trainData, testData, trainLabels, testLabels) = train_test_split(X,
y, test_size=0.1)
start = time.time()
model = KNeighborsClassifier(n_jobs=4)
model.fit(trainData, trainLabels)
predictions = model.predict(testData)
print('n_jobs=4 took {}s'.format(time.time() - start))
It took about 470s on both Ubuntu and Windows, which is reasonable.
Then I ran this:
start = time.time()
model = KNeighborsClassifier(n_jobs=6)
model.fit(trainData, trainLabels)
predictions = model.predict(testData)
print('n_jobs=6 took {}s'.format(time.time() - start))
It took about 493s on Ubuntu and 350s on Windows. The Windows result is reasonable, but the Ubuntu result is NOT.
n_jobs=6 takes less time than n_jobs=4 on Windows, which is reasonable, since the code utilizes more CPU cores.
n_jobs=6 takes more time than n_jobs=4 on Ubuntu, which is NOT reasonable.
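Each figure here comes from a single run that times fit and predict together, so some of the gap could be measurement noise. A repeated measurement would make the comparison more robust. The following is a minimal sketch, where `best_of` is a hypothetical helper and the summation is only a stand-in workload so the snippet runs on its own:

```python
import time

def best_of(fn, repeats=3):
    """Call fn() `repeats` times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, higher resolution than time.time()
        fn()
        durations.append(time.perf_counter() - start)
    return min(durations)

# Stand-in workload; replace it with the real call being measured.
elapsed = best_of(lambda: sum(range(100000)))
print('best of 3: {:.4f}s'.format(elapsed))
```

After fitting the model once, something like `best_of(lambda: model.predict(testData))` would time the prediction step alone for each n_jobs value.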
This suggests that joblib's Parallel with the default backend behaves differently on Ubuntu and Windows.
I then searched the docs, but the only part that mentions "windows" is about the 'multiprocessing' backend, which does not apply here, since my joblib version is '0.13.2'.
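One way to test whether the backend matters is to pin it explicitly with joblib's `parallel_backend` context manager instead of relying on the default. This is a minimal sketch on a toy workload: `run_with_backend` is a hypothetical helper, and only the 'threading' backend is exercised here to keep the example lightweight.

```python
from math import sqrt
from joblib import Parallel, delayed, parallel_backend

def run_with_backend(backend, n_jobs=2):
    """Run a tiny workload under an explicitly chosen joblib backend."""
    # Parallel() picks up the backend and n_jobs from the enclosing context.
    with parallel_backend(backend, n_jobs=n_jobs):
        return Parallel()(delayed(sqrt)(i * i) for i in range(5))

threaded = run_with_backend("threading")
print(threaded)
```

Wrapping `model.predict(testData)` in the same `with parallel_backend(...)` block and repeating the timing on both platforms would be the analogous experiment. Whether scikit-learn honors the override for this estimator is an assumption I have not verified.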
So, why does Ubuntu perform slower than Windows?