Benchmarking parallel Python code: why does Ubuntu perform slower than Windows?

Question

I am running a parallel computation on an Intel(R) Xeon(R) L5640 (6 cores, 12 siblings) on the following two platforms:

Ubuntu 18.04, Python 3.7.3, numpy 1.16.4, sklearn 0.21.2.
Windows 7 Ultimate, Python 3.7.3, numpy 1.16.4, sklearn 0.21.2.

No other jobs or tasks were occupying the CPU cores.

I benchmarked the following program and collected some timings.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_openml
import time
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
(trainData, testData, trainLabels, testLabels) = train_test_split(X,
y, test_size=0.1)

start = time.time()
model = KNeighborsClassifier(n_jobs=4)
model.fit(trainData, trainLabels)
predictions = model.predict(testData)
print('n_jobs=4 took {}s'.format(time.time() - start))

It took about 470s on both Ubuntu and Windows, which is reasonable.

Then I ran this:

start = time.time()
model = KNeighborsClassifier(n_jobs=6)
model.fit(trainData, trainLabels)
predictions = model.predict(testData)
print('n_jobs=6 took {}s'.format(time.time() - start))

It took about 493s on Ubuntu and 350s on Windows. The Windows result is reasonable, but the Ubuntu result is NOT.

On Windows, n_jobs=6 takes less time than n_jobs=4, which is reasonable, since the code uses more CPU cores.

On Ubuntu, n_jobs=6 takes more time than n_jobs=4, which is NOT reasonable.

This suggests that joblib's parallelism with the default backend behaves differently on Ubuntu and Windows.
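To check whether the slowdown really comes from the parallel part, it may help to time only predict, repeat each measurement, and sweep several n_jobs values. The following is a minimal sketch under stated assumptions: the helper names `best_of` and `benchmark_knn` are hypothetical, and it uses small synthetic data instead of MNIST so that a run finishes quickly. It is not the benchmark that produced the numbers above.

```python
import time


def best_of(fn, repeats=3):
    """Return the shortest wall-clock time (seconds) over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def benchmark_knn(n_jobs_values, n_samples=5000, n_features=50, repeats=3):
    """Time KNeighborsClassifier.predict alone for each n_jobs value.

    Uses small synthetic data (an assumption for quick runs), not MNIST.
    """
    # Imported here so best_of() stays usable without scikit-learn installed.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.RandomState(0)
    X = rng.rand(n_samples, n_features)
    y = rng.randint(0, 10, size=n_samples)
    X_train, y_train = X[:-500], y[:-500]  # hold out the last 500 rows
    X_test = X[-500:]

    results = {}
    for n_jobs in n_jobs_values:
        model = KNeighborsClassifier(n_jobs=n_jobs)
        model.fit(X_train, y_train)  # fit is kept outside the timed region
        results[n_jobs] = best_of(lambda: model.predict(X_test), repeats)
    return results
```

Calling `benchmark_knn([1, 2, 4, 6, 12])` on both machines and comparing the returned dictionaries would show whether the Ubuntu timings stop improving at a particular n_jobs value or are just noisy between runs.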

I then searched the documentation, but the only part that mentions Windows concerns the 'multiprocessing' backend, which does not apply here, since my joblib version is 0.13.2.
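Since the comparison depends on library versions and the number of logical CPUs each OS reports, recording them next to the timings can rule out an environment mismatch. This is a rough sketch; the function name `environment_report` is hypothetical, and it only reads version attributes without inspecting which joblib backend is actually used.

```python
import os
import platform
import sys


def environment_report():
    """Collect details that matter when comparing timings across machines."""
    report = {
        "platform": platform.platform(),
        "python": sys.version.split()[0],
        "logical_cpus": os.cpu_count(),
    }
    # Third-party versions are optional so the report still works without them.
    for name in ("numpy", "scipy", "sklearn", "joblib"):
        try:
            module = __import__(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = "not installed"
    return report
```

Printing `environment_report()` on both systems gives a side-by-side view. scikit-learn also provides `sklearn.show_versions()`, which prints similar information.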

So, why does Ubuntu perform slower than Windows?

I'm confused. You seem to be saying it does and does not take the same amount of time. What is the difference between the two sets of cases you're describing? Is the second one with `n_jobs=6` (since that's what's in your code)? I shouldn't have to guess! Anyway, I don't know enough about your libraries to guess why they might not work the same on different OSs, but one possibility is that your Ubuntu system had some other process running on it taking up one or more of the cores. – Blckknght, Sep 11 '19 at 02:57

