Python 2.7 with numpy needs only half the threads of Python 3.7 and runs 5 times faster

Question

Problem:

I am using Michael Nielsen's Python 2.7 code to learn machine learning (from his GitHub), and I am trying to port his files network.py and mnist_loader.py from Python 2.7 to Python 3.7.

Basically, all I did was change xrange to range, wrap the zip(...) calls in mnist_loader.py so that they return lists, and add parentheses to the print calls.

Then I ran the following script with Python 3.7.3:
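For readers unfamiliar with these three changes, here is a minimal sketch of what they look like. The data below is a made-up stand-in, not the MNIST loader's real output, and the variable names are only illustrative.

```python
# Illustrative stand-ins (hypothetical data, not the real MNIST arrays).
inputs = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
labels = [1, 0, 1]

# Python 2: zip() returned a list.
# Python 3: zip() returns a one-shot iterator, so it is wrapped in list()
# when the result must support len(), indexing, or shuffling.
training_data = list(zip(inputs, labels))

# Python 2: xrange(). Python 3: range() is already lazy.
mini_batch_size = 2
mini_batches = [training_data[k:k + mini_batch_size]
                for k in range(0, len(training_data), mini_batch_size)]

# Python 2: print "..." statement. Python 3: print() is a function.
print("Number of mini-batches:", len(mini_batches))
```

None of these changes touches the numerical work itself, so on their own they should not be expected to change how numpy performs.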

import os

cwd = "/home/mhyip/Dropbox/Play/MachineLearning/py37/src"
os.chdir(cwd)

import mnist_loader
import network
import time

start = time.time()
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
net = network.Network([784, 30, 10])

# Run the test: 3 epochs, mini-batch size 10, learning rate 3.0.
net.SGD(training_data, 3, 10, 3.0, test_data=test_data)
end = time.time()

print("Time elapsed: ", end-start)

It took at least three times as long as the author's Python 2.7 code. When I ran the author's code with Python 2.7, it took just 13 seconds and used only 4 CPUs. It gave the correct result, and the PC did not slow down at all.

However, when I ran my slightly modified code with Python 3.7, it worked, but it used 8 CPUs and took 51 seconds to finish. It gives the correct result, but it is much slower than Python 2.7, and the PC becomes very slow while it is running.

I am wondering why. I suspect the numpy function np.dot is using all the resources. But why is it slower?
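One way to test this suspicion in isolation is to time np.dot alone under both interpreters. The sketch below is an assumption about what a fair micro-benchmark might look like: the 30 x 784 shape mirrors the first layer of the [784, 30, 10] network, while the function name and repeat count are arbitrary choices of mine.

```python
from timeit import default_timer as timer

import numpy as np


def time_small_dots(repeats=20000):
    """Time repeated (30 x 784) . (784 x 1) products and return the
    elapsed wall-clock seconds."""
    rng = np.random.RandomState(0)
    w = rng.randn(30, 784)   # stand-in for the first-layer weights
    a = rng.randn(784, 1)    # stand-in for one input image
    start = timer()
    for _ in range(repeats):
        np.dot(w, a)
    return timer() - start


elapsed = time_small_dots()
print("numpy %s: %.3f s" % (np.__version__, elapsed))
```

Running the same file with both python2.7 and python3.7 would show whether the gap comes from np.dot itself rather than from the ported code.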

Background:

OS: Ubuntu 18.04.2 LTS x86_64
Kernel: 4.18.0-25-generic

For Python 3.7.3:
numpy: 1.16.4
scikit-learn: 0.21.2
scipy: 1.3.0

For Python 2.7.16:
numpy: 1.16.2
scikit-learn: 0.20.3
scipy: 1.2.1

I use conda as the package manager.

Code

I would like to post the code of the two files network.py and mnist_loader.py as modified for Python 3.7, but I am not sure I have the right to do so. His code is MIT-licensed on GitHub. Let me know if I can, and I will post it immediately.

Expected result

I want Python 3.7 to run as fast as Python 2.7 for machine learning in my case.

Thank you and you are awesome.

Update 1:

It turns out that his GitHub also has code for Python 3. However, it suffers from the same issue as my modified version of the original. I ran cProfile in both cases and show the results in the following figure. The left side is Python 3.7 and the right side is Python 2.7. It seems that numpy.dot works much better in Python 2.7 than in Python 3.7. Is this true? If it is, I am very curious why.

If you want to try it yourself, you can go to his GitHub. It has both the py3.x and py2.7 versions.
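For anyone reproducing the profiling step, a minimal cProfile sketch might look like the following. The workload function is a hypothetical stand-in; in the real test it would be replaced by the net.SGD(...) call from the script above.

```python
import cProfile
import pstats


def workload():
    # Hypothetical stand-in for net.SGD(...); replace with the real call.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```

Comparing the top entries of the two reports side by side is how a difference in numpy.dot would show up.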

Update 2:

I followed one of the commenters' suggestions. I used ldd to check which libraries numpy links against in Python 2.7 and Python 3.7. Here is the result. The left one is Python 2.7 and the right one is Python 3.7. It seems that numpy in Python 2.7 uses MKL, while numpy in Python 3.7 uses OpenBLAS. Is there any way I can get Python 3.7 to run as fast as Python 2.7?
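The same check can also be done from inside Python, and the thread count can be capped at the same time. This is only a sketch under assumptions: the value 4 is chosen to match the 4 CPUs seen with Python 2.7, and whether capping threads actually closes the gap is not something established here.

```python
import os

# Assumption: these variables are read by OpenMP, OpenBLAS, and MKL
# respectively, and only take effect if set before numpy is first imported.
# setdefault() keeps any value already exported in the shell.
os.environ.setdefault("OMP_NUM_THREADS", "4")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "4")
os.environ.setdefault("MKL_NUM_THREADS", "4")

import numpy as np

# Prints the BLAS/LAPACK build information for this numpy installation,
# which shows whether it was built against MKL or OpenBLAS.
np.show_config()
```

Running this in each conda environment gives a text version of the ldd comparison that can be pasted directly into the question.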

The *readme* on that GitHub repository has a link to a Python 3 repository. Did you check that out? — wwii, Jul 21 '19 at 13:59
Wow, I did not know this. Let me check it out tonight. Perhaps I can answer my own question. Thank you. — MH Yip, Jul 21 '19 at 14:31
Yes, I have tried the Python 3 code. It has the same problem as mine. — MH Yip, Jul 21 '19 at 15:46
If you run both 2.7 and 3.7 with cProfile, are there any obvious bottlenecks that differ between the original and your version? — Fnord, Jul 21 '19 at 17:13
Hi, thank you for the suggestion. I ran cProfile and updated my question. numpy.dot seems to be the problem. But why does numpy.dot work much better in Python 2.7 than in Python 3.7? — MH Yip, Jul 21 '19 at 17:48
You might want to check the BLAS etc configurations, https://stackoverflow.com/questions/37184618/find-out-if-which-blas-library-is-used-by-numpy — hpaulj, Jul 21 '19 at 21:14
Hi, I have updated the question. It seems that numpy in py2.7 uses MKL and numpy in py3.7 uses OpenBLAS in the conda folder. Is there any way I can get py3.7 to run as fast as py2.7 in my case? — MH Yip, Jul 22 '19 at 07:24