Problem:
I am using Michael Nielsen's Python 2.7 code to learn machine learning (his GitHub), and I am trying to port his files network.py and mnist_loader.py from Python 2.7 to Python 3.7.
Basically, all I did was change xrange to range, make zip(...) return a list in mnist_loader.py, and add parentheses to the print calls.
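For reference, here is a minimal sketch of the kind of changes described above. The sample data and variable names are made up for illustration and are not taken from mnist_loader.py; wrapping the call as list(zip(...)) is one assumed way of making zip return a list.

```python
# Illustrative data only; not taken from mnist_loader.py.
inputs = [[0.0], [1.0], [2.0]]
labels = [0, 1, 2]

# In Python 3, zip() returns a one-shot iterator rather than a list.
lazy_pairs = zip(inputs, labels)
first_pass = list(lazy_pairs)    # consumes the iterator
second_pass = list(lazy_pairs)   # nothing left the second time

# Wrapping in list() restores the Python 2 behaviour: the result can be
# reused, sliced, shuffled, and passed to len().
pairs = list(zip(inputs, labels))

# range replaces xrange; in Python 3 it is already lazy.
batches = [pairs[k:k + 2] for k in range(0, len(pairs), 2)]

# print is a function in Python 3, so it needs parentheses.
print(len(pairs), len(batches))
```

The one-shot behaviour of zip matters for code that iterates over the training data once per epoch or calls len() on it.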
Then I ran the following script with Python 3.7.3:
import os
cwd = "/home/mhyip/Dropbox/Play/MachineLearning/py37/src"
os.chdir(cwd)
import mnist_loader
import network
import time
start = time.time()
training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
net = network.Network([784, 30, 10])
# Start the test.
net.SGD(training_data, 3, 10, 3.0, test_data=test_data)
end = time.time()
print("Time elapsed: ", end - start)
It took at least three times as long as the author's Python 2.7 code. When I ran the author's code with Python 2.7, it took just 13 seconds and used only 4 CPUs. It gave the correct result, and the PC stayed responsive.
However, when I ran my slightly modified code with Python 3.7, it worked, but it used 8 CPUs and took 51 seconds to finish. It does give the correct result, but it is much slower than Python 2.7, and the PC becomes very slow while it is running.
I am wondering why. I suspect the numpy function np.dot is using all the resources. But why is it slower?
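To test that suspicion in isolation, np.dot can be timed on its own under both interpreters. This is only a sketch: the function name time_dot is hypothetical, and the shapes (a 30x784 weight matrix times a 784x1 input) are an assumption chosen to resemble the first layer of the [784, 30, 10] network above, not values read from network.py.

```python
from timeit import default_timer  # available in both Python 2.7 and 3.x

import numpy as np


def time_dot(n_calls=10000, seed=0):
    """Time repeated small matrix-vector products with np.dot.

    Shapes are an assumption: 30x784 weights times a 784x1 input.
    """
    rng = np.random.RandomState(seed)
    weights = rng.randn(30, 784)
    x = rng.randn(784, 1)

    start = default_timer()
    for _ in range(n_calls):
        y = np.dot(weights, x)
    elapsed = default_timer() - start
    return elapsed, y.shape


if __name__ == "__main__":
    elapsed, shape = time_dot()
    print("np.dot result shape:", shape)
    print("seconds for 10000 calls:", elapsed)
```

Running the same file with each interpreter gives a number that does not depend on the rest of the training code, which makes the two environments easier to compare.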
Background:
OS: Ubuntu 18.04.2 LTS x86_64
Kernel: 4.18.0-25-generic
For Python 3.7.3:
numpy: 1.16.4
scikit-learn: 0.21.2
scipy: 1.3.0
For Python 2.7.16:
numpy: 1.16.2
scikit-learn: 0.20.3
scipy: 1.2.1
I use conda as the package manager.
Code
I would like to post the versions of network.py and mnist_loader.py that I modified for Python 3.7, but I am not sure that I have the right to do so. His code is MIT-licensed on GitHub. Let me know if I can, and I will post them immediately.
Expected result
I want Python 3.7 to run as fast as Python 2.7 for machine learning in my case.
Thank you and you are awesome.
Update 1:
It turns out that his GitHub also has a version of the code for Python 3. However, it suffers from the same issue as my modified version of the original. I ran cProfile in both cases and show the results in the following figure. The left side is Python 3.7 and the right side is Python 2.7. It seems that numpy.dot performs much better in Python 2.7 than in Python 3.7. Is this true? If so, I am very curious why.
If you want to try it yourself, you can go to his GitHub. It has both the Python 3.x and Python 2.7 versions.
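For anyone who wants to reproduce the profile, a sketch along these lines should work under Python 3. The helper name profile_call is hypothetical; only the standard-library cProfile and pstats modules are used.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return the top entries as text.

    Hypothetical helper; written for Python 3.
    """
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and keep the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return stream.getvalue()


if __name__ == "__main__":
    # Stand-in workload; replace with the training call to profile it.
    print(profile_call(sum, range(1000000)))
```

With the script from the question, the call would be something like profile_call(net.SGD, training_data, 3, 10, 3.0, test_data=test_data).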
Update 2:
I followed one of the commenters' suggestions and used ldd to check which libraries numpy links against in Python 2.7 and Python 3.7. Here is the result. The left one is Python 2.7 and the right one is Python 3.7. It seems that numpy in Python 2.7 uses MKL, and numpy in Python 3.7 uses OpenBLAS. Is there any way I can get Python 3.7 to run as fast as Python 2.7?
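As a cross-check on the ldd output, numpy can also report its build configuration from inside Python. The sketch below captures that report as text. The helper name blas_config_text is hypothetical, and contextlib.redirect_stdout exists only in Python 3; under Python 2.7, np.show_config() can simply be called directly.

```python
import contextlib
import io

import numpy as np


def blas_config_text():
    """Return the output of np.show_config() as a string (Python 3 only)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    print("numpy version:", np.__version__)
    # Look for "mkl" or "openblas" in the reported BLAS/LAPACK sections.
    print(blas_config_text())
```

The exact layout of the report differs between numpy versions, so it is safer to read it by eye than to parse it.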