
I'm running a MacBook Pro.

I'm running an installation of python2.7 via Anaconda.

Last login: Wed Nov 11 21:41:33 on ttys002
Matthews-MacBook-Pro:~ matthewdunn$ python
Python 2.7.10 |Anaconda 2.4.0 (x86_64)| (default, Oct 19 2015, 18:31:17) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> 

My program is in a Jupyter Notebook and is running on the Anaconda install, confirmed via Activity Monitor.

When I attempt to train an SVM model with scikit-learn (http://scikit-learn.org/stable/), it takes me hours while my classmates finish in 15 minutes, and no one is adding multiprocessing/threading to their programs.

I don't think my code is the issue, because even when I fit the SVM model to only 2000 records it still takes a long time to process.

Questions:

  1. Is there a way to configure a Python installation to always default to using all available CPUs, or does parallelism always need to be defined in the program?
  2. Have there been any changes to the MacBook Pro hardware since I bought my computer that would ensure Python consumes all available CPU resources?
mattyd2
  • http://stackoverflow.com/questions/1294382/what-is-a-global-interpreter-lock-gil – myaut Nov 12 '15 at 16:23
  • Python is single-threaded by default, so will only use one core's-worth of processor power. Have you tried running your classmates' code on your machine and vice versa to verify that it is indeed not your code? – jonrsharpe Nov 12 '15 at 16:25
  • Possible duplicate of [Python threads all executing on a single core](http://stackoverflow.com/questions/4496680/python-threads-all-executing-on-a-single-core) – jonrsharpe Nov 12 '15 at 16:26
  • @jonrsharpe - I'm heading to class today to do this exact thing. I just wanted to get this posted. I think my question about how to default multiprocessing is interesting, and I'm wondering if that is possible... – mattyd2 Nov 12 '15 at 16:30
  • How would you *"default multiprocessing"*? It's up to the developer to decide when the extra overhead of splitting and merging the computation is worth the potential reduction in overall runtime due to parallelisation. – jonrsharpe Nov 12 '15 at 16:31
  • I have a hunch that the real issue may be the parameters you are passing. Your choice of kernels will greatly affect SVM performance in scikit-learn, as well as parameters gamma and C. – David Maust Nov 16 '15 at 05:37

1 Answer


You already got the answer in the comments: by default, Python is restricted to one core of your CPU. On a quad-core machine that shows up as roughly 25% overall CPU load.

If you want to use several cores at the same time, you need to use libraries such as Parallel Python or the standard-library `multiprocessing` module.

Here is a list of libraries that allow parallel processing: https://wiki.python.org/moin/ParallelProcessing
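As a minimal sketch of the `multiprocessing` approach (the worker function and inputs are illustrative placeholders, not code from the question):

```python
# Spread a CPU-bound function across all cores with the standard library.
from multiprocessing import Pool, cpu_count

def square(n):
    # stand-in for a CPU-bound task, e.g. fitting one model
    return n * n

if __name__ == "__main__":
    pool = Pool(processes=cpu_count())  # one worker process per core
    try:
        results = pool.map(square, range(8))
    finally:
        pool.close()
        pool.join()
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note that this only helps when the work can actually be split into independent chunks; a single scikit-learn SVM fit cannot be parallelised this way from the outside.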

MrCyclophil
  • I agree with your answer. There are some scikit-learn packages that include parallel implementations. These will allow you to pass a keyword argument `n_jobs=-1` to use all CPUs, typically in the constructor. – David Maust Nov 16 '15 at 05:35
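To illustrate the `n_jobs=-1` keyword from the comment above, here is a sketch using a modern scikit-learn (in older versions the grid-search helper lived in `sklearn.grid_search` rather than `sklearn.model_selection`); the dataset and parameter grid are arbitrary examples:

```python
# SVC itself has no n_jobs parameter, but the cross-validation and
# grid-search helpers do; n_jobs=-1 uses all available CPUs.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1.0]}  # arbitrary grid
search = GridSearchCV(SVC(kernel="rbf"), param_grid, n_jobs=-1, cv=3)
search.fit(X, y)  # the candidate fits run in parallel across cores
print(search.best_params_)
```

Parallelising at this level (many independent fits) is usually where scikit-learn gains from multiple cores, rather than inside one SVM fit.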