As stated in prior answers, it depends on whether the work is CPU-bound or I/O-bound, and also on whether you use threading or multiprocessing.

The examples below were run on a Raspberry Pi 3B (1.2 GHz, 4 cores) with Python 3.7.3, with other processes running (including htop).

- For this test, multiprocessing and threading had similar results for the I/O-bound workload, but multiprocessing was clearly more efficient than threading for the CPU-bound workload (CPython's GIL lets only one thread execute Python bytecode at a time, so CPU-bound threads cannot run in parallel).
Using threads (the threading scripts are not listed below; see the sketch at the end of this answer):

Typical Result:
    Starting 4000 cycles of io-bound threading
    Sequential run time: 39.15 seconds
    4 threads Parallel run time: 18.19 seconds
    2 threads Parallel twice - run time: 20.61 seconds

Typical Result:
    Starting 1000000 cycles of cpu-only threading
    Sequential run time: 9.39 seconds
    4 threads Parallel run time: 10.19 seconds
    2 threads Parallel twice - run time: 9.58 seconds
Using multiprocessing:

Typical Result:
    Starting 4000 cycles of io-bound processing
    Sequential - run time: 39.74 seconds
    4 procs Parallel - run time: 17.68 seconds
    2 procs Parallel twice - run time: 20.68 seconds

Typical Result:
    Starting 1000000 cycles of cpu-only processing
    Sequential run time: 9.24 seconds
    4 procs Parallel - run time: 2.59 seconds
    2 procs Parallel twice - run time: 4.76 seconds
compare_io_multiproc.py:
#!/usr/bin/env python3
# Compare single proc vs multiple procs execution for io bound operation
"""
Typical Result:
Starting 4000 cycles of io-bound processing
Sequential - run time: 39.74 seconds
4 procs Parallel - run time: 17.68 seconds
2 procs Parallel twice - run time: 20.68 seconds
"""
import time
import multiprocessing as mp

# one thousand cycles per worker call
cycles = 1 * 1000

def t():
    # read ~256 KB of random data per cycle; the work is dominated by the read() calls
    with open('/dev/urandom', 'rb') as f:
        for x in range(cycles):
            f.read(4 * 65535)

if __name__ == '__main__':
    print(" Starting {} cycles of io-bound processing".format(cycles*4))

    # sequential: four calls in the main process
    start_time = time.time()
    t()
    t()
    t()
    t()
    print(" Sequential - run time: %.2f seconds" % (time.time() - start_time))

    # four procs in parallel
    start_time = time.time()
    p1 = mp.Process(target=t)
    p2 = mp.Process(target=t)
    p3 = mp.Process(target=t)
    p4 = mp.Process(target=t)
    p1.start()
    p2.start()
    p3.start()
    p4.start()
    p1.join()
    p2.join()
    p3.join()
    p4.join()
    print(" 4 procs Parallel - run time: %.2f seconds" % (time.time() - start_time))

    # two procs in parallel, twice
    start_time = time.time()
    p1 = mp.Process(target=t)
    p2 = mp.Process(target=t)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    p3 = mp.Process(target=t)
    p4 = mp.Process(target=t)
    p3.start()
    p4.start()
    p3.join()
    p4.join()
    print(" 2 procs Parallel twice - run time: %.2f seconds" % (time.time() - start_time))
compare_cpu_multiproc.py:
#!/usr/bin/env python3
# Compare single proc vs multiple procs execution for cpu bound operation
"""
Typical Result:
Starting 1000000 cycles of cpu-only processing
Sequential run time: 9.24 seconds
4 procs Parallel - run time: 2.59 seconds
2 procs Parallel twice - run time: 4.76 seconds
"""
import time
import multiprocessing as mp

# one million cycles per worker call
cycles = 1000 * 1000

def t():
    # pure-Python floating point work; no I/O, so this worker is CPU-bound
    for x in range(cycles):
        fdivision = cycles / 2.0
        fcomparison = (x > fdivision)
        faddition = fdivision + 1.0
        fsubtract = fdivision - 2.0
        fmultiply = fdivision * 2.0

if __name__ == '__main__':
    print(" Starting {} cycles of cpu-only processing".format(cycles))

    # sequential: four calls in the main process
    start_time = time.time()
    t()
    t()
    t()
    t()
    print(" Sequential run time: %.2f seconds" % (time.time() - start_time))

    # four procs in parallel
    start_time = time.time()
    p1 = mp.Process(target=t)
    p2 = mp.Process(target=t)
    p3 = mp.Process(target=t)
    p4 = mp.Process(target=t)
    p1.start()
    p2.start()
    p3.start()
    p4.start()
    p1.join()
    p2.join()
    p3.join()
    p4.join()
    print(" 4 procs Parallel - run time: %.2f seconds" % (time.time() - start_time))

    # two procs in parallel, twice
    start_time = time.time()
    p1 = mp.Process(target=t)
    p2 = mp.Process(target=t)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    p3 = mp.Process(target=t)
    p4 = mp.Process(target=t)
    p3.start()
    p4.start()
    p3.join()
    p4.join()
    print(" 2 procs Parallel twice - run time: %.2f seconds" % (time.time() - start_time))