
I'm beginning with threads in Python, and trying to implement a merge sort where, at the beginning, the job is split into 2 threads. I'm using `collections.deque`, `itertools.islice`, and `threading.Thread`.

I create two threads at the beginning; each does half the job normally, then I join them and merge the results. But I noticed that it takes way longer (almost twice as long) with two threads than it does when I do it normally.

How can this be possible? Here is a link to the code; I can reproduce the main parts here if needed. (I posted this question on Code Review SE too, and I'd rather keep this one short.)

Is it linked to this question (it seems to be a similar problem in C++)? Thank you very much.

AdrienW
  • Possible duplicate of [python multi-threading slower than serial?](http://stackoverflow.com/questions/10789042/python-multi-threading-slower-than-serial) – Dean Fenster Jul 11 '16 at 14:11
  • Why the downvote? – AdrienW Jul 11 '16 at 15:14
  • Although CPython's GIL limits multithreading, you can still use `multiprocessing`. – Eli Korvigo Jul 11 '16 at 15:32
  • @EliKorvigo - assuming at least some shared memory is needed, won't that still be a problem even when multi-processing? – rcgldr Jul 11 '16 at 16:19
  • @rcgldr `multiprocessing` has some low-level shared memory primitives that are a pain to use, but GIL is not a problem there. – Eli Korvigo Jul 11 '16 at 16:21
  • The C++ problem was due to both threads updating adjacent variables at a high frequency. I've written a bottom up merge sort (C / C++) that splits up an array into 4 parts, using a thread on each part (so 4 threads) to do the merge sort, then 2 threads to merge the quarter parts into half parts, and then one thread to merge the two halves. Depending on array size, it runs about 3 times as fast as a single threaded merge sort. – rcgldr Jul 11 '16 at 16:22
  • @EliKorvigo - By problem with multi-processor shared memory, I meant performance problem related to updates of the shared memory, not an implementation problem. – rcgldr Jul 11 '16 at 16:27
  • @rcgldr yes, there will be a lot of overhead, hence it's only rational to use multiprocessing (especially when there is shared memory involved) when the input task is relatively big. – Eli Korvigo Jul 11 '16 at 16:38

1 Answer


> How can this be possible?

Unlike C++, Python (more precisely, CPython) is quite difficult to parallelize for CPU-bound work because of the GIL (Global Interpreter Lock).

While `collections.deque`'s `append` and `popleft` are thread-safe, that only guarantees correctness under concurrent access; it does not guarantee that they'll perform well in a non-serial paradigm.
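To make the distinction concrete, here is a small sketch (not the poster's code) showing what "thread-safe" buys you: two threads can share one `deque` without an explicit lock and nothing is lost or duplicated, but this is a correctness guarantee, not a speedup.

```python
# Thread-safety demo: deque.append and deque.popleft are atomic, so a
# producer and a consumer thread can share one deque without a lock.
from collections import deque
from threading import Thread

d = deque()

def producer(n):
    # Push 0..n-1 onto the right end of the shared deque.
    for i in range(n):
        d.append(i)

def consumer(n, out):
    # Pop from the left end until n items have been taken.
    taken = 0
    while taken < n:
        try:
            out.append(d.popleft())
            taken += 1
        except IndexError:  # deque momentarily empty; retry
            pass

out = []
n = 10000
t1 = Thread(target=producer, args=(n,))
t2 = Thread(target=consumer, args=(n, out))
t1.start(); t2.start()
t1.join(); t2.join()

# FIFO order is preserved and no item is lost or duplicated.
assert out == list(range(n))
```

Note that both threads still take turns holding the GIL, so this pattern is safe but not parallel.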

> Is it linked to this question?

No. The GIL is a property of CPython; it is totally disjoint from the false sharing described in that C++ question.

> It takes way longer (almost twice as long) with two threads than it does when I do it normally.

This is because the GIL lets only one thread execute Python bytecode at a time, so your CPU-bound threads cannot actually run in parallel. As such, you're essentially running your code serially, plus the overhead of thread creation and context switching.
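If you want real parallelism for this, `multiprocessing` sidesteps the GIL by using separate processes. Here is a minimal sketch (assumed function names, not the poster's code) that sorts the two halves in parallel and merges the results:

```python
# Sketch: parallel merge sort with multiprocessing instead of threading.
# Each half is pickled, sorted in its own process (own GIL), and sent back.
from multiprocessing import Pool
from heapq import merge

def merge_sort(data):
    # Ordinary serial merge sort, using heapq.merge for the merge step.
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    return list(merge(merge_sort(data[:mid]), merge_sort(data[mid:])))

def parallel_merge_sort(data):
    # Split into two halves and sort each in its own process.
    mid = len(data) // 2
    with Pool(processes=2) as pool:
        left, right = pool.map(merge_sort, [data[:mid], data[mid:]])
    return list(merge(left, right))

if __name__ == "__main__":
    import random
    data = [random.randrange(10**6) for _ in range(10**5)]
    assert parallel_merge_sort(data) == sorted(data)
```

Bear in mind that processes pay for pickling the data across the process boundary, so this only wins when the input is large enough to amortize that cost.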

erip
  • So you mean I cannot ever make this faster with only multi-threading? What could I do? – AdrienW Jul 11 '16 at 14:16
  • @BusyAnt I don't recommend Python for multi-threaded applications. Prefer C++, Java, Scala, or almost anything else. – erip Jul 11 '16 at 14:18
  • OK, I'll remember that. Could you say what is eating my program's performance here, though? Thread creation? Switching between threads? Something else? I'm not sure I've fully understood here... – AdrienW Jul 11 '16 at 14:21
  • @BusyAnt The GIL doesn't really allow shared memory, so you're basically running the serial code twice. – erip Jul 11 '16 at 14:25
  • Note that you can make applications faster with **multiprocessing** - this avoids the GIL problem. – Dean Fenster Jul 11 '16 at 15:19