Proving the Necessity of Synchronization Primitives in Python

Question

I am preparing a report on synchronization primitives for threads, and I am looking for a good example in which one result is obtained when Lock() is used and a completely different one when it is not.

In the example below, I increment a number by 1 in a loop from multiple threads at once. I have already raised the number of iterations to 1000000 and the number of threads to 1000, but no race condition (or any other effect) shows up. The result is always exactly equal to the number of iterations multiplied by the number of threads (running on Ubuntu 20.04).

from threading import Thread

COUNT = 1000000
NUM_THREADS = 1000
counter = 0


def increment():
    global counter
    for _ in range(COUNT):
        counter += 1


threads = [Thread(target=increment) for _ in range(NUM_THREADS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

diff = counter - COUNT * NUM_THREADS
print(f"Diff for counter without synchronization: {diff}")

Can anyone suggest an example (preferably not a very complex one) where the result of a multithreaded computation without synchronization primitives differs from that of its "synchronized counterpart"?

Just to make sure: when you say "increment a number by 1 in a loop on multiple threads at once", you understand that it's not in parallel, right? — Itération 122442, May 02 '23 at 20:01
Probably, I did not express myself correctly when describing my code. This means that the variable is accessed concurrently in several threads, and this operation, in theory, requires synchronization. — ibarbylev, May 02 '23 at 20:13

JonSG · Answer 1 · 2023-05-02T20:35:53.010

If you add in some simulated work, you should see some interesting results.

import threading
import time
import random

NUM_THREADS = 5
COUNT = 5

counter_nolock = 0
counter_lock = 0
lock = threading.Lock()

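def increment_nolock():
    global counter_nolock
    for _ in range(COUNT):
        # Read the shared value and compute the new one...
        prior = counter_nolock + 1
        # ...then sleep, so other threads can read the same stale value...
        time.sleep(random.random())
        # ...and finally write back, overwriting any updates made meanwhile.
        counter_nolock = prior
        print(f"nolock : {counter_nolock}")

def increment_lock():
    global counter_lock
    for _ in range(COUNT):
        # Only one thread at a time can run the read, sleep, and write below.
        with lock:
            prior = counter_lock + 1
            time.sleep(random.random())
            counter_lock = prior
            print(f"lock : {counter_lock}")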

def increment():
    increment_nolock()
    increment_lock()

if __name__ == '__main__':
    threads = [
        threading.Thread(target=increment)
        for _ in range(NUM_THREADS)
    ]

    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

    print(f"increment_nolock: expected {COUNT * NUM_THREADS} got: {counter_nolock}")
    print(f"increment_lock: expected {COUNT * NUM_THREADS} got: {counter_lock}")

That should give you a semi-random result something like:

nolock : 1
nolock : 1
nolock : 2
nolock : 1
nolock : 1
nolock : 2
nolock : 3
nolock : 4
nolock : 1
nolock : 3
nolock : 3
nolock : 2
nolock : 4
nolock : 5
nolock : 5
nolock : 2
nolock : 4
nolock : 6
nolock : 3
lock : 1
nolock : 5
lock : 2
nolock : 3
lock : 3
nolock : 4
nolock : 4
nolock : 5
nolock : 6
lock : 4
lock : 5
lock : 6
lock : 7
lock : 8
lock : 9
lock : 10
lock : 11
lock : 12
lock : 13
lock : 14
lock : 15
lock : 16
lock : 17
lock : 18
lock : 19
lock : 20
lock : 21
lock : 22
lock : 23
lock : 24
lock : 25
increment_nolock: expected 25 got: 6
increment_lock: expected 25 got: 25

The solution is very interesting. Thanks) But if I were an advocate for unsynchronized threads, I would probably say: “Your Honor, the random delay between the increment of the value (variable prior) and the direct change of the variable counter_nolock seems a little far-fetched." And I would ask for another example, where the increment occurs in one step and without any delay. — ibarbylev, May 02 '23 at 20:59
I guess I would counter by asking: are you looking for an example of why a lock is required, or are you asking whether += is thread-safe? — JonSG, May 03 '23 at 00:39
I know well from theory that the += operation is "thread-dangerous". I would just like practical confirmation of this theory. — ibarbylev, May 03 '23 at 17:46

score 1 · Accepted Answer · answered May 02 '23 at 22:33

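This is what happens in the increment function (the bytecode below comes from an older CPython; 3.11 and later name some opcodes differently, e.g. BINARY_OP instead of INPLACE_ADD):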

>>> counter = 0
>>>
>>> def increment():
...     global counter
...     for _ in range(1000):
...         counter += 1
...
>>> import dis
>>> dis.dis(increment)
  3           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (1000)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                12 (to 22)
             10 STORE_FAST               0 (_)

  4          12 LOAD_GLOBAL              1 (counter)
             14 LOAD_CONST               2 (1)
             16 INPLACE_ADD
             18 STORE_GLOBAL             1 (counter)
             20 JUMP_ABSOLUTE            8
        >>   22 LOAD_CONST               0 (None)
             24 RETURN_VALUE

To get the result you want to see, Python has to switch threads after instruction 12 (LOAD_GLOBAL) has run and before instruction 18 (STORE_GLOBAL) runs, and the other thread has to modify counter while it holds the GIL.
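The unlucky interleaving can also be forced instead of waited for. The following is a minimal sketch, not part of the original answer: it splits the increment into an explicit load and store and uses a threading.Barrier as a stand-in for a thread switch at the worst possible moment. The names racy_increment and both_loaded are made up for this illustration.

```python
import threading

counter = 0
# Hypothetical helper: both threads must arrive here before either continues.
both_loaded = threading.Barrier(2)


def racy_increment():
    global counter
    loaded = counter        # plays the role of LOAD_GLOBAL
    both_loaded.wait()      # stand-in for a thread switch between load and store
    counter = loaded + 1    # plays the role of INPLACE_ADD + STORE_GLOBAL


threads = [threading.Thread(target=racy_increment) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(f"expected 2, got {counter}")
```

Because neither thread stores before both have loaded, each one reads 0 and writes 1, so one of the two increments is lost on every run.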

You can get the interval between Python's thread switches from sys.getswitchinterval(); on my system it is 5 milliseconds. The chance of a switch landing exactly between the right instructions is small but not zero, so given enough time it will happen. Decreasing the number of threads and increasing the number of iterations might improve your chances, as might increasing the number of instructions between the load and the store (i.e. doing more work).
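Both ideas can be combined in a rough experiment. The sketch below is an illustration under assumptions rather than a guaranteed reproduction: it shortens the interval with sys.setswitchinterval() and widens the gap between load and store with an extra function call. The helpers read_counter, increment_unsafe, increment_locked and run are invented for this example, and on some CPython versions or machines the unsynchronized total may still come out correct.

```python
import sys
import threading

COUNT = 20_000
NUM_THREADS = 4

counter = 0
lock = threading.Lock()


def read_counter():
    # Hypothetical helper: an extra call that widens the load/store window.
    return counter


def increment_unsafe():
    global counter
    for _ in range(COUNT):
        value = read_counter()
        counter = value + 1   # may overwrite another thread's update


def increment_locked():
    global counter
    for _ in range(COUNT):
        with lock:            # read and write happen as one protected step
            value = read_counter()
            counter = value + 1


def run(worker):
    """Reset the counter, run worker in NUM_THREADS threads, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter


expected = COUNT * NUM_THREADS
old_interval = sys.getswitchinterval()
sys.setswitchinterval(1e-6)   # ask for switches roughly every microsecond
try:
    unsafe_total = run(increment_unsafe)
    locked_total = run(increment_locked)
finally:
    sys.setswitchinterval(old_interval)   # restore the previous setting

print(f"without lock: expected {expected}, got {unsafe_total}")
print(f"with lock:    expected {expected}, got {locked_total}")
```

The locked total always matches the expected value. The unlocked total can never exceed it, and any shortfall is the number of lost updates in that particular run.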

What you are seeing are two of the problems with unsynchronized access, i.e. it will work correctly a lot of the time, and it is difficult to reproduce a problem.

I'm afraid you are right. This effect is extremely difficult to "catch red-handed". This would probably be easier to do if the thread switching interval setting could be changed. But as far as I know, this is not possible. — ibarbylev, May 03 '23 at 17:35
`sys.setswitchinterval(...)` will let you change the interval... — thebjorn, May 04 '23 at 15:06
