I'm studying 'Learning Concurrency in Python' by Elliot Forbes. In this book, the author explains: 'Lists, by default, are thread safe, but only in the way that we access them. It's important to note that the data represented within this list structure is, in fact, not protected, and if you want to safely modify this data, then you must implement a proper locking mechanism to ensure that multiple threads can't potentially run into race conditions within their execution--this holds true for all thread-safe containers. The append() function is one of the few methods for our list data structure that is atomic, and, as such, thread-safe. Lists, thus, become a very quick and easy structure that we can leverage for temporary, in-memory storage. However, if we were to attempt to modify anything within this list in a concurrent fashion, then it's highly possible that we start to see the side effects most often attributed to race conditions. One prime example of such a side effect is if we, for instance, try to update the second element in our list at the same time as another thread. If one were to read, and subsequently write, at the same time as a competing thread, then we could see issues where the value is altered incorrectly. If you wish to utilize lists within your multithreaded applications, then you can do so by extending the class in a similar fashion to how we've previously extended the set primitive.'

So I used a decorator, because that is how the book explains it. But I couldn't find any difference between the plain, unlocked list.insert() and the lock-decorated version.

In 1): t1's result is 10000, t2's result is 20000, and the final result is 20000. I thought the final result should be less than 20000 because of race conditions.

In 2): the result is the same as in 1).

  1. Here is the code using no lock.

'''

import threading

list = []

def work():
    global list
    for i in range(10000):
        list.insert(i, i)
    print("{} is completed, list length = {}".format(threading.get_ident(), len(list)))
    print(list)

threads = []

t1 = threading.Thread(target=work)
t1.start()
t2 = threading.Thread(target=work)
t2.start()
threads.append(t1)
threads.append(t2)

for t in threads:
    t.join()
print("final list length = ", len(list))
print(list)

'''

  2. Here is the code of the decorated (locked) version.

'''

from threading import Lock
import threading

def locked_method(method):
    def new_method(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return new_method

class Locked_list(list):
    def __init__(self, *args, **kwargs):
        self._lock = Lock()
        super(Locked_list, self).__init__(*args, **kwargs)

    @locked_method
    def remove(self, *args, **kwargs):
        return super(Locked_list, self).remove(*args, **kwargs)

    @locked_method
    def insert(self, *args, **kwargs):
        return super(Locked_list, self).insert(*args, **kwargs)

list = Locked_list()  # use the locked subclass

def work():
    global list
    for i in range(10000):
        list.insert(i, i)
    print("{} is completed, list length = {}".format(threading.get_ident(), len(list)))
    print(list)

threads = []

t1 = threading.Thread(target=work)
t1.start()
t2 = threading.Thread(target=work)
t2.start()
threads.append(t1)
threads.append(t2)

for t in threads:
    t.join()
print("final list length = ", len(list))
print(list)

'''

Is my code wrong, or does Python's list simply not have any race conditions?

1 Answer


I'm also reading the book. I was searching for a way to extend the list primitive and make it thread-safe for item assignments like this:

counter[0] += 1

That search brought me to your question. Yes, your code is in fact "wrong" for the purpose of creating a race condition: Python list methods such as append(), insert(), index(), etc. are already thread-safe, so you cannot get into a race condition just by calling them from multiple threads. What is thread-unsafe are operations that are done in a non-atomic way. You can find this out by importing and using the dis module. Below I compare the output of dis.dis() for counter.insert(1, 1) and counter[0] += 1, where counter is an ordinary list.

  • You can find a description of how to interpret the output of dis.dis() in this answer.
dis.dis("list.insert(0, 1)")

  1           0 LOAD_NAME                0 (counter)
              2 LOAD_METHOD              1 (insert)
              4 LOAD_CONST               0 (1)
              6 LOAD_CONST               0 (1)
              8 CALL_METHOD              2
             10 RETURN_VALUE

As you can see, in the insert() case the call that actually does the work is carried out in a single opcode, CALL_METHOD (an opcode being the unit of work between which the GIL, the Global Interpreter Lock, can switch threads). This means that, apart from loading the method and its arguments onto the stack, which cannot go wrong in threaded code, the only thing that 'could' go wrong happens in one operation and is therefore thread-safe. This is the reason why your code was not able to create any race condition in the first place.
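
If you also want to check the book's claim that append() is atomic, the same test applies. A small sketch (assuming, as above, a variable named counter holding a list; dis only compiles the string, so counter does not even need to exist):

import dis

# On the CPython version that produced the output above, append(), like
# insert(), compiles to a single CALL_METHOD opcode that does the actual
# work, so the GIL cannot hand control to another thread in the middle of it.
dis.dis("counter.append(1)")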

dis.dis("list[0] += 1")

  1           0 LOAD_NAME                0 (counter)
              2 LOAD_CONST               0 (0)
              4 DUP_TOP_TWO
              6 BINARY_SUBSCR
              8 LOAD_CONST               1 (1)
             10 INPLACE_ADD
             12 ROT_THREE
             14 STORE_SUBSCR
             16 LOAD_CONST               2 (None)
             18 RETURN_VALUE

But with the second example you can see that the read and the write happen in separate opcodes: BINARY_SUBSCR (which replaces the top two stack elements with counter[0]) is executed separately from STORE_SUBSCR, the actual assignment of the list item. This is where the problem can arise: the GIL can release the interpreter lock and give it to another thread right after BINARY_SUBSCR, and hand it back only after counter[0] has already been altered by that other thread, so this thread then increments a stale value and overwrites the other thread's update.

Also, if you are going to test this, here is sample code that can help you. Note that, for some reason, the race condition may not occur for iteration counts below 100000; I think this is due to the way the GIL shifts between threads.

import threading

counter = [0]  # shared list; the race is on the item assignment below

def worker_a():
    global counter
    for _ in range(100000):
        counter[0] = counter[0] + 1  # non-atomic read-modify-write

threads = [threading.Thread(target=worker_a) for i in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
assert counter[0] == 1000000  # usually fails because increments get lost
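
For completeness, here is a minimal sketch (my own, not from the book or the question) of how the failing assertion above can be fixed. Since the race spans the whole read-modify-write, the whole statement has to be guarded with an explicit threading.Lock; locking insert() or other individual methods, as in the decorated class, still leaves a gap between the read and the write.

import threading

counter = [0]
counter_lock = threading.Lock()  # guards the whole read-modify-write

def worker_a():
    global counter
    for _ in range(100000):
        # while the lock is held, no other thread can interleave between
        # the read (BINARY_SUBSCR) and the write (STORE_SUBSCR)
        with counter_lock:
            counter[0] = counter[0] + 1

threads = [threading.Thread(target=worker_a) for i in range(10)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
assert counter[0] == 1000000  # now passes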