
I want to test whether it's OK to append to a list from two threads, but I'm getting messy output:

import threading


class myThread(threading.Thread):
    def __init__(self, name, alist):
        threading.Thread.__init__(self, name=name)  # pass the name on so self.name uses it
        self.alist = alist

    def run(self):
        print "Starting " + self.name
        append_to_list(self.alist, 2)
        print "Exiting " + self.name
        print self.alist


def append_to_list(alist, counter):
    while counter:
        alist.append(alist[-1]+1)
        counter -= 1

alist = [1, 2]
# Create new threads
thread1 = myThread("Thread-1", alist)
thread2 = myThread("Thread-2", alist)

# Start new Threads
thread1.start()
thread2.start()

print "Exiting Main Thread"
print alist

So the output is:

Starting Thread-1
Exiting Thread-1
 Starting Thread-2
 Exiting Main Thread
Exiting Thread-2
[1[1, 2[, 1, 2, 23, , 34, 5, 6, ]4
, 5, , 3, 64, 5, ]6]

Why is it so messy, and why is alist not equal to [1, 2, 3, 4, 5, 6]?

snakecharmerb
Alexey

4 Answers


Summary

Why is the output messy?

==> Because a thread may yield partway through executing a print statement.

Why is aList not equal to [1, 2, 3, 4, 5, 6]?

==> Because the content of aList may change between reading from it and appending to it.

Output

The output is messy because it is being produced by Python 2's print statement from within threads, and the print statement is not thread-safe. This means that a thread may yield while print is executing. In the code in the question there are multiple threads printing, so one thread may yield while printing, and the other thread may start printing and then yield, producing the interleaved output seen by the OP. I/O operations such as writing to stdout are very slow in CPU terms, so it's quite likely that the operating system will pause a thread performing I/O while it waits on the hardware; CPython also releases the GIL around blocking I/O, which lets another thread run in the meantime.

For example, this code:

import threading


def printer():
    for i in range(2):
        print ['foo', 'bar', 'baz']


def main():
    threads = [threading.Thread(target=printer) for x in xrange(2)]
    for t in threads: 
        t.start()
    for t in threads:
        t.join()

produces this interleaved output:

>>> main()
['foo', 'bar'['foo', , 'bar', 'baz']
'baz']
['foo', ['foo', 'bar''bar', 'baz']
, 'baz']

The interleaving behaviour can be prevented by using a lock:

def printer():
    for i in range(2):
        with lock:
            print ['foo', 'bar', 'baz']


def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in xrange(2)]
    for t in threads: 
        t.start()
    for t in threads:
        t.join()

>>> main()
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']
['foo', 'bar', 'baz']

The contents of the list

The final content of aList will be [1, 2, 3, 4, 5, 6] if the statement

aList.append(aList[-1] + 1)

is executed atomically, that is, without the current thread yielding to another thread which is also reading from and appending to aList.

However, this is not how threads work. A thread may yield after reading the last element from aList or after incrementing the value, so it is quite possible to have a sequence of events like this:

  1. Thread1 reads the value 2 from aList
  2. Thread1 yields
  3. Thread2 reads the value 2 from aList, then appends 3
  4. Thread2 reads the value 3 from aList, then appends 4
  5. Thread2 yields
  6. Thread1 appends 3
  7. Thread1 reads the value 3 from aList, then appends 4

This leaves aList as [1, 2, 3, 4, 3, 4].
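The interleaving above can be forced deterministically. In this Python 3 sketch (the function and event names are made up for illustration), two threading.Event objects stand in for the points where the scheduler happens to switch threads, so the steps run in exactly the order listed and the final list is always the same.

```python
import threading


def reproduce_lost_update():
    alist = [1, 2]
    t1_has_read = threading.Event()  # set once Thread1 has read alist[-1]
    t2_is_done = threading.Event()   # set once Thread2 has appended twice

    def thread1():
        value = alist[-1]             # step 1: reads 2
        t1_has_read.set()             # step 2: "yields" to Thread2
        t2_is_done.wait()
        alist.append(value + 1)       # step 6: appends 3 using the stale value
        alist.append(alist[-1] + 1)   # step 7: reads 3, appends 4

    def thread2():
        t1_has_read.wait()
        alist.append(alist[-1] + 1)   # step 3: reads 2, appends 3
        alist.append(alist[-1] + 1)   # step 4: reads 3, appends 4
        t2_is_done.set()              # step 5: "yields" back to Thread1

    threads = [threading.Thread(target=thread1), threading.Thread(target=thread2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return alist


print(reproduce_lost_update())  # [1, 2, 3, 4, 3, 4]
```

In a real program nothing forces this order; the events only make one of the many possible schedules reproducible.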

As with the print statements, this can be prevented by making threads acquire a lock before executing aList.append(aList[-1] + 1).
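A minimal Python 3 sketch of that fix, keeping `append_to_list` from the question but wrapping the read-and-append in a lock. The lock name `alist_lock`, the helper `run` and the thread and append counts are assumptions for illustration. Because every append now reads the true last element, the list stays consecutive however the threads are scheduled.

```python
import threading

alist_lock = threading.Lock()  # hypothetical module-level lock guarding the list


def append_to_list(alist, counter):
    while counter:
        # Holding the lock makes "read the last element, then append" one
        # indivisible step with respect to other threads using the same lock.
        with alist_lock:
            alist.append(alist[-1] + 1)
        counter -= 1


def run(num_threads=4, appends_per_thread=1000):
    alist = [1, 2]
    threads = [
        threading.Thread(target=append_to_list, args=(alist, appends_per_thread))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished appending
    return alist


result = run()
print(result[:6], len(result))  # [1, 2, 3, 4, 5, 6] 4002
```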

(Note that in CPython, list.append is thread-safe when called from pure Python code, so there is no risk that the value being appended could be corrupted.)
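To illustrate that note, here is a small Python 3 sketch (assuming CPython; the names are made up for this example) in which four threads append independent values to a shared list without any lock. Each append is a single operation rather than a read followed by a write, so none of the values are lost; only the order in which they land in the list depends on scheduling.

```python
import threading


def append_many(target, thread_id, count):
    for i in range(count):
        # A single append of an independent value: no read-modify-write involved.
        target.append((thread_id, i))


shared = []
threads = [
    threading.Thread(target=append_many, args=(shared, n, 10000)) for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 40000: no appends were lost, though their order varies
```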

Community
snakecharmerb

EDIT: @kroltan got me thinking some more, and I think your example is in fact more thread-safe than I originally thought. The issue is not the multiple writer threads as such; it's specifically this line:

alist.append(alist[-1]+1)

There's no guarantee that the append will happen directly after the alist[-1] completes, other operations may be interleaved.
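One way to see this is to disassemble the line with the standard dis module. In the Python 3 sketch below, `append_next` is a hypothetical wrapper around the line from the question; the listing shows that reading `alist[-1]`, adding 1 and calling `append` are separate bytecode instructions, so nothing makes them a single atomic step. The exact opcode names differ between Python versions, so no output is reproduced here.

```python
import dis


def append_next(alist):
    alist.append(alist[-1] + 1)


# The subscript (read), the addition and the call to append show up as
# distinct instructions; another thread may run between the read and the call.
dis.dis(append_next)
```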

There is a detailed explanation here: http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm

Operations that replace other objects may invoke those other objects’ __del__ method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex!

Original Answer:

This is undefined behavior, as you have multiple threads writing to the same bit of memory; hence the "messy" output you're observing.

I want to test if it's ok to append to list from two threads, but I'm getting messy output

I think you've successfully tested this, and the answer is no. There are more detailed explanations on SO: https://stackoverflow.com/a/5943027/62032

Community
tinkertime
  • Is it truly undefined? Lists are certainly thread-safe. It is certainly *unpredictable* to concurrently write to stdout, but definedly defined. – This company is turning evil. Jan 02 '17 at 14:32
  • Lists are thread safe. Your first statement is correct: the list numbers won't grow monotonically, due to other operations getting in between retrieving `alist[-1]` and the call to `append`. But the "messy" output is due to the print statement: that one is not thread safe, and the output of both calls to print is mangled. The resulting list should still contain only int objects, though. – jsbueno Jan 07 '17 at 12:55

Since you are using the same variable to read and write, the behavior is undefined. I executed the code twice on the same machine and got two different outputs:

Starting Thread-1 
Exiting Thread-1 
[1, 2, 3, 4]Starting Thread-2   

Exiting Main Thread 
 [Exiting Thread-21, 2, 3, 4 
, [51, , 62],
3, 4, 5, 6]

and this

Starting Thread-1
Exiting Thread-1
[1, 2, 3, 4]
Exiting Main Thread
[1, 2, 3, 4]
Starting Thread-2
Exiting Thread-2
[1, 2, 3, 4, 5, 6]

You should use synchronization to get the desired output; otherwise you are relying on the nondeterministic thread scheduling to happen to produce the correct output.

EDIT: This article explains how to implement synchronization: http://theorangeduck.com/page/synchronized-python
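A minimal Python 3 sketch of what synchronization could look like here: a `synchronized` decorator built on threading.Lock. This is an assumption about how such a decorator might be written, not necessarily the implementation from the linked article; the names `synchronized`, `list_lock`, `append_next` and `worker` are hypothetical.

```python
import functools
import threading


def synchronized(lock):
    """Hypothetical decorator: run the wrapped function while holding `lock`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


list_lock = threading.Lock()


@synchronized(list_lock)
def append_next(alist):
    # The read of alist[-1] and the append now happen under one lock.
    alist.append(alist[-1] + 1)


def worker(alist, count):
    for _ in range(count):
        append_next(alist)


alist = [1, 2]
threads = [threading.Thread(target=worker, args=(alist, 2)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(alist)  # [1, 2, 3, 4, 5, 6]
```

Every function decorated with the same lock is mutually exclusive, which is convenient but can serialize more work than necessary if one lock guards unrelated data.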

Arghya Saha
  • Ok, I appreciate your answer, but can you please provide some code that implements synchronizing? – Alexey Jan 02 '17 at 14:40

You need to use a threading.Lock to ensure that while an action (such as printing output to the screen) is performed by one thread, it doesn't interfere with the actions of other threads.
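Applied to the code in the question, a Python 3 sketch might look like the following. It assumes a single module-level lock (`lock`, a name introduced here) that guards both the list update and the print calls, and it joins both threads before the main thread prints the list.

```python
import threading

lock = threading.Lock()  # one lock shared by every thread (assumption of this sketch)


def append_to_list(alist, counter):
    while counter:
        with lock:  # read-then-append happens as one step
            alist.append(alist[-1] + 1)
        counter -= 1


class MyThread(threading.Thread):
    def __init__(self, name, alist):
        super().__init__(name=name)
        self.alist = alist

    def run(self):
        with lock:  # keep each thread's output lines from interleaving
            print("Starting " + self.name)
        append_to_list(self.alist, 2)
        with lock:
            print("Exiting " + self.name)
            print(self.alist)


alist = [1, 2]
threads = [MyThread("Thread-1", alist), MyThread("Thread-2", alist)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for both threads before reading the final result

print("Exiting Main Thread")
print(alist)  # [1, 2, 3, 4, 5, 6]
```

The order of the "Starting"/"Exiting" lines can still vary between runs, but each line is printed whole and the final list is always consecutive.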

Frank