
I've been exploring the internal implementation of threads in Python this week. Every day I'm amazed by how much I didn't know; not knowing what I want to know is what makes me itch.

I noticed something strange in a piece of code that I ran under Python 2.7 as a multithreaded application. We all know that Python 2.7 switches between threads after 100 virtual instructions by default. Calling a function is one virtual instruction, for example:

>>> import dis
>>> from __future__ import print_function
>>> def x(): print('a')
... 
>>> dis.dis(x)
  1           0 LOAD_GLOBAL              0 (print)
              3 LOAD_CONST               1 ('a')
              6 CALL_FUNCTION            1
              9 POP_TOP             
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE        

As you can see, after loading the global print and the constant 'a', the function gets called. Calling a function is therefore atomic, since it's done with a single instruction. Hence, in a multithreaded program, either the function (print here) runs or the running thread gets interrupted before the function gets the chance to run. That is, if a context switch occurs between LOAD_GLOBAL and LOAD_CONST, the instruction CALL_FUNCTION won't run.

Keep in mind that in the above code I'm using from __future__ import print_function, so I'm really calling a builtin function now, not the print statement. Let's take a look at the bytecode of function x, but this time with the print statement:

>>> def x(): print "a"          # print stmt
... 
>>> dis.dis(x)
  1           0 LOAD_CONST               1 ('a')
              3 PRINT_ITEM          
              4 PRINT_NEWLINE       
              5 LOAD_CONST               0 (None)
              8 RETURN_VALUE 

It's quite possible in this case that a thread context switch occurs between PRINT_ITEM and PRINT_NEWLINE, so the item is printed but the PRINT_NEWLINE instruction hasn't executed yet. So if you have a multithreaded program like this (borrowed from Programming Python, 4th edition, and slightly modified):

import thread, time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        print ('[%s] => %s' % (myId, i)) #print (stmt) 2.X 

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6)  # don't quit early so other threads don't die

The output may or may not look like this, depending on how the threads were switched:

[0] => 0
[3] => 0[1] => 0
[4] => 0
[2] => 0
...many more...

This is all okay with the print statement.

What happens if we replace the print statement with the builtin print function? Let's see:

from __future__ import print_function
import thread, time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)

        print('[%s] => %s' % (myId, i))  #print builtin (func)

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6) 

If you run this script long enough and multiple times, you'll see something like this:

[4] => 0
[3] => 0[1] => 0
[2] => 0
[0] => 0
...many more...

Given all of the above, how can this be? print is a function now; how come it prints the passed-in string but not the newline? print writes the value of end after the printed string, and end defaults to \n. If a call to a function is atomic, how on earth did it get interrupted?
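A quick way to see what's going on is to hand print a fake file object and record the write() calls it makes. A minimal sketch (Python 3, or Python 2 with print_function; WriteLogger is a made-up name for illustration):

```python
class WriteLogger(object):
    """Fake file object that records every write() call it receives."""
    def __init__(self):
        self.calls = []

    def write(self, text):
        self.calls.append(text)

log = WriteLogger()
print('[0] => 0', file=log)
# log.calls now holds the message and the end string as two
# separate entries: the text first, then '\n'
```

The message and the end string reach the stream as two separate write() calls, so a thread switch between them leaves the newline behind.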

Let's blow our minds:

from __future__ import print_function
import thread, time

def counter(myId, count):
    for i in range(count):
        time.sleep(1)
        #sys.stdout.write('[%s] => %s\n' % (myId, i))
        print('[%s] => %s\n' % (myId, i), end='')

for i in range(5):
    thread.start_new_thread(counter, (i, 5))

time.sleep(6) 

Now the newline is always printed; no jumbled output anymore:

[1] => 0
[2] => 0
[0] => 0
[4] => 0
...many more...

The addition of \n to the string proves that the print function is not atomic (even though it's a function); essentially it behaves just like the print statement. Yet dis.dis misleadingly tells us it's a simple function call, and thus supposedly an atomic operation?!
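Since the text and the newline are handed to the stream separately, the usual workaround is to serialize the whole print call behind a lock. A minimal sketch, with safe_print as a hypothetical wrapper name (under Python 2 this would also need from __future__ import print_function):

```python
import threading

_print_lock = threading.Lock()  # one lock shared by all threads

def safe_print(*args, **kwargs):
    # Holding the lock across the whole call keeps the message and the
    # trailing newline together as one unit on the stream.
    with _print_lock:
        print(*args, **kwargs)
```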

Note: I never rely on the order or timing of threads for applications to work properly. This is for testing purposes only and, frankly, for geeks like me.

Ryan Haining
GIZ
  • It **can't possibly** be atomic in the general case, because it's able to handle strings of arbitrary size, which necessarily includes sizes larger than what the OS kernel will consent to process in a single syscall. – Charles Duffy Jul 20 '17 at 21:28
  • https://stackoverflow.com/questions/3029816/how-do-i-get-a-thread-safe-print-in-python-2-6 – Ryan Haining Jul 20 '17 at 21:28
  • ...which is to say -- just because something is atomic at the Python interpreter's layer doesn't mean it's atomic at the layers below that. [Abstractions leak.](https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/) – Charles Duffy Jul 20 '17 at 21:30
  • "Calling a function therefore is atomic as it's done with a single instruction" - nope, doesn't work like that. Aside from the fact that one bytecode instruction can trigger the evaluation of more bytecode instructions, Python makes no promises about bytecode atomicity. – user2357112 Jul 20 '17 at 21:35
  • If you write multithreaded programs in Python expecting arbitrary function calls to be atomic, you're going to have a bad time. – user2357112 Jul 20 '17 at 21:37
  • I think you are confusing the idea of calling the function, i.e., setting up the call and transferring the execution to that block of code, and actually executing the function. Calling the function is likely an atomic operation. Executing it will almost never be. – Mad Physicist Jul 20 '17 at 21:39
  • @user2357112 _"one bytecode instruction can trigger the evaluation of more bytecode instructions"_ `print` is a builtin function how does it possibly generate more bytecodes? – GIZ Jul 21 '17 at 08:19
  • @MadPhysicist _"Calling the function is likely an atomic operation. Executing it will almost never be."_ this idea crossed my mind, however, I quickly eliminated it because `print` is builtin so I didn't expect bytecode instructions. – GIZ Jul 21 '17 at 08:30
  • @direprobs, builtin or not, it calls non trivial code that is in some function or functions somewhere. – Mad Physicist Jul 21 '17 at 13:16
  • @direprobs, what does "more bytecodes" have to do with it? Even if it's calling into C, that code is eventually making syscalls into the OS. If the syscall interface doesn't guarantee atomicity (and it doesn't -- it's perfectly free to perform a partial write and return EINTR), you can't build that guarantee in by layering more things on top. – Charles Duffy Jul 21 '17 at 17:59
  • @CharlesDuffy I'm not aware of the atomicity of syscalls. Python shouldn't really care about that anyway and I totally get what you're saying, what I was saying I hope I'm expressive enough here, somehow a C func got interrupted. Why that concerns me? Well, because Python 2.7 switches between threads after a number of virtual instructions, and of course a C code is not a bunch of Python virtual instructions and I was kinda thinking how did that happen? @ user2357112 said that it's possible that a builtin func make a call to a code written in Python, doesn't surprise me now this is the case. – GIZ Jul 21 '17 at 18:43
  • @CharlesDuffy When I tested `list.sort` for example, I didn't get a partially listed list. `list.sort` is totally in C, it never got interrupted in a multithreaded program, but don't get me wrong that's what I tested so far and I thought other functions are atomic and will behave similarly, that's just one awful, wrong idea :( – GIZ Jul 21 '17 at 18:49
  • @direprobs, "Python 2.7 switches between threads after a number of virtual instructions" is false. Plain, simply, outright false. You're thinking of when the GIL is released, but that's only pertinent when it's not released for other reasons, such as when potentially-blocking I/O is taking place. And `print` is itself potentially-blocking I/O. It's the kernel that does thread scheduling; the CPython interpreter has no role, except inasmuch as it provides mutexes that can impact which threads are marked ready. – Charles Duffy Jul 21 '17 at 19:24
  • @direprobs, if the GIL isn't held during I/O (to provide a single lock that spans multiple OS-level-atomic operations), and the kernel can split I/O into multiple syscalls, well, there's your problem (and there's the relevance of syscall semantics). – Charles Duffy Jul 21 '17 at 19:27
  • @direprobs, ...it also is important to understand if you're ever in a position where you care about atomicity not just across two threads inside the same Python interpreter, but about I/O atomicity between two separate programs writing to the same file (each with the `O_APPEND` flag) at the same time. Your question didn't make it clear that the latter was outside its scope, should that in fact be the case. – Charles Duffy Jul 21 '17 at 19:28

1 Answer


Your question is based on the central premise

Calling a function therefore is atomic as it's done with a single instruction.

which is thoroughly wrong.

First, executing the CALL_FUNCTION opcode can involve executing additional bytecode. The most obvious case of this is when the executed function is written in Python, but even built-in functions can freely call other code that may be written in Python. For example, print calls __str__ and write methods.
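To see Python code running inside a single CALL_FUNCTION, give print an object whose __str__ is written in Python. A small illustrative sketch (the Chatty class is invented for the demo):

```python
import io

class Chatty(object):
    def __str__(self):
        # Plain Python bytecode, executed while the C-level print
        # machinery is still in the middle of its call.
        return 'hello from __str__'

buf = io.StringIO()
print(Chatty(), file=buf)   # one CALL_FUNCTION, yet Chatty.__str__ runs too
```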

Second, Python is free to release the GIL even in the middle of C code. It commonly does this for I/O and other operations that might take a while without needing to perform Python API calls. There are 23 uses of the FILE_BEGIN_ALLOW_THREADS and Py_BEGIN_ALLOW_THREADS macros in the Python 2.7 file object implementation alone, including one in the implementation of file.write, which print relies on.

user2357112