
With the code below, I tried to print a bunch of things in parallel in a Jupyter notebook using a ThreadPoolExecutor. Notice that with the function show(), the output is not what you'd normally expect.

from concurrent.futures import ThreadPoolExecutor
import sys

items = ['A','B','C','D','E','F',
         'G','H','I','J','K','L',
         'M','N','O','P','Q','R',
         'S','T','U','V','W','X','Y','Z']

def show(name):
    print(name, end=' ')

with ThreadPoolExecutor(10) as executor:
    executor.map(show, items)

# This outputs
# AB  C D E F G H I J KLMNOP      QR STU VW    XY Z 

But when I use sys.stdout.write() instead, I don't get this behavior.

def show2(name):
    sys.stdout.write(name + ' ')

with ThreadPoolExecutor(10) as executor:
    executor.map(show2, items)

# This gives
# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

The weird thing is that I tried this both in a Jupyter notebook and by writing a .py file and running it, and with the latter I don't seem to get this problem. I tried searching, but all I found was that print() in Python 3.x is thread-safe. If it is indeed thread-safe, could anyone explain why this is happening?

Semin Park
  • https://twitter.com/nedbat/status/194452404794691584?s=19 – Daniel Roseman May 27 '18 at 11:58
  • What happens when you do `print("%s " % (name), end='')`? It's possible that the `end=` variant is outputting the name and the space as *distinct* operations, and a context switch could occur between them. A `print` of a single string with the space already appended (and an empty `end`) may alleviate this. – paxdiablo May 27 '18 at 12:56
  • If you are on Python 3.3+, you might want to try `print(..., flush=True)`. –  May 27 '18 at 13:43

1 Answer


Specifying end isn't actually needed to expose this; even just doing print(name) will sometimes result in letters being next to each other:

A
B
C
D
EF
G

H
I

Even flush=True doesn't fix it.

In CPython, the print function is implemented in C (builtin_print in Python/bltinmodule.c). The interesting bit is this:

for (i = 0; i < nargs; i++) {
    if (i > 0) {
        if (sep == NULL)
            err = PyFile_WriteString(" ", file);
        else
            err = PyFile_WriteObject(sep, file,
                                     Py_PRINT_RAW);
        if (err)
            return NULL;
    }
    err = PyFile_WriteObject(args[i], file, Py_PRINT_RAW);
    if (err)
        return NULL;
}

if (end == NULL)
    err = PyFile_WriteString("\n", file);
else
    err = PyFile_WriteObject(end, file, Py_PRINT_RAW);

The function calls PyFile_WriteObject once for each argument, once for each separator between arguments (a literal " " when sep isn't given), and once more for the end argument. PyFile_WriteString is basically just a wrapper around PyFile_WriteObject that takes a const char* rather than a PyObject*. I assume another thread can get a chance to run somewhere between these calls, so its output can land in the middle of this print's output.
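
You can check this from Python by passing print() a file object that just records what it receives. WriteRecorder below is a hypothetical helper written for this illustration, not part of any library. The sketch assumes CPython, whose print() follows the C excerpt above:

```python
class WriteRecorder:
    """Hypothetical file-like object that records every string passed to write()."""

    def __init__(self):
        self.calls = []

    def write(self, s):
        self.calls.append(s)
        return len(s)

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush


rec = WriteRecorder()
print('A', 'B', end=' ', file=rec)
print(rec.calls)  # ['A', ' ', 'B', ' ']: two arguments, one separator, one end

single = WriteRecorder()
print('A ', end='', file=single)
print(repr(single.calls[0]))  # 'A ': name and space arrive in one write() call
```

Each entry in rec.calls is a separate write() call. In a multithreaded program, another thread's output can end up between any two of them.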

Each of these calls is essentially the same as calling sys.stdout.write from Python. That would explain why you're not seeing this with sys.stdout.write(name + ' '): the name and the space go out in a single call. If you instead did this:

sys.stdout.write(name)
sys.stdout.write(" ")

then you'd be doing something much closer to what the print function itself does. It also explains why print(name + " ", end="") works: the name and the trailing space are written by a single call.
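
If you want to keep using print() from several threads, you can serialize the calls with a lock. This is a common workaround, not something the answer above proposes. Under the lock, all of the write() calls made by one print() finish before another thread starts its own. This is a minimal sketch that assumes every thread prints only through the hypothetical show_locked helper. Output written by other code paths is not protected:

```python
from concurrent.futures import ThreadPoolExecutor
import string
import threading

# One lock shared by all threads; show_locked is a hypothetical helper name.
print_lock = threading.Lock()


def show_locked(name, file=None):
    # While the lock is held, no other thread can enter this block, so the
    # separate write() calls for `name` and for `end` stay next to each other.
    with print_lock:
        print(name, end=' ', file=file)  # file=None means sys.stdout


items = list(string.ascii_uppercase)

if __name__ == '__main__':
    with ThreadPoolExecutor(10) as executor:
        # list() consumes the results so exceptions raised in workers surface here
        list(executor.map(show_locked, items))
```

The lock only keeps each print() in one piece. The letters can still appear in any order because the threads still run concurrently.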

ash
  • Honestly, I can't tell; maybe the answers to [this question](https://stackoverflow.com/q/3029816/5951320) help? [This answer](https://stackoverflow.com/a/18781792/5951320) seems to suggest that, in general, file I/O isn't thread-safe, which implies that `print` isn't thread-safe either. – ash May 27 '18 at 20:40