Why is my function faster than python's print function in IDLE?

Question

I wrote this function a while back:

def faster_print(*args, sep=" ", end="\n", file=stdout):
    file.write(sep.join(map(str, args))+end)

and I tested it:

from sys import stdout
from time import perf_counter

def faster_print(*args, sep=" ", end="\n", file=stdout):
    file.write(sep.join(map(str, args))+end)

def time(function, *args, **kwargs):
    start = perf_counter()
    function(*args, **kwargs)
    return perf_counter()-start

def using_normal_print(number):
    for i in range(number):
        print("Hello world.", 5, 5.0, ..., str)

def using_faster_print(number):
    for i in range(number):
        faster_print("Hello world.", 5, 5.0, ..., str)

normal_time = time(using_normal_print, number=100)
faster_time = time(using_faster_print, number=100)

print("Normal print:", normal_time)
print("My print function", faster_time)

It turns out that it is only faster in IDLE and not cmd. I know that IDLE creates its own objects for sys.stdout, sys.stdin and sys.stderr but I don't get why it only slows down python's built in print function. This answer says that the built in print function is written in c. Shouldn't that make it faster as my function needs to be compiled from python bytecode into machine code?

I am using Python 3.7.9 and IDLE version 3.7.9

`datetime.now()` is not necessarily accurate for this purpose; you should use [`time.perf_counter`](https://docs.python.org/3/library/time.html#time.perf_counter) as this *"a clock with the highest available resolution to measure a short duration"*, or rather than rolling your own timing code you could use the [`timeit`](https://docs.python.org/3/library/timeit.html) module. — kaya3, Feb 19 '21 at 23:43
Don't benchmark with `datetime.now()`. You should use `time.process_time` instead. — a_guest, Feb 19 '21 at 23:43
using timeit they are both roughly the same for 1k iterations — Teejay Bruno, Feb 19 '21 at 23:45
If there is a difference, my best guess is that the built-in `print` function flushes the buffer immediately whereas your function doesn't, so all 100 lines get flushed at once. IDLE is particularly slow at printing for some reason, perhaps because it's doing word wrap or clearing older lines from the output window, so flushing all at once means the overhead of that only happens once. — kaya3, Feb 19 '21 at 23:47
@kaya3 Even when I add `file.flush` my function is still way faster. — TheLizzard, Feb 19 '21 at 23:49
@TeejayBruno How are you using `timeit`? I was going to use it but don't forget that it will have to compile the function each time. That is why I decided not to go with `timeit`. — TheLizzard, Feb 19 '21 at 23:50
In that case, my second guess is that IDLE tries to handle anything written to `stdout` immediately, and the built-in `print` function writes each argument and separator independently, whereas your function joins it all together as a string and writes once. — kaya3, Feb 19 '21 at 23:50
Looking at the [source code](https://github.com/python/cpython/blob/bcb094b41f7fe4dd1686c50891d85632fcf0d481/Python/bltinmodule.c#L1829) for the built-in print, it seems like there's a lot of overhead compared to just writing directly to a file stream. The code has to be generic enough to be compiled on multiple platforms for one thing, and it has a lot more features. — Random Davis, Feb 19 '21 at 23:50
@RandomDavis I don't think that is it because when I ran it from cmd it gave me nearly the same time for both functions. That is why I think it comes from IDLE. — TheLizzard, Feb 19 '21 at 23:55
@kay3 I tested your second guess and the results strongly suggest that it is correct. See my answer below. Thank you for posting it. — Terry Jan Reedy, Feb 21 '21 at 02:20

score 3 · Accepted Answer · answered Feb 21 '21 at 02:17

TheLizard, thank you for reporting and amending your experiments. As an IDLE maintainer, am concerned about IDLE speed. I noticed that printing to screen is sometimes much slower than in the Python terminal/console REPL. In the latter, Python executes in the same process as the screen window and screen.write directly writes to the screen buffer. IDLE, on the other hand, executes user code in a separate process. In that process, the replacement sys.stdout send output through a socket to the IDLE GUI process, which then call tkinter text.insert, which calls tcl/tk functions that write to a screen bugger. But until now, I have not properly investigated.

I ran your code in 3.10.0a5 on my Win 10 machine. In the REPL, normal and fast print took .05 seconds. In IDLE, they took about 1.1 and .3 seconds. The factor of 6 (.3/.05) is explained by the overhead above. But the additional factor of about 3.7 (1.1/.3)?

To test kaya3's 2nd hypothesis, I defined s = 'a'*75 and replaced your print arguments by s. In REPL, times are still .05 and .05. In IDLE, they are about .41 and .31. I conclude that there is a little internal print function overhead, but that most of the 3.7 is additional socket-to-screen overhead. When print is writing to a buffer, there is no reason to pre-join little strings, because multiple stdout.writes are essentially doing a join, whether to the screen buffer or a disk buffer.

To further test this, I altered the test to write 3 blocks of 40 lines. The REPL times remain the same. In IDLE, they average about .058 and .05, about as fast as in the REPL.

Conclusion: I should document that if one prints in code written to run regularly in IDLE and one cares about speed, one should pre-assemble everything one wants displayed together into one string first and just print that string. IDLE does this for tracebacks, which is why they display 'all at once'.

Thanks for the detailed answer. So the only way to make IDLE faster is to force cpython into joining the args passed into its `print` function before writing to `stdout`. — TheLizzard, Feb 21 '21 at 10:21
What I suggested is that users who care should join args and possibly lines and call print with a single string argument. I am adding a paragraph about this to the IDLE docs now. I once filled Shell with 500000 40 char lines to see if scrolling still worked OK (yes it did). Printing 1000 lines at once would have been faster ;-). I considered the possibility of replacing built-in `print`, but this would have the problem of exact emulation and making IDLE different even if the emulation were perfect. — Terry Jan Reedy, Feb 21 '21 at 19:28
I don't think this is a good suggestion but what if IDLE took all of the stdout and wrote that to a buffer that is written a few milliseconds later to tkinter. It shouldn't be hard to implement with a `.after` script. Otherwise everyone should just join the print args if they are printing a lot of stuff. — TheLizzard, Feb 21 '21 at 20:11
Part of the reason for running user code is a 2nd process is so that user tkinter code and IDLE's GUI tkinter code do not interfere. So idlelib.run, which sets up the execution process and its connection to the gui process, does not run tkinter/tk code. — Terry Jan Reedy, Feb 21 '21 at 22:51
The connection is handled in a separate thread, so I thought of having that thread sleep while waiting, but I doubt that would not work either, because sleep cannot be interrupted and the Windows' minimum sleep is, I believe at least 16 milleseconds. — Terry Jan Reedy, Feb 21 '21 at 23:03
Never knew that there was a minimum sleep time but according to [this](https://stackoverflow.com/q/9518106/11106801) it is 15.6 which rounds to 16 milliseconds. — TheLizzard, Feb 21 '21 at 23:46

Why is my function faster than python's print function in IDLE?

1 Answers1

Linked