1

I'm using a print statement in a python 2.7 script in which I'm creating instances of data modeling classes. They're fairly large classes which do a good number of calculations in property setters during the init, so it's not the fastest executing script. I use print statements to have some idea of progress, but what's interesting is how they're executing. The code looks something like this:

    from __future__ import absolute_import, division, print_function, unicode_literals

    print('Loading data...', end='\t')
    data = LoadData(data_path)
    first_model = FirstModel(parameters).fit(data)
    print('Done.\nFitting second model...', end='\t')
    # prints 'Done.' and then there's a very long pause...
    # suddenly 'Fitting second model...' prints and the next model initializes almost immediately
   second_model = SecondModel(parameters).fit(data)
   results = second_model.forecast(future_dates)

Why would the statement print('Done.\nFitting second model...', end=\t') first print 'Done.' and then pause for a long period of time? There was one instance when I was running this code, and after the 'Done.' printed I got an error before the rest of the statement printed. The error returned was an error in SecondModel where I tried too access a method as an attribute. What's going on here? How or why is python executing this print statement in such a counterintuitive way? It's as if the interpreter views the new line character as an indication that it should start looking at later parts of the code.

Petergavinkin
  • 150
  • 11
  • Maybe there's a buffer – Peter Wood Apr 10 '17 at 20:25
  • 1
    That looks like a print function, so either you're using Python 3 or you've imported from the future. – Peter Wood Apr 10 '17 at 20:26
  • Edited. I did import from future. I guess I'm curious why it seemed to consistently pause after 'Done' and what's unique about '\n' since it doesn't seem to happen anywhere else? – Petergavinkin Apr 10 '17 at 20:30
  • 3
    Try putting `flush=True` after `end='\t'` – Peter Wood Apr 10 '17 at 20:30
  • 1
    Speculation only, but sounds like the interpreter has (at least) two threads in execution: one for the `print()` function, one for the `FirstModel()` function. You're seeing the effect of the [Global Interpreter Lock](http://www.dabeaz.com/python/UnderstandingGIL.pdf) as the `print()` function is I/O bound, it gets put to sleep while the modelling function has the GIL. In your second case, the modeller throws an exception while `print()` is sleeping. – Matthew Cole Apr 10 '17 at 20:32
  • `stdout` does auto-flush when it sees a `\n`, but normally buffers line output, within reason. Thus the pause: the whole output is sent to the OS, but the buffering mechanics decides to only show a part of it. – 9000 Apr 10 '17 at 20:34
  • 2
    Ah, note the back-ported `print` function in 2.7 doesn't support the `flush` parameter. See [this answer](http://stackoverflow.com/a/35467658/1084416) to [How to flush output of Python print?](http://stackoverflow.com/q/230751/1084416) – Peter Wood Apr 10 '17 at 20:34
  • Try [setting your GIL check interval](https://docs.python.org/2/library/sys.html#sys.setcheckinterval) to something very small. `sys.getcheckinterval()` returns `100` for me. See if yours is longer, and try halving it. Does this stop the odd behavior from `print()`? – Matthew Cole Apr 10 '17 at 20:35
  • You have a typo in one of your lines fyi: print('Done.\nFitting second model...', end=\t') You are missing a ' character. print('Done.\nFitting second model...', end='\t') – Preston Martin Apr 10 '17 at 20:36

1 Answers1

2

By default, print calls are buffered. The buffer is flushed whenever a newline character is encountered (therefore, you see Done\n appear). However, the subsequent text is kept in the buffer until the next event that flushes it (in the absence of some subsequent newline character to print, that'll probably be Python either returning to the command prompt or exiting completely to the shell, depending on how you're running this script). Therefore, your time-consuming call to SecondModel().fit() is occurring between the display of the two lines.

To avoid this, you can flush the buffer manually by calling sys.stdout.flush() immediately after the print. Or, if you were ever to move to Python 3.3 or higher, you would be able to shortcut this by passing the additional argument flush=True into print().

Error messages can interrupt printed output, and vice versa, because by default they are handled by two separate streams: sys.stderr and sys.stdout, respectively. The two streams have separate buffers.

jez
  • 14,867
  • 5
  • 37
  • 64