
I'm reading line by line from a named pipe that delivers one line per second. I first tried the plain, simple

for line in file:
    processLine(line)

but processLine() is never called. (EDIT: It does get called eventually, after a lot of lines have been read, which takes several minutes.) Investigating with strace showed that the process is indeed performing and finishing a read() system call each second and, as expected, receives a complete line each time.

I can only guess that the "for line in" idiom buffers the input and calls processLine() for each input line later, probably when the buffer is full or when the input terminates (which in my case it never will).

Can I explicitly set the buffer used here to something smaller?

Or is there another way to tweak this so that each line is processed as soon as it arrives, in the same second-by-second rhythm?

EDIT:

Currently I am using this workaround:

for line in lineByLine(namedPipe):
    …

And this is lineByLine():

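import os

def lineByLine(openFile):
    # Read one byte per os.read() call so nothing past the current line
    # is consumed (Python 2: os.read() returns a str).
    line = ''
    while True:
        char = os.read(openFile.fileno(), 1)
        if not char:
            if line:
                yield line
            break
        line += char
        if line.endswith('\n'):
            yield line
            line = ''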

But this ugly workaround is hardly a real solution.
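The workaround above relies on Python 2, where os.read() returns a str. In Python 3 os.read() returns bytes, so appending it to the '' string would raise a TypeError. A hedged Python 3 sketch of the same byte-at-a-time idea follows; the name line_by_line_bytes is hypothetical, and it takes a raw file descriptor (e.g. namedPipe.fileno()) rather than a file object.

```python
import os

def line_by_line_bytes(fd):
    """Yield newline-terminated byte strings read one byte at a time from fd.

    Hypothetical Python 3 port of the workaround: one os.read() system
    call per byte, which is slow but never reads past the current line.
    """
    line = bytearray()
    while True:
        char = os.read(fd, 1)
        if not char:  # b'' means end of file
            if line:
                yield bytes(line)  # last line without a trailing newline
            return
        line += char
        if char == b'\n':
            yield bytes(line)
            line = bytearray()
```

Each yielded line is bytes; callers that need text can decode it, e.g. with line.decode('utf-8'). The per-byte system call is the reason this remains a workaround rather than a solution.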

Alfe
  • Not sure, but maybe "with" could help? "with file as f: for line in f: processLine(line)" – Ilja Jul 21 '14 at 11:04
  • @Anonymous: I'm using that idiom already (and have the problem). – Alfe Jul 21 '14 at 11:13
    @Alfe Check [python bug report](http://bugs.python.org/issue3907). The advice could be to use `readline`. – Jan Vlcinsky Jul 21 '14 at 11:42
  • Exactly. I found the same advice [here](http://stackoverflow.com/questions/3670323/setting-smaller-buffer-size-for-sys-stdin) ;-) – Alfe Jul 21 '14 at 11:47

2 Answers


As you allude to in your question, file.next() buffers internally. Usually this behavior is correct and goes unnoticed.

file.readline() does not buffer internally in the same way. Your somewhat unwieldy example program creates a generator that allows file.readline() to be used as the iterable in a for loop.

An easier way to create such an iterable is with the two-argument form of iter:

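import sys

# iter(callable, sentinel) calls namedPipe.readline() until it returns '' (EOF)
for line in iter(namedPipe.readline, ''):
    sys.stdout.write(line)  # line already ends with '\n'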
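To illustrate the two-argument form of iter() on its own, here is a minimal sketch in which an io.StringIO object stands in for the named pipe (an assumption for illustration only; a StringIO never blocks the way a pipe does).

```python
import io

# io.StringIO stands in for the named pipe (illustration only).
stream = io.StringIO('one\ntwo\nthree\n')

# iter(callable, sentinel) calls stream.readline() repeatedly and stops
# as soon as it returns the sentinel '' (end of file).
lines = list(iter(stream.readline, ''))
print(lines)  # ['one\n', 'two\n', 'three\n']
```

For a file opened in binary mode, readline() returns bytes, so the sentinel would have to be b'' instead of ''.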
Robᵩ
  • I'm reluctant to accept your solution just because it lacks any explanation, so if you add a sentence or two, I'll gladly accept it. (Yeah I know, the explanation already is there in other answers, so you might feel like plagiarizing when you just rephrase it, but if other people have that issue as well and find my question, the accepted answer should be a complete one.) – Alfe Jul 22 '14 at 08:09

I found in the Python documentation (for the -u command-line option) that the internal buffering of the file iterators cannot be switched off, but that readline() can be used inside a while True: loop to get line-buffered behavior. So:

def lineByLine(openFile):
    while True:
        line = openFile.readline()
        if not line:
            break
        yield line

This now works for me, and I assume there is nothing else (more elegant) I can do to circumvent this issue. After all, this is now kind of a solution :-/
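To check that readline() hands over each line as soon as it arrives, here is a small self-contained sketch in Python 3. It uses an anonymous os.pipe() as a stand-in for the named pipe, and the names line_by_line and demo are hypothetical. The writer sends one line and then waits until the reader has consumed it before sending the next, so the first line can only be acknowledged in time if readline() returns without waiting for more data.

```python
import os
import threading

def line_by_line(open_file):
    # Hypothetical helper with the same behavior as the while-True/readline()
    # generator above: readline() returns '' only at end of file.
    return iter(open_file.readline, '')

def demo():
    # os.pipe() stands in for the named pipe (assumption for illustration).
    read_fd, write_fd = os.pipe()
    first_line_seen = threading.Event()
    ack = []

    def writer():
        with os.fdopen(write_fd, 'w') as w:
            w.write('first\n')
            w.flush()  # push the line into the pipe right away
            # Send the next line only after the reader has handled the
            # first one (give up after 5 seconds so nothing can hang).
            ack.append(first_line_seen.wait(timeout=5))
            w.write('second\n')
        # leaving the with-block closes the write end, so the reader sees EOF

    thread = threading.Thread(target=writer)
    thread.start()
    received = []
    with os.fdopen(read_fd, 'r') as r:
        for line in line_by_line(r):
            received.append(line)
            first_line_seen.set()
    thread.join()
    return received, ack[0]

received, acked = demo()
print(received)  # ['first\n', 'second\n']
print(acked)     # True: the first line was read before the second was sent
```

The 5-second timeout is an arbitrary choice that keeps the sketch from hanging if a reader did hold lines back; in that case acked would be False.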

Alfe