I am sorry if this is a repeat question. How do I write a Python script that processes data as a stream of lines? I need to do this because the files I am processing are huge, and I would rather not read a whole file into memory.

I know that you can potentially read one line of the file at a time, but I want something that will process a text stream.

Glen Selle
Sam
  • What's the difference between reading "a stream of lines" and "read one line of the file at a time"? – Adam Batkin Mar 11 '11 at 12:35
  • Well, with an input stream, I don't care where the lines come from, and I am not doing the file handling for the input. When I say "read one line at a time", I mean that I know the file and my program is responsible for opening and closing it. – Sam Mar 11 '11 at 12:37

3 Answers

You could just read the data from stdin, as described in this answer. In code, that looks like this:

import sys

for line in sys.stdin:
    pass  # do stuff with line

If you want to process a file, then just call the script like this (on Unix platforms):

cat file.txt | python script.py

You can of course pipe the output of any other program in there too.
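To make the stdin approach a little more concrete, here is a minimal sketch of a script body that never opens a file itself. The function name `number_nonblank_lines` and the numbering behaviour are only illustrative assumptions, not something from the question; the point is that only one line is held in memory at a time.

```python
import sys


def number_nonblank_lines(stream, out):
    """Copy non-blank lines from stream to out, prefixed with a running count.

    Hypothetical example task: only the current line is kept in memory,
    so the input can be far larger than available RAM.
    """
    count = 0
    for line in stream:  # text streams such as sys.stdin yield one line at a time
        if not line.strip():
            continue  # skip blank lines
        count += 1
        out.write(f"{count}: {line.rstrip(chr(10))}\n")
    return count


def main():
    # The script does no file handling; the shell decides where lines come from.
    number_nonblank_lines(sys.stdin, sys.stdout)
```

If script.py ends with a call to `main()`, it can be fed with `cat file.txt | python script.py` as above. Keeping the loop in a function that takes the stream as a parameter also means it can be tried out on an in-memory stream without touching stdin.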

Björn Pollex

Your case sounds like exactly what the fileinput module was designed for. With it, you can run:

python script.py file1.txt file2.txt file3.txt file4.txt

and in script.py:

import fileinput

for line in fileinput.input():
    pass  # do stuff with line here

The added bonus of using fileinput is that you can do roughly the same thing Space_C0wb0y suggested by passing a dash as the first parameter, which makes fileinput read from stdin:

python script.py - < file.txt

or

cat file.txt | python script.py -

fileinput is mentioned in the answers to the question linked by Space_C0wb0y; I just figured I'd spell out how it can be used.
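As a rough sketch of what fileinput adds beyond a plain loop, the example below also reports which file and line each match came from. The function name `find_todo_lines` and the "TODO" marker are made up for illustration; `fileinput.input(files=...)`, `filename()` and `filelineno()` are the standard-library calls.

```python
import fileinput


def find_todo_lines(paths=None):
    """Yield (filename, line number, text) for each line containing 'TODO'.

    With paths=None, fileinput falls back to sys.argv[1:], and to stdin
    if no arguments were given. Lines are read lazily, one at a time.
    """
    with fileinput.input(files=paths) as stream:
        for line in stream:
            if "TODO" in line:
                # filelineno() restarts at 1 for every new input file.
                yield stream.filename(), stream.filelineno(), line.rstrip("\n")
```

Because this is a generator, a caller can loop over the results as they are found, so neither the input files nor the list of matches has to fit in memory.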

ig0774

f = open('somefile.txt')
for line in f:
    process(line)

Actually, f can be anything that is iterable, so for example a list of strings or even sys.stdin if you wanted to read from standard input.
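To illustrate the "anything iterable" point, here is a small sketch in which the same function is fed a list and an in-memory text stream. `longest_line` is a hypothetical stand-in for `process`, chosen only so that there is something to compute.

```python
import io


def longest_line(lines):
    """Return the longest line (without its newline) from any iterable of strings."""
    longest = ""
    for line in lines:
        line = line.rstrip("\n")
        if len(line) > len(longest):
            longest = line
    return longest


# The same function accepts a list of strings or a file-like text stream.
from_list = longest_line(["a\n", "abc\n", "ab\n"])
from_stream = longest_line(io.StringIO("one\nthree\ntwo\n"))
```

An open file or sys.stdin can be passed in the same way. For a real file, wrapping the call as `with open('somefile.txt') as f: longest_line(f)` closes the file automatically once the block exits.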

Adam Batkin
  • You are right, that was assumed. But if that is all the program is doing (reading lines and calling `process()` on them) then there is no point in explicitly closing the file. – Adam Batkin Mar 11 '11 at 12:49
  • People who ask questions like this one are usually beginners, and it is therefore prudent to show them only the best of practices, because they do not know better. – Björn Pollex Mar 11 '11 at 12:56