6

I want to read and process some large files line by line in Python and print the results to the terminal. I have gone through "How do I read from stdin?" and "How do I write a unix filter in python?", but I am looking for methods that do not wait until the entire file has been read into memory.

I would be using both of these commands:

cat fileName | python myScript1.py
python myScript2.py fileName
Karl Knechtel
BiGYaN
  • There are three separate issues here: 1) deciding whether to use the command line arguments as filenames or to read from standard input (trivial); 2) opening the file (possibly multiple files) named on the command line and setting them up to be read from; 3) doing line by line reading. The first duplicate addresses the first two points; the second addresses reading line by line (of course). But also, the top answers on the "how do you read from stdin" link all describe line-by-line reading from stdin. – Karl Knechtel Feb 06 '23 at 09:46

3 Answers

9

This is the standard behavior of file objects in Python: iterating over one yields a single line at a time instead of loading the whole file.

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        pass  # do something with the current line

or

import sys

for line in sys.stdin:
    pass  # do something with the current line
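
Both invocation styles from the question can probably be served by one script through the standard library's fileinput module. The following is only a minimal sketch: the function name matching_lines and the keyword "important" are made up for illustration and are not part of the answer above.

```python
import fileinput


def matching_lines(files=None, keyword="important"):
    """Yield lines containing `keyword`, reading one line at a time.

    With files=None, fileinput reads the files named in sys.argv[1:],
    or standard input when no file names were given.
    (matching_lines and keyword are hypothetical names for this sketch.)
    """
    with fileinput.input(files=files) as stream:
        for line in stream:
            if keyword in line:
                yield line
```

In a script, looping over matching_lines() and writing each line with sys.stdout.write(line) should then work for both cat fileName | python myScript1.py and python myScript2.py fileName.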
Tim Pietzcker
4

Just iterate over the file:

with open('huge.file') as hf:
  for line in hf:
    if 'important' in line:
      print(line)

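This requires memory proportional to the longest line rather than to the size of the file, because the file object is read lazily, one line at a time.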

To read from stdin, simply iterate over sys.stdin instead of hf:

import sys
for line in sys.stdin:
  if 'important' in line:
    print(line)
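
Because sys.stdin and an opened file are both iterable line by line, the same processing function can be handed either one. Here is a rough sketch of that idea, assuming the script receives at most one file name; keep_important and run are hypothetical names, not something from the question.

```python
import sys


def keep_important(lines, keyword="important"):
    """Yield only the lines that contain `keyword` (hypothetical helper)."""
    for line in lines:
        if keyword in line:
            yield line


def run(argv, stdin=sys.stdin, stdout=sys.stdout):
    """Filter the file named in argv[1] if present, otherwise stdin."""
    if len(argv) > 1:
        # python myScript2.py fileName
        with open(argv[1]) as source:
            stdout.writelines(keep_important(source))
    else:
        # cat fileName | python myScript1.py
        stdout.writelines(keep_important(stdin))
```

Calling run(sys.argv) at the bottom of the script would cover both commands. Lines are written as they are read, so nothing accumulates in memory.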
phihag
  • I am a python newbie, can you please explain "simply iterate over sys.stdin instead of hf". Do you mean `for line in sys.stdin` ? – BiGYaN Oct 17 '11 at 09:38
  • 1
    Yes, `sys.stdin` is just a [file object](http://docs.python.org/library/sys.html?highlight=stdin#sys.stdin) that behaves like a file you have opened manually. – Tim Pietzcker Oct 17 '11 at 09:42
-1
if __name__ == '__main__':
    while True:
        try:
            a = input()  # raw_input() in Python 2
        except EOFError:
            break
        print(a)

This reads from stdin until EOF. To read a file using the second method, you can use Tim's method, i.e.

with open("myfile.txt", "r") as myfile:
    for line in myfile:
        print(line, end="")  # line already ends with a newline
        # do something with the current line
spicavigo