157

To read a text file in C or Pascal, I always use a snippet like the following to read the data until EOF:

while not eof do begin
  readline(a);
  do_something;
end;

How can I do this simply and quickly in Python?

cs95
Allen Koo
  • I chose the other question as canonical over this because the question statement is clearly better: "what is the counterpart in X language to doing Y in Z language" is an inferior way to ask "how do I do (thing that Y does in Z language) in X language". People looking for help in X language *should not have to understand Z language* in order to confirm that they have found the right question, and the title of the question should make it clear what the question is about in a language-agnostic way. – Karl Knechtel Sep 03 '22 at 08:06

8 Answers

242

Loop over the file to read lines:

with open('somefile') as openfileobject:
    for line in openfileobject:
        do_something(line)

File objects are iterable and yield lines until EOF. Using the file object as an iterable uses a buffer to ensure performant reads.
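As a minimal sketch of the loop above, the following counts the lines of a small file. The helper name `count_lines` and the temporary sample file are assumptions made for illustration only; they are not part of the original answer.

```python
import os
import tempfile


def count_lines(path):
    """Count lines by iterating over the file object until EOF."""
    count = 0
    with open(path) as f:
        for line in f:  # the loop ends by itself when EOF is reached
            count += 1
    return count


# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    sample_path = tmp.name

line_count = count_lines(sample_path)
os.remove(sample_path)
print(line_count)
```

No explicit EOF test is needed: the `for` loop simply stops when the file has no more lines.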

You can do the same with stdin (no need to use raw_input()):

import sys

for line in sys.stdin:
    do_something(line)

To complete the picture, binary reads can be done with:

from functools import partial

with open('somefile', 'rb') as openfileobject:
    for chunk in iter(partial(openfileobject.read, 1024), b''):
        do_something(chunk)

where chunk will contain up to 1024 bytes at a time from the file, and iteration stops when openfileobject.read(1024) starts returning empty byte strings.
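To see the sentinel form of `iter()` at work without a real file, here is a small sketch that uses an in-memory `io.BytesIO` stream as a stand-in. The generator name `read_chunks` and the 2500-byte payload are illustrative assumptions.

```python
import io
from functools import partial


def read_chunks(stream, size=1024):
    """Yield successive chunks of at most `size` bytes until EOF."""
    # iter(callable, sentinel) calls stream.read(size) repeatedly and
    # stops as soon as a call returns the sentinel b''.
    yield from iter(partial(stream.read, size), b'')


data = io.BytesIO(b'x' * 2500)  # in-memory stand-in for a binary file
chunk_sizes = [len(chunk) for chunk in read_chunks(data)]
print(chunk_sizes)  # [1024, 1024, 452]
```

The last chunk is shorter than 1024 bytes because only the remainder of the stream is left.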

Martijn Pieters
  • Note: The `line` will have a newline character at the end. – ben_joseph Jul 11 '17 at 19:04
  • Reading lines is a bit dangerous for generic binary files, because maybe you have a 6GiB long line… – LtWorf Oct 15 '17 at 09:40
  • @LtWorf: which is why I show how to read binary files *in chunks* rather than lines. – Martijn Pieters Oct 15 '17 at 10:27
  • I'm reading from a `stdin` from a running process...so it doesn't ever have EOF until I kill the process. But then I reach the "end up to now" and I deadlock. How do I detect this and not deadlock? Like if there are no new lines, stop reading the files (even if there isn't an EOF, which in my case will never exist). – Charlie Parker Feb 24 '19 at 21:02
  • @CharlieParker: if you reached a deadlock, then something is *probably* forgetting to flush a buffer. Without an actual MCVE, it is hard to say anything more than that. – Martijn Pieters Feb 24 '19 at 21:04
  • @MartijnPieters simple. Let's say I am running python's CLI inside python and I am trying to send it code as strings. So I have `p = subprocess.Popen(['python'],stdin=subprocess.PIPE,stdout=subprocess.PIPE,stderr=subprocess.PIPE,)` and the standard for loop going through its contents. `for line in p.stdout:`. Such code freezes my code and I can't do anything. – Charlie Parker Feb 24 '19 at 21:20
  • But what I want to do is read the `stdout` of the current command/string sent to the process. When that is done, just continue to the next command/code I send to the `cli`, ad infinitum. I don't want it to deadlock/freeze when it's done reading the output of the current string sent to the running process. Does this make sense? – Charlie Parker Feb 24 '19 at 21:22
  • @CharlieParker: you are blocking the process by not reading from its stdout then, but you'd have to use non-blocking mode. See [Interactive input/output using python](//stackoverflow.com/q/19880190) for another option. – Martijn Pieters Feb 24 '19 at 21:24
  • @CharlieParker: this has otherwise nothing to do with this question or my answer to it. Until you close a pipe there is no EOF. – Martijn Pieters Feb 24 '19 at 21:25
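To illustrate the point that a pipe only delivers EOF once it is closed, here is a hedged sketch for the simple one-shot case (not the interactive case discussed in the comments above). It assumes Python 3.7+ for `text=True` and uses `sys.executable` to start a child interpreter; `communicate()` writes the input, closes the child's stdin so that it sees EOF, and then reads the child's output until EOF.

```python
import subprocess
import sys

# Start a child interpreter that reads its program from stdin.
proc = subprocess.Popen(
    [sys.executable],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)

# communicate() sends the input, closes stdin (the child gets EOF),
# then collects stdout and stderr until the child closes them.
out, err = proc.communicate("print(1 + 1)\n")
print(out.strip())
```

For a long-running child that must keep its stdin open, this approach does not apply; the linked question covers that situation.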
78

You can imitate the C idiom in Python.

To read a buffer up to max_size (>0) number of bytes, you can do this:

with open(filename, 'rb') as f:
    while True:
        buf = f.read(max_size)
        if not buf:  # read() returns b'' (empty bytes) at EOF, never 0
            break
        process(buf)

Or, a text file line by line:

# warning -- not idiomatic Python! See below...
with open(filename) as f:
    while True:
        line = f.readline()
        if not line:
            break
        process(line)

You need to use the while True / break construct because Python has no EOF test other than a read returning no data.

In C, you might have:

while ((ch != '\n') && (ch != EOF)) {
   // read the next ch and add to a buffer
   // ..
}

However, you cannot have this in Python:

 while (line = f.readline()):
     # syntax error

because assignments are not allowed in expressions in Python (although recent versions of Python can mimic this using assignment expressions, see below).

It is certainly more idiomatic in Python to do this:

# THIS IS IDIOMATIC Python. Do this:
with open('somefile') as f:
    for line in f:
        process(line)

Update: Since Python 3.8 you may also use assignment expressions:

 while line := f.readline():
     process(line)

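That works even if the line read is blank, because a blank line is still returned as '\n', which is truthy. The loop continues until readline() returns an empty string at EOF.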
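A self-contained sketch of the assignment-expression loop, assuming Python 3.8 or later. The in-memory `io.StringIO` stream and the `process` helper are stand-ins invented for this example.

```python
import io


def process(line):
    # Hypothetical handler: strip the trailing newline.
    return line.rstrip('\n')


f = io.StringIO('alpha\n\nbeta\n')  # stand-in for an open text file
results = []
while line := f.readline():  # '' (falsy) is returned only at EOF
    results.append(process(line))

print(results)  # ['alpha', '', 'beta']
```

The blank second line does not end the loop, since `readline()` returns `'\n'` for it rather than an empty string.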

rrrrrrrrrrrrrrrr
dawg
  • As a C and Perl programmer, your point that **[assignments are not allowed in expressions](http://docs.python.org/2/faq/design.html#why-can-t-i-use-an-assignment-in-an-expression)** was crucial to me. – CODE-REaD May 13 '16 at 20:00
  • The "while True:" method is also useful when you need to operate on more than one input line per iteration, something that the idiomatic Python doesn't allow (as far as I can tell, anyway). – Donald Smith Mar 13 '17 at 16:25
  • You shouldn't be reading lines if you don't make assumptions on the file. A binary file might have huge lines… – LtWorf Oct 15 '17 at 09:41
  • It seems there is an advantage to the non-idiomatic `readline()` way: you can do fine-grained error handling, like catching `UnicodeDecodeError`, which you can't do with the idiomatic `for` iteration. – flow2k May 28 '19 at 23:12
  • Note that as of Python 3 the `.read` example is not correct: `read` returns `None` when a non-blocking buffer has no data to offer at the moment, which doesn't indicate having reached EOF. For that, a return of zero bytes (`b''`) is used. I have proposed an edit to this respect. https://docs.python.org/3/library/io.html#io.RawIOBase.read – rrrrrrrrrrrrrrrr Oct 08 '22 at 07:25
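On the question of handling more than one line per iteration: one possible approach, sketched here under the assumption of Python 3 and with invented sample data, is to call `next()` on the same file iterator inside the `for` loop.

```python
import io

# Stand-in for a file whose records span two lines each.
f = io.StringIO('name: a\nvalue: 1\nname: b\nvalue: 2\n')

pairs = []
for first in f:
    # Pull one more line from the same iterator; '' if the file ended.
    second = next(f, '')
    pairs.append((first.strip(), second.strip()))

print(pairs)
```

The default argument to `next()` avoids a `StopIteration` when the file has an odd number of lines.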
19

The Python idiom for opening a file and reading it line-by-line is:

with open('filename') as f:
    for line in f:
        do_something(line)

The file will be automatically closed at the end of the above code (the with construct takes care of that).

Finally, it is worth noting that line will preserve the trailing newline. This can be easily removed using:

line = line.rstrip()
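One detail worth knowing: `rstrip()` with no argument removes all trailing whitespace, not only the newline. A short sketch with a made-up sample string shows the difference from `rstrip('\n')`.

```python
line = '  indented text \t\n'  # made-up sample line

stripped_all = line.rstrip()          # drops the trailing space, tab and newline
stripped_newline = line.rstrip('\n')  # drops only the trailing newline

print(repr(stripped_all))      # '  indented text'
print(repr(stripped_newline))  # '  indented text \t'
```

Use `rstrip('\n')` when trailing spaces or tabs are significant.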
NPE
  • +1, also pointing out to the OP that this is *not* the same as the very similar `for line in f.readlines(): ...`, a commonly suggested solution. – jedwards Mar 24 '13 at 14:33
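A rough sketch of the difference mentioned in the comment above, using an in-memory stream as a stand-in for a file: `readlines()` builds a list of every line up front, while iterating over the file object hands out one line at a time.

```python
import io

f = io.StringIO('a\nb\nc\n')  # stand-in for an open text file

all_lines = f.readlines()  # a list holding every line in memory at once
f.seek(0)                  # rewind for the second demonstration
first_line = next(f)       # lazy: only one line has been read here

print(type(all_lines).__name__, len(all_lines))
print(repr(first_line))
```

For small files the two behave alike; for very large files the lazy form avoids holding the whole file in memory.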
14

You can use the code snippet below to read line by line until the end of the file:

line = obj.readline()
while line != '':  # readline() returns '' only at EOF

    # Do Something

    line = obj.readline()
A R
  • Often iterating over the lines would distort the structure of the program. For example, in a language parser, you want to read the lines and process them in sequence. You don't want to restructure the top level just so you can loop reading lines and then send them to the parser. – Jonathan Starr Aug 09 '18 at 20:53
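One way to address the concern in the comment above, offered as a sketch rather than a recommendation, is to pass the file object itself to the parser, since it is already an iterator of lines. The two parser functions and the sample input are hypothetical.

```python
import io


def parse_header(lines):
    """Consume lines up to the first blank line and return them."""
    header = []
    for line in lines:
        if line == '\n':  # a blank line ends the header
            break
        header.append(line.rstrip('\n'))
    return header


def parse_body(lines):
    """Consume whatever lines remain after the header."""
    return [line.rstrip('\n') for line in lines]


source = io.StringIO('title: demo\nauthor: x\n\nbody 1\nbody 2\n')
header = parse_header(source)
body = parse_body(source)  # continues where parse_header stopped
print(header, body)
```

Each function pulls lines as it needs them, so the top level does not have to be restructured around a single read loop.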
13

While there are suggestions above for "doing it the Python way", if you really want logic based on EOF, then I suppose exception handling is the way to do it:

try:
    line = raw_input()  # Python 2; use input() in Python 3
    # ... whatever needs to be done when a line was read ...
except EOFError:
    pass  # ... whatever needs to be done at EOF ...

Example:

$ echo test | python -c "while True: print raw_input()"
test
Traceback (most recent call last):
  File "<string>", line 1, in <module> 
EOFError: EOF when reading a line

Or press the EOF key at a raw_input() prompt (Ctrl-Z on Windows, Ctrl-D on Linux).
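For Python 3, where `raw_input()` was renamed to `input()`, the same idea might look like the sketch below. To keep it self-contained, `sys.stdin` is temporarily replaced with an in-memory stream; that replacement and the helper name `read_all_input` are assumptions for demonstration only.

```python
import io
import sys


def read_all_input():
    """Collect lines from stdin until input() raises EOFError."""
    lines = []
    while True:
        try:
            lines.append(input())  # the trailing newline is stripped
        except EOFError:
            break  # no more input
    return lines


original_stdin = sys.stdin
sys.stdin = io.StringIO('test\nmore\n')  # stand-in for piped input
try:
    collected = read_all_input()
finally:
    sys.stdin = original_stdin  # always restore the real stdin

print(collected)
```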

TessellatingHeckler
user5472996
3

In addition to @dawg's great answer, here is the equivalent solution using the walrus operator (Python >= 3.8):

with open(filename, 'rb') as f:
    while buf := f.read(max_size):
        process(buf)
Yam Mesicka
1

You can use the following code snippet. readlines() reads in the whole file at once and splits it by line.

lines = obj.readlines()
Aditeya Pandey
0

How about this! Make it simple!

for line in open('myfile.txt', 'r'):
    print(line)

No need to waste extra lines. In CPython the file is closed automatically once no reference to the file object remains, so the with keyword can be skipped in a short script. Other Python implementations may keep the file open longer.

Ali Sajjad
  • Not all implementations of Python use reference counting, so `with` should always be used except for short-lived scripts that open only one or two files and then quit. – Karol S Nov 28 '22 at 12:57
  • @KarolS you mean only newer versions of python do reference counting? – Ali Sajjad Nov 28 '22 at 18:21
  • No, I mean the exact garbage collection method varies from implementation to implementation. CPython uses reference counting, so the file will be closed immediately after the loop, but Pypy, Jython, IronPython and Brython do not use it and the file might stay opened until the termination of the program. – Karol S Nov 29 '22 at 00:25
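To make the point about `with` concrete, this small sketch checks the file's `closed` attribute after the block ends. The temporary file is created only for the demonstration.

```python
import os
import tempfile

# Create a throwaway file for the demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as tmp:
    tmp.write('one\ntwo\n')

with open(path) as f:
    lines = [line.rstrip('\n') for line in f]

# The with block has closed the file, regardless of how the
# interpreter manages garbage collection.
closed_after_with = f.closed
os.remove(path)
print(lines, closed_after_with)
```

The close happens at the end of the `with` block on every Python implementation, which is why it is the safer habit.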