
Does the following code read one line per loop iteration, or does it read the entire file into memory before the iteration begins?

for line in f:
    print(line)

My intention is to read a single line from the file.

SSS
  • What are you actually looking for? Is this just curiosity, or do you have a specific operation in mind? – ha9u63a7 Mar 18 '15 at 22:28


You cannot be sure. All you can know is that it will return one line at a time. The Python Standard Library documentation (for Python 2 file objects) says: "In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer. As a consequence of using a read-ahead buffer, combining next() with other file methods (like readline()) does not work right."
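In Python 3 the restriction described in that quote is enforced explicitly for text files. A minimal sketch, assuming Python 3 and a throwaway temporary file: after a single next() call, tell() raises OSError instead of returning a misleading position.

```python
import os
import tempfile

# Throwaway two-line test file (made up for this demonstration).
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write('first\nsecond\n')
    path = tmp.name

error = None
with open(path) as f:          # text mode: an io.TextIOWrapper
    next(f)                    # iteration engages the read-ahead buffer
    try:
        f.tell()               # Python 3 refuses to report a position now
    except OSError as exc:
        error = str(exc)
os.remove(path)
print(error)
```

The error message mentions the next() call; once iteration reaches the end of the file, tell() works again.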

My understanding is that the read-ahead buffer loads a full chunk (of unspecified size) and looks for the end of line in that buffer. But for a small file (a few KB), you can be sure there will be only one physical read. I once tried calling read() after getting the first line with next() on a small file (about 50 lines), and found the file pointer at the end of the file.
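The effect can be observed in CPython 3 by comparing the position the program sees with the position of the underlying OS file. A sketch, assuming a regular file opened in binary mode with default buffering; the 50-line test file is made up for the demonstration, and `f.raw` is the unbuffered `io.FileIO` object beneath the `io.BufferedReader`:

```python
import os
import tempfile

# Made-up test data: a small 50-line file (390 bytes in total).
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b''.join(b'line %d\n' % i for i in range(50)))
    path = tmp.name

with open(path, 'rb') as f:    # f is an io.BufferedReader
    first = next(f)            # get the first line, as in the answer
    logical = f.tell()         # position as seen by the program
    physical = f.raw.tell()    # position of the underlying OS file
size = os.path.getsize(path)
os.remove(path)
print(first, logical, physical, size)
```

The logical position is just past the first line (7 bytes), but the OS file position is already at the end of the 390-byte file, because the whole file fit in one read-ahead chunk.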

Of course, a really big file will be read physically one chunk at a time, and Python will only keep the current chunk and line in memory, so it is far more memory-conservative than readlines(). But after all, on common systems (Unix-like, macOS or Windows), the underlying read system call on a file(*) has no notion of end of line and can only read up to a given number of bytes. So on those systems there is no way to physically read up to an end of line, whatever language you use. You can only have utilities that load an internal buffer and then look for the end of line in that buffer. That's what the next() method does for a file object in Python.
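The buffer-then-search strategy can be sketched on top of `os.read`, which, like the system call it wraps, returns up to a given number of bytes with no regard for line endings. This is a simplified illustration, not CPython's actual implementation; `iter_lines` is a hypothetical name, and the tiny `chunk_size` in the demo is chosen only to force lines to straddle chunk boundaries:

```python
import os
import tempfile

def iter_lines(fd, chunk_size=4096):
    """Yield lines (bytes, newline included) read from file descriptor fd."""
    pending = b''
    while True:
        chunk = os.read(fd, chunk_size)   # at most chunk_size bytes, any content
        if not chunk:                     # empty result means end of file
            break
        pending += chunk
        while True:                       # emit every complete line in the buffer
            end = pending.find(b'\n')
            if end < 0:
                break
            yield pending[:end + 1]
            pending = pending[end + 1:]
    if pending:                           # last line without a trailing newline
        yield pending

with tempfile.TemporaryFile() as tmp:
    tmp.write(b'alpha\nbeta\ngamma')
    tmp.flush()
    tmp.seek(0)                           # rewind the OS file position
    lines = list(iter_lines(tmp.fileno(), chunk_size=3))
print(lines)  # [b'alpha\n', b'beta\n', b'gamma']
```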

From your comments, I understand that you only want to get the first line. You can do it with:

line = next(f)  # f.next() only exists in Python 2; next(f) works in 2.6+ and 3

But do not use any read method after that because, as explained above, the file pointer may be far beyond the end of the first line.
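If you do need to keep reading after the first line, `f.readline()` (instead of `next(f)` or a for loop) is one option worth testing: it is an ordinary read method, so a following `read()` continues right after the line. A sketch assuming Python 3 and a made-up temporary file:

```python
import os
import tempfile

# Made-up test file: one header line followed by body lines.
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write('header\nbody line 1\nbody line 2\n')
    path = tmp.name

with open(path) as f:
    first = f.readline()       # exactly one line, no iterator involved
    rest = f.read()            # continues right after the first line
os.remove(path)
print(repr(first), repr(rest))
```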

(*) It would not be the same when reading from a console or a terminal device.

Serge Ballesta
  • So basically what I'm trying to do is read a file, and when I come across binary data preceded by a header that specifies its length in bytes, skip over that binary data using f.seek. But now I understand I cannot do this, since I don't know where my file pointer is. – SSS Mar 18 '15 at 23:04
  • @SSS: unfortunately Python does not have an equivalent of C's `fread` that allows binary reads synchronized with `fgets` line reads. You will have to implement it by hand, reading the file with `read` and looking for the end of line in the buffer yourself. – Serge Ballesta Mar 18 '15 at 23:08
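For the header-plus-binary layout described in these comments, one approach worth testing, assuming Python 3: in binary mode, `f.readline()` and `f.seek()` operate on the same buffered position, so line reads and byte skips stay in sync. The header format `BINARY <n>` below is hypothetical, and `io.BytesIO` stands in for a file opened with `open(path, 'rb')`:

```python
import io
import os

def skip_binary_blocks(f):
    """Yield decoded text lines from binary file object f, skipping payloads.

    Hypothetical format: a header line b'BINARY <n>\\n' announces that the
    next n bytes are binary data.
    """
    while True:
        line = f.readline()              # binary readline: position stays exact
        if not line:                     # end of file
            return
        if line.startswith(b'BINARY '):
            size = int(line.split()[1])  # int() accepts ASCII digits as bytes
            f.seek(size, os.SEEK_CUR)    # jump over the payload
            continue
        yield line.decode('ascii').rstrip('\n')

# The 4-byte payload deliberately contains 0x90 and a newline byte.
data = b'alpha\nBINARY 4\n\x90\x00\n\xffbeta\n'
print(list(skip_binary_blocks(io.BytesIO(data))))  # ['alpha', 'beta']
```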

If all you need to do is read a single line, and it's followed by binary data, you will need to open the file in binary mode anyway. It's then easy to emulate what Python does when it reads a line: read into a temporary buffer and search for the linefeed character. I'm assuming the text is in an 8-bit ASCII-compatible encoding. You'll need to choose a reasonable maximum line length for max_line_size, or the algorithm gets a lot more complicated.

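max_line_size = 1024  # a reasonable upper bound on the line length; adjust as needed
with open(filename, 'rb') as f:
    buffer = f.read(max_line_size)
    end = buffer.find(b'\n')  # index of the first linefeed, or -1 if none (avoids shadowing len)
    if end < 0:
        raise RuntimeError('Line in file too long')
    line = buffer[:end].decode()  # UTF-8 by default; pass the file's real encoding, e.g. 'latin-1'
    f.seek(end + 1)  # absolute seek, valid because the read started at offset 0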
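The snippet above seeks to the linefeed index plus one, measured from the start of the file, which is correct only because it read from offset 0. A sketch that works from any position (for example, to read several lines in a row) remembers the starting offset first; `read_line_at` is a hypothetical helper name, and the same ASCII-compatible encoding assumption applies:

```python
import io

def read_line_at(f, max_line_size=1024):
    """Read one b'\\n'-terminated line from binary file object f, starting at
    its current position, and leave f positioned just past that line."""
    start = f.tell()
    buffer = f.read(max_line_size)
    if not buffer:
        raise EOFError('no more data')
    end = buffer.find(b'\n')
    if end < 0:
        raise RuntimeError('Line in file too long (or missing final newline)')
    f.seek(start + end + 1)  # absolute offset, relative to where we started
    return buffer[:end].decode()

f = io.BytesIO(b'first\nsecond\n')
print(read_line_at(f))  # first
print(read_line_at(f))  # second
```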
Mark Ransom

It works with one line at a time instead of reading the whole thing into memory at once. That's why it's recommended so often.
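The lazy, line-at-a-time behavior can be seen by taking only the first few lines of a large file with `itertools.islice`: the file object is its own iterator, and consumption stops after three lines. A sketch assuming Python 3.6+; the 100,000-line file is generated only for the demonstration, and Python still reads it in buffered chunks rather than strictly one line per system call:

```python
import itertools
import os
import tempfile

# Generate a large made-up file.
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    for i in range(100_000):
        tmp.write(f'row {i}\n')
    path = tmp.name

with open(path) as f:
    is_own_iterator = iter(f) is f   # files are iterators, not lists
    head = [line.rstrip('\n') for line in itertools.islice(f, 3)]
os.remove(path)
print(is_own_iterator, head)  # True ['row 0', 'row 1', 'row 2']
```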

TigerhawkT3
  • Are you really sure? – Serge Ballesta Mar 18 '15 at 22:11
  • I have a text file that contains some binary data. Somewhere in that binary data there is a byte (0x90) that throws a UnicodeDecodeError. It fails at the for loop, hence I'm led to believe that somehow the line "for line in f" is reading the entire file first. – SSS Mar 18 '15 at 22:12
  • Such a loop does not necessarily load the entire file into memory first. See the link that I added to the answer. As far as how much of a file is actually loaded into memory at a time, see @Serge's answer. For very small files, it may look at the whole file and/or look at more than one line. – TigerhawkT3 Mar 18 '15 at 22:29
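The 0x90 symptom is consistent with chunked reading rather than with the loop loading the whole file. In CPython 3 a text-mode file decodes each read-ahead chunk as a whole, so a bad byte on a later line can raise `UnicodeDecodeError` on the very first iteration. A sketch assuming UTF-8 and a small made-up file; `errors='surrogateescape'` is one standard way to let undecodable bytes through (they can be recovered with `line.encode('utf-8', 'surrogateescape')`), and opening in binary mode is another:

```python
import os
import tempfile

# Made-up file: line 1 is pure ASCII, line 2 starts with the byte 0x90.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'plain text line\n\x90 binary line\n')
    path = tmp.name

try:
    with open(path, encoding='utf-8') as f:
        first = next(f)        # only line 1 is requested...
except UnicodeDecodeError:
    first = None               # ...but the whole chunk, 0x90 included, was decoded

with open(path, encoding='utf-8', errors='surrogateescape') as f:
    lines = list(f)            # 0x90 becomes the lone surrogate '\udc90'
os.remove(path)
print(first, lines)
```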

You can do either that or this:

f = open('a file')

s = f.readlines()  # Read all lines, no looping

This is mentioned in the Python docs. There is also list(f), which returns the lines as items in a list.
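Both forms return every line of the file as a list in one go, which is the opposite of the lazy loop in the question. A small sketch using `io.StringIO` as a stand-in for a real file:

```python
import io

text = 'one\ntwo\nthree\n'

with io.StringIO(text) as f:
    all_lines = f.readlines()  # reads everything at once
with io.StringIO(text) as f:
    same_lines = list(f)       # equivalent: exhausts the iterator immediately

print(all_lines == same_lines, all_lines)  # True ['one\n', 'two\n', 'three\n']
```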

ha9u63a7
  • That will *for sure* read the entire file at once. I think the desired outcome is the opposite. – Mark Ransom Mar 18 '15 at 22:27
  • In this case I want to read a single line at a time and not all the lines. – SSS Mar 18 '15 at 22:27
  • @SSS then your initial solution works, and yes it buffers the entire file before reading each line in the loop – ha9u63a7 Mar 18 '15 at 22:29
  • @SSS If you just want to read one line at a time, why do you care if Python reads the entire file before looping through each line or not? Can you clarify what your limitations are rather than poking one by one? – ha9u63a7 Mar 18 '15 at 22:31
  • @ha9u63ar the solution works but not the way I envisioned it. I expected it to read a single line at a time but it seems to be reading all the lines. – SSS Mar 18 '15 at 22:32
  • @SSS you have already got your answer in the question itself. Yes it reads one line at a time in the loop. Also, the documentation link I provided tells you about that. – ha9u63a7 Mar 18 '15 at 22:34