Why can't I call read() twice on an open file?

Question

For an exercise I'm doing, I'm trying to read the contents of a given file twice using the read() method. Strangely, when I call it the second time, it doesn't seem to return the file content as a string?

Here's the code:

```python
import re

f = open(filename)  # filename: path to the HTML file being parsed

# get the year
match = re.search(r'Popularity in (\d+)', f.read())

if match:
    print(match.group(1))

# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', f.read())

if matches:
    # never reached: matches is always empty
    pass
```

Of course I know this is not the most efficient or best way; that is not the point here. The point is: why can't I call read() twice? Do I have to reset the file handle? Or close and reopen the file in order to do that?

Where did you get the idea that read would not change the state of the file? What reference or tutorial are you using? — S.Lott, Oct 11 '10 at 12:29
I believe closing and reopening the file should work based on the answers below. — Anthony, Oct 11 '10 at 12:29
@Shynthriir: Closing and reopening the file is not always a good idea since it may have other effects in the system (temporary files, incron, etc.). — Ignacio Vazquez-Abrams, Oct 11 '10 at 12:32
I just want to state the obvious: You *DID* call read() twice! — , Oct 11 '10 at 13:10
W/R/T S.Lott, and 5 years on: this really needs to be in the Python documentation. It isn't obvious that one should assume that reading a file object would change the state of anything, especially if one is used to working with immutable data/functional-style programming... — Paul Gowder, Oct 02 '15 at 03:34
This is a special case of [Why can't I iterate twice over the same data?](https://stackoverflow.com/questions/25336726/why-cant-i-iterate-twice-over-the-same-data), although there are some concerns specific to files that are worth mentioning explicitly. — Karl Knechtel, Nov 25 '22 at 08:46
@PaulGowder disagree - because this is caused by the *fundamental nature of files*, and works the same way in every programming language. **Of course** reading from a file changes the state of the file object - because *how else could it know what was read, and where to start the next read?* **Of course** "read the remainder of the file; then read the remainder of the file" gets an empty result the second time, *for the same reason* that "read a line from the file; then read a line from the file" gets a different line each time. Without that, how could you ever iterate over the file? — Karl Knechtel, Nov 25 '22 at 09:22
@KarlKnechtel the issue is that not everyone likely to be using Python for a basic task like reading a file is likely to have the same mental model for "the fundamental nature of files." At the level of abstraction that people who don't have OSes or C-like behavior in their heads usually operate at, loading some bits into memory doesn't change the state of anything. — Paul Gowder, Dec 31 '22 at 04:48

score 195 · Accepted Answer · edited Jan 14 '22 at 16:32

Calling read() reads through the entire file and leaves the read cursor at the end of the file (with nothing more to read). If you want to read a certain number of lines at a time, you could use readline(), readlines(), or iterate through the lines with for line in handle:.
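
A minimal sketch of the line-oriented alternatives, assuming Python 3. The throwaway file and its contents are made up (created with tempfile) so the example runs on its own; readline() and the for loop (written here as a list comprehension) both continue from the current position, and a final read() finds nothing left.

```python
import os
import tempfile

# Throwaway file with made-up contents so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('first\nsecond\nthird\n')
    path = tmp.name

with open(path) as handle:
    first = handle.readline()         # 'first\n'; the cursor is now at line 2
    rest = [line for line in handle]  # iteration carries on from the cursor
    leftover = handle.read()          # nothing left, so this returns ''

os.remove(path)
```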

To answer your question directly: once a file has been read with read(), you can use seek(0) to return the read cursor to the start of the file (see the documentation for seek()). If you know the file isn't going to be too large, you can also save the read() output to a variable and use it in both your search() and findall() calls.
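
A minimal sketch of the seek(0) approach, assuming Python 3. The file is a throwaway with made-up contents (the "Popularity in 1990" line only echoes the question's regex):

```python
import os
import tempfile

# Throwaway file with made-up contents so the example runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Popularity in 1990\n')
    path = tmp.name

with open(path) as f:
    first = f.read()   # reads everything; the cursor is now at the end
    second = f.read()  # nothing left to read, so this returns ''
    f.seek(0)          # move the cursor back to the start of the file
    third = f.read()   # the full contents again

os.remove(path)
```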

P.S. Don't forget to close the file after you are done with it.

edited Jan 14 '22 at 16:32

Neuron

answered Oct 11 '10 at 12:27

Tim

+1. Yes, please read into a temporary variable to avoid unnecessary file I/O. You aren't really saving any memory by having fewer (explicit) variables; it's a false economy. – Nick T Oct 11 '10 at 13:45

@NickT: I would expect that a small file being read multiple times gets cached by the OS (at least on Linux/OSX), so no extra file I/O for reading in twice. Large files that don't fit in memory don't get cached, but you don't want to read them into a variable because you'll start swapping. So in case of doubt, always read multiple times. If you know for sure the files are small, do whatever gives the nicest program. – Claude Jun 04 '14 at 13:41

Tear down can be automated with [`with`](http://effbot.org/zone/python-with-statement.htm). – Cees Timmerman Jan 19 '16 at 16:47
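
A small sketch of the with version, assuming Python 3; the temporary file and its single line of HTML are made up for the example. The file is closed when the block exits, and the text read inside it stays available afterwards:

```python
import os
import re
import tempfile

# Made-up file contents for the example.
with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False) as tmp:
    tmp.write('<h3>Popularity in 1990</h3>\n')
    path = tmp.name

# The with statement closes the file when the block exits,
# even if an exception is raised inside it.
with open(path) as f:
    text = f.read()

match = re.search(r'Popularity in (\d+)', text)
year = match.group(1) if match else None
os.remove(path)

print(f.closed, year)  # True 1990
```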

score 48 · Answer 2 · edited Jul 29 '21 at 06:30

As other answers suggested, you should use seek().

I'll just write an example:

```python
>>> a = open('file.txt')
>>> a.read()
#output
>>> a.seek(0)
0
>>> a.read()
#same output
```

edited Jul 29 '21 at 06:30

Tomerikoo

answered Oct 11 '10 at 13:20

Ant

score 22 · Answer 3 · edited Jan 14 '16 at 15:47

Everyone who has answered this question so far is absolutely right: read() moves through the file, so after you've called it once, calling it again just returns an empty string.

What I'll add is that in your particular case, you don't need to seek back to the start or reopen the file; you can just store the text you've read in a local variable and use it twice, or as many times as you like, in your program:

```python
import re

# filename: path to the HTML file being parsed
with open(filename) as f:
    text = f.read()  # read the file once into a local variable

# get the year
match = re.search(r'Popularity in (\d+)', text)
if match:
    print(match.group(1))

# get all the names
matches = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
if matches:
    # matches is no longer always empty
    pass
```

edited Jan 14 '16 at 15:47

Glen Selle

answered Oct 11 '10 at 12:34

Tom Anderson

+1 Actually this was the proposed solution for this exercise (http://code.google.com/intl/de-DE/edu/languages/google-python-class/exercises/baby-names.html). But somehow I didn't think of storing the string in a variable. D'oh! – helpermethod Oct 11 '10 at 17:33

With Python 3, use pathlib. `from pathlib import Path; text = Path(filename).read_text()` It takes care of opening, closing, etc. – PaulMcG Jun 19 '17 at 12:06
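
A slightly fuller sketch of that comment, assuming Python 3.5+ (where Path.read_text() and Path.write_text() are available). The directory, file name and table row are made up; each read_text() call opens the file afresh, so calling it twice is harmless:

```python
import re
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / 'names.html'  # hypothetical file name
    path.write_text('<td>1</td><td>Michael</td><td>Jessica</td>\n')  # made-up row

    # read_text() opens the file, reads all of it and closes it again,
    # so each call starts from the beginning.
    text = path.read_text()
    again = path.read_text()

names = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
print(names)  # [('1', 'Michael', 'Jessica')]
```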

score 15 · Answer 4 · answered Oct 11 '10 at 12:27

The read pointer moves to after the last read byte/character. Use the seek() method to rewind the read pointer to the beginning.
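
To see the pointer move, here is a small sketch using io.BytesIO, an in-memory stand-in for a file opened in binary mode. It is chosen so that tell() returns a plain byte offset; for a file opened in text mode, tell() returns an opaque number instead.

```python
import io

f = io.BytesIO(b'abcdef')

start = f.tell()   # 0: nothing read yet
chunk = f.read(4)  # b'abcd'
middle = f.tell()  # 4: the pointer sits just after the last byte read
rest = f.read()    # b'ef': read() with no argument takes everything left
end = f.tell()     # 6: end of the data
empty = f.read()   # b'': nothing more to read
f.seek(0)          # rewind the pointer to the beginning
again = f.read()   # b'abcdef'
```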

answered Oct 11 '10 at 12:27

Ignacio Vazquez-Abrams

score 3 · Answer 5 · answered Oct 11 '10 at 12:31

Every open file has an associated position. When you call read(), you read from that position. For example, read(10) reads the first 10 characters (10 bytes if the file was opened in binary mode) from a newly opened file, and another read(10) then reads the next 10. read() without arguments reads all of the remaining contents of the file, leaving the file position at the end of the file. The next time you call read(), there is nothing left to read.
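
A sketch of that read(10) sequence, assuming Python 3 and a throwaway text file with made-up contents; in text mode the counts are characters:

```python
import os
import tempfile

# Throwaway file with made-up contents.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('0123456789abcdefghijXYZ')
    path = tmp.name

with open(path) as f:
    first = f.read(10)   # '0123456789'
    second = f.read(10)  # 'abcdefghij': continues where the first read stopped
    rest = f.read()      # 'XYZ': everything that is left
    nothing = f.read()   # '': already at the end of the file

os.remove(path)
```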

You can use seek() to move the file position. Or, probably better in your case, do one read() and keep the result for both searches.

score 1 · Answer 6 · answered Oct 11 '10 at 13:15

read() consumes. So you could reopen the file, or seek to the start before re-reading. Or, if it suits your task, you can use read(n) to consume only n bytes (n characters for a file opened in text mode).
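
One way to consume a file n bytes at a time is a small generator that stops when read() returns an empty result. The function name read_in_chunks is made up for this sketch, and io.BytesIO stands in for a file opened in binary mode:

```python
import io


def read_in_chunks(f, n):
    """Yield successive pieces of at most n bytes until read() returns b''."""
    while True:
        chunk = f.read(n)  # consumes at most n bytes per call
        if not chunk:      # b'' means the end of the data was reached
            break
        yield chunk


data = io.BytesIO(b'abcdefgh')
chunks = list(read_in_chunks(data, 3))  # [b'abc', b'def', b'gh']
```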

answered Oct 11 '10 at 13:15

towi

score -1 · Answer 7 · answered Oct 11 '10 at 13:34

I always find the read method something of a walk down a dark alley. You go down a bit and stop, but if you are not counting your steps, you are not sure how far along you are. seek() gives the solution by repositioning; the other option is tell(), which returns the current position in the file. Maybe the Python file API could combine read and seek into a read_from(position, bytes) to make it simpler; until that happens, you should read this page.
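
The read_from() idea can be approximated today with seek() and read(). The helper below is hypothetical (it is not part of Python's file API) and, as an extra design choice, uses tell() to restore the previous position so that other sequential reads are not disturbed. io.BytesIO stands in for a file opened in binary mode:

```python
import io


def read_from(f, position, size):
    """Hypothetical helper: read `size` bytes starting at `position`.

    Not part of Python's file API; it only combines tell(), seek() and read(),
    and puts the cursor back where it was afterwards.
    """
    saved = f.tell()
    try:
        f.seek(position)
        return f.read(size)
    finally:
        f.seek(saved)


f = io.BytesIO(b'header:payload')
part = read_from(f, 7, 7)  # b'payload'
start = f.read(6)          # b'header': the original position (0) was restored
```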
