3

I am using Python to make a template updater for HTML. I read a line and compare it with the template file to see if there are any changes that need to be applied. Then I want to write those changes (if there are any) back to the same line I just read from.

After a readline(), my file pointer is positioned at the start of the next line. Is there any way I can write back to the line I just read without having to open two file handles, one for reading and one for writing?

Here is a code snippet of what I want to do:

cLine = fp.readline()
if cLine != templateLine:
    # Here is where I would like to write back to the line I read from
    # in cLine
    pass
Edwin
  • Effectively, what you will have to do is read in the whole file, make the changes then write it all back out. – jonrsharpe May 11 '14 at 22:02
  • hmm was trying to avoid that, but I guess seems like might not be any other way (at least one that is not hacky) – Edwin May 11 '14 at 22:05
  • @jonrsharpe Well, IMO reading whole file is not needed, it might not be feasible in case of large files. But I think there is no better option than parsing. It is impossible, or at least very hard to do on one file. – luk32 May 11 '14 at 22:06

2 Answers

7

Updating lines in place in a text file: very difficult

Many questions on SO ask how to read a file and update it at the same time.

While this is technically possible, it is very difficult.

(Text) files are not organized on disk by lines, but by bytes.

The problem is that the number of bytes in the old line is very often different from the number of bytes in the new one, and this messes up the resulting file.
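A minimal sketch of that problem, assuming a tiny two-line file created just for the demonstration (the helper names `overwrite_first_line` and `demo` are hypothetical). It writes a replacement for the first line at offset 0 and shows what the file contains afterwards:

```python
import os
import tempfile

def overwrite_first_line(path, new_line):
    """Write new_line at offset 0. Nothing after it is shifted."""
    with open(path, "r+b") as fp:
        fp.seek(0)
        fp.write(new_line)

def demo(new_first_line):
    """Create a file containing b'abc\\ndef\\n', overwrite its start, return the result."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as fp:
            fp.write(b"abc\ndef\n")
        overwrite_first_line(path, new_first_line)
        with open(path, "rb") as fp:
            return fp.read()
    finally:
        os.remove(path)

print(demo(b"ABC\n"))    # same length: b'ABC\ndef\n'
print(demo(b"ABCDE\n"))  # longer: b'ABCDE\nf\n', the second line is damaged
print(demo(b"A\n"))      # shorter: b'A\nc\ndef\n', a leftover of the old line remains
```

Only the same-length replacement leaves the rest of the file intact.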

Update by creating a new file

While it sounds inefficient, it is the most effective way from a programming point of view.

Read from the original file, write to a second file, close both files, and then replace the old file with the newly created one.

Or build the new content in memory and, once the old file is closed, write it over the old one.
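The read-one-file, write-another approach could be sketched as follows. This is only an illustration under some assumptions: `update_file` and its `transform` callback are hypothetical names, the temporary file is created next to the original so that `os.replace` stays on one filesystem, and the original file's permissions are not carried over.

```python
import os
import tempfile

def update_file(path, transform):
    """Rewrite path line by line through a temporary file, then swap it into place."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        # newline="" keeps the original line endings untouched
        with os.fdopen(fd, "w", newline="") as dst, open(path, "r", newline="") as src:
            for line in src:
                dst.write(transform(line))  # the new line may be any length
        os.replace(tmp_path, path)          # both files are closed at this point
    except BaseException:
        os.remove(tmp_path)                 # do not leave a half-written file behind
        raise
```

A call such as `update_file("page.html", lambda line: line.replace("old", "new"))` would then rewrite the file without ever holding more than one line in memory.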

Jan Vlcinsky
  • People sometimes tend to think that files are organized by lines, while they rather are arrays of bytes. That is the underlying problem IMO. – luk32 May 11 '14 at 22:10
  • @luk32: True. For Python questions, that assumption in turn is possibly because the basic Python interface to files *is* organized linewise. – BrenBarn May 11 '14 at 22:12
  • Or rename the source file before opening it and write to a file with the original name. – ChrisGPT was on strike May 12 '14 at 00:26
2

At the OS level things are a bit different from how they look from Python. From Python, a file looks almost like a list of strings, each of arbitrary length, so it seems easy to swap a line for something else without affecting the rest of the lines:

l = ["Hello", "world"]
l[0] = "Good bye"

In reality, though, any file is just a stream of bytes, with strings following each other without any "padding". So you can only overwrite the data in place if the resulting string has exactly the same length as the source string. Otherwise a longer string overwrites the beginning of the following lines, and a shorter one leaves the tail of the old line behind.

If that is the case (your processing is guaranteed not to change the length of the strings), you can "rewind" the file to the start of the line and overwrite the line with new data. The script below converts all lines in a file to uppercase in place:

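def eof(f):
    # Return True if the file position is at (or past) the end of the file.
    cur_loc = f.tell()
    f.seek(0, 2)          # jump to the end to find out where it is
    eof_loc = f.tell()
    f.seek(cur_loc, 0)    # restore the original position
    return cur_loc >= eof_loc

# newline='' turns off newline translation so line lengths are preserved
with open('testfile.txt', 'r+', newline='') as fp:
    while True:
        last_pos = fp.tell()       # remember where this line starts
        line = fp.readline()
        new_line = line.upper()    # must encode to the same number of bytes
        fp.seek(last_pos)          # rewind to the start of the line
        fp.write(new_line)         # overwrite it in place
        print("Read %r, wrote %r" % (line, new_line))
        if eof(fp):
            break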

Somewhat related: Undo a Python file readline() operation so file pointer is back in original state

This approach is only justified when your output lines are guaranteed to have the same length, and when, say, the file you're working with is really huge so you have to modify it in place.

In all other cases it would be much easier and more performant to just build the output in memory and write it back at once. Another option is to write to a temporary file, then delete the original and rename the temporary file so it replaces the original file.
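Building the output in memory might look like the sketch below, applied to the template comparison from the question. The function name `sync_with_template` and the idea of passing the template as a list of lines are assumptions made for illustration; lines are compared by position and the file is rewritten only if something differs.

```python
def sync_with_template(path, template_lines):
    """Replace every line of path that differs from the template line at the same index."""
    with open(path, "r", newline="") as fp:
        lines = fp.readlines()

    changed = False
    for i, template_line in enumerate(template_lines):
        if i < len(lines) and lines[i] != template_line:
            lines[i] = template_line   # the new line may be longer or shorter
            changed = True

    if changed:
        # Reopen for writing only after the read handle has been closed.
        with open(path, "w", newline="") as fp:
            fp.writelines(lines)
    return changed
```

Because the whole file is written back at once, differences in line length no longer matter, at the cost of holding the file in memory.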

Sergey