83

Is it possible to parse a file line by line, and edit a line in-place while going through the lines?

Blankman
  • It is possible under certain conditions. If the edited line is shorter than or the same length as the original line, it is easy to do. Otherwise it becomes more difficult, though not impossible if the lines being edited are not too numerous. Are you asking because you want to process a big file? – eyquem Mar 27 '11 at 23:58
  • >>> import os
    >>> f = open('tmp', 'r+')
    >>> f.readline()
    '75.14\n'
    >>> f.readline()
    '100\n'
    >>> l = _
    >>> f.seek(-len(l), os.SEEK_CUR)
    >>> f.write('999\n')
    >>> f.close()
    – Bob Mar 28 '11 at 00:02
  • See example here (http://stackoverflow.com/questions/5286020/python-string-replace-in-a-file-without-touching-the-file-if-no-substitution-was) – eyquem Mar 28 '11 at 00:06
  • [edit text file using Python](http://stackoverflow.com/q/1582750/4279) – jfs Sep 04 '14 at 09:11
  • Can we do it in bash? – Shivanshu Oct 23 '20 at 03:57

5 Answers

61

Is it possible to parse a file line by line, and edit a line in-place while going through the lines?

It can be simulated using a backup file as stdlib's fileinput module does.

Here's an example script that removes lines that do not satisfy some_condition from files given on the command line or stdin:

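#!/usr/bin/env python3
# grep_some_condition.py
import fileinput

# some_condition() is a placeholder predicate that you define yourself.
# With inplace=True, stdout is redirected into the file being processed;
# the original is kept alongside it with a '.bak' suffix.
for line in fileinput.input(inplace=True, backup='.bak'):
    if some_condition(line):
        print(line, end='')  # this goes to the current file (Python 2: print line,)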

Example:

$ python grep_some_condition.py first_file.txt second_file.txt

On completion, first_file.txt and second_file.txt will contain only the lines that satisfy the some_condition() predicate; the original contents are kept in first_file.txt.bak and second_file.txt.bak.
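
For a self-contained illustration of the same approach, the sketch below passes the file names through the `files` parameter instead of the command line. The helper name `keep_matching_lines`, the sample file, and the sample predicate are hypothetical and chosen only for this example.

```python
import fileinput
import os
import tempfile


def keep_matching_lines(paths, predicate, backup=".bak"):
    """Rewrite each file in paths, keeping only lines for which predicate is true.

    Hypothetical helper: fileinput moves the original to <name><backup>
    and redirects sys.stdout into a new file with the original name.
    """
    with fileinput.input(files=paths, inplace=True, backup=backup) as stream:
        for line in stream:
            if predicate(line):
                print(line, end="")  # written to the file, not the terminal


# Demo on a throwaway file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "numbers.txt")
with open(path, "w") as f:
    f.write("1\n20\n3\n40\n")

keep_matching_lines([path], lambda line: int(line) >= 10)

with open(path) as f:
    result = f.read()
with open(path + ".bak") as f:
    backup_copy = f.read()
```

After the call, `result` holds only the lines that passed the predicate and `backup_copy` holds the untouched original.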

jfs
  • Methods that don't actually write to the middle of a file are also wise because it's easy to make the modification atomic (that is, the file doesn't end up in a partially-modified state if the program is interrupted). – L33tminion Jun 17 '15 at 18:22
  • Ah, fileinput has a `files` parameter: https://docs.python.org/3/library/fileinput.html – serv-inc Jul 25 '18 at 06:51
  • For Python 3.x, use the following `print` line: `print(line, end='')` – Chien Nguyen Dec 14 '22 at 13:10
32

The fileinput module has a very ugly API. I found a much nicer module for this task, in_place. Here is an example for Python 3:

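import in_place  # third-party package, not part of the standard library

with in_place.InPlace('data.txt') as file:
    for line in file:
        line = line.replace('test', 'testZ')
        file.write(line)  # lines that are not written back are dropped
# the with block closes the file, so no explicit file.close() is needed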

Main differences from fileinput:

  • Instead of hijacking sys.stdout, a new filehandle is returned for writing.
  • The filehandle supports all of the standard I/O methods, not just readline().

Important Notes:

  1. Any line that you do not write back with file.write() is removed from the file.
  2. Also, if the process is interrupted, you lose any line in the file that has not already been re-written.
bobobobo
Alexey Shrub
9

No. You cannot safely write to a file you are also reading, as any changes you make to the file could overwrite content you have not read yet. To do it safely you'd have to read the file into a buffer, updating any lines as required, and then re-write the file.

If you're replacing content byte-for-byte (i.e. the text you are replacing is exactly the same length as its replacement), then you can get away with it, but it's a hornet's nest, so I'd save yourself the hassle and just read the full file, replace the content in memory (or via a temporary file), and write it out again.
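
One way to do the temporary-file variant is sketched below. The helper name `rewrite_lines` and the sample data are hypothetical. The sketch assumes the file is text in the default encoding, and it does not try to preserve the original file's permissions or ownership.

```python
import os
import tempfile


def rewrite_lines(path, transform):
    """Apply transform to every line of path via a temporary file.

    Hypothetical helper: the original is only replaced once the new
    content has been written completely.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                dst.write(transform(line))
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)  # leave the original untouched on failure
        raise


# Demo on a throwaway file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data.txt")
with open(path, "w") as f:
    f.write("test one\nother\ntest two\n")

rewrite_lines(path, lambda line: line.replace("test", "testZ"))

with open(path) as f:
    result = f.read()
```

Because the original is only replaced in the last step, an interruption while writing leaves the old content intact.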

Karl Nicoll
6

If you only intend to perform localized changes that do not change the length of the part of the file that is modified (e.g. changing all characters to lower case), then you can actually overwrite the old contents of the file dynamically.

To do that, you can use random file access with the seek() method of a file object.
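
As a rough sketch of that idea, the following lower-cases a file in place, chunk by chunk. The function name `lowercase_in_place` is hypothetical. The sketch assumes the file is opened in binary mode and that only ASCII letters need changing, so every chunk keeps its byte length.

```python
import os
import tempfile


def lowercase_in_place(path, chunk_size=4096):
    """Lower-case ASCII letters in the file without changing its length."""
    with open(path, "r+b") as f:
        while True:
            start = f.tell()
            chunk = f.read(chunk_size)
            if not chunk:
                break
            f.seek(start)               # go back to where this chunk began
            f.write(chunk.lower())      # same number of bytes, so nothing shifts
            f.seek(start + len(chunk))  # reposition before the next read


# Demo on a throwaway file; the small chunk size forces several passes.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "text.txt")
with open(path, "wb") as f:
    f.write(b"Hello WORLD\nSecond LINE\n")
size_before = os.path.getsize(path)

lowercase_in_place(path, chunk_size=5)

with open(path, "rb") as f:
    result = f.read()
size_after = os.path.getsize(path)
```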

Alternatively, you may be able to use an mmap object to treat the whole file as a mutable string. Keep in mind that mmap objects may impose a maximum file-size limit in the 2-4 GB range on a 32-bit CPU, depending on your operating system and its configuration.
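
A minimal mmap sketch, under the same constraint that the replacement has exactly the same byte length as the text it replaces. The helper name `replace_same_length` is hypothetical, and mapping an empty file raises an error, so the sketch assumes a non-empty file.

```python
import mmap
import os
import tempfile


def replace_same_length(path, old, new):
    """Replace every occurrence of old with new (bytes of equal length) via mmap."""
    if not old or len(old) != len(new):
        raise ValueError("old and new must be non-empty and the same length")
    count = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        pos = mm.find(old)
        while pos != -1:
            mm[pos:pos + len(new)] = new  # slice assignment cannot resize the map
            count += 1
            pos = mm.find(old, pos + len(new))
        mm.flush()  # push the changes back to the file
    return count


# Demo on a throwaway file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "pets.txt")
with open(path, "wb") as f:
    f.write(b"cat sat; cat ran\n")

replaced = replace_same_length(path, b"cat", b"dog")

with open(path, "rb") as f:
    result = f.read()
```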

thkala
0

You have to back up by the size of the line in bytes (which equals the number of characters only for single-byte encodings). Assuming you used readline, you can get the length of the line and back up using:

file.seek(offset[, whence])

Set whence to os.SEEK_CUR and offset to -length.

See Python Docs or look at the manpage for seek.
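
Put together, the approach might look like the sketch below. The function name `overwrite_first_match` and the sample data are hypothetical. The file is opened in binary mode because Python 3 text-mode files reject non-zero seeks relative to the current position, and the replacement is assumed to have the same byte length as the line it overwrites.

```python
import os
import tempfile


def overwrite_first_match(path, old_line, new_line):
    """Overwrite the first line equal to old_line with new_line (bytes, same length)."""
    if len(old_line) != len(new_line):
        raise ValueError("replacement must have the same length in bytes")
    with open(path, "r+b") as f:  # binary mode: offsets are plain byte counts
        while True:
            line = f.readline()
            if not line:
                return False  # reached end of file without a match
            if line == old_line:
                f.seek(-len(line), os.SEEK_CUR)  # back up to the start of the line
                f.write(new_line)
                return True


# Demo on a throwaway file.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "tmp")
with open(path, "wb") as f:
    f.write(b"75.14\n100\n42\n")

found = overwrite_first_match(path, b"100\n", b"999\n")

with open(path, "rb") as f:
    result = f.read()
```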

kenorb
Bob