368

I want to loop over the contents of a text file and do a search and replace on some lines and write the result back to the file. I could first load the whole file in memory and then write it back, but that probably is not the best way to do it.

What is the best way to do this, within the following code?

f = open(file)
for line in f:
    if 'foo' in line:
        newline = line.replace('foo', 'bar')
        # how to write this newline back to the file
SilentGhost
pkit

13 Answers

302

The shortest way would probably be to use the fileinput module. For example, the following adds line numbers to a file, in-place:

import fileinput

for line in fileinput.input("test.txt", inplace=True):
    print('{} {}'.format(fileinput.filelineno(), line), end='') # for Python 3
    # print "%d: %s" % (fileinput.filelineno(), line), # for Python 2

What happens here is:

  1. The original file is moved to a backup file
  2. The standard output is redirected to the original file within the loop
  3. Thus any print statements write back into the original file

fileinput has more bells and whistles. For example, it can automatically operate on all files in sys.argv[1:], without your having to iterate over them explicitly. Starting with Python 3.2 it also provides a convenient context manager for use in a with statement.
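As a rough sketch of the context-manager form (the function name `replace_in_files` and the default `.bak` suffix are illustrative choices; `files`, `inplace` and `backup` are real `fileinput.input` parameters, and Python 3.2+ is assumed), an explicit list of files could be processed like this:

```python
import fileinput


def replace_in_files(paths, old, new, backup=".bak"):
    """Replace `old` with `new` in every file in `paths`, in place."""
    # Because a backup suffix is given, the original of each file is kept
    # next to it as <name><backup>.
    with fileinput.input(files=paths, inplace=True, backup=backup) as lines:
        for line in lines:
            # While the loop runs, stdout is redirected to the current file.
            print(line.replace(old, new), end="")
```

Passing `files=None` instead makes fileinput take the file names from `sys.argv[1:]`.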


While fileinput is great for throwaway scripts, I would be wary of using it in real code because it is admittedly not very readable or familiar. In real (production) code it's worthwhile to spend a few more lines to make the process explicit and thus make the code readable.

There are two options:

  1. The file is not overly large, and you can just read it entirely into memory. Then close the file, reopen it in write mode and write the modified contents back.
  2. The file is too large to be stored in memory; you can move it over to a temporary file and open that, reading it line by line and writing back into the original file. Note that this requires twice the storage.
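A minimal sketch of both options follows. The function names are made up for illustration, and placing the temporary file in the same directory as the original is an assumption made here so that `os.replace` never has to cross filesystems (Python 3.3+).

```python
import os
import tempfile


def replace_in_memory(path, old, new):
    """Option 1: read the whole file, then reopen it for writing."""
    with open(path) as f:
        contents = f.read()
    with open(path, "w") as f:
        f.write(contents.replace(old, new))


def replace_via_temp_file(path, old, new):
    """Option 2: move the original aside, then stream it back line by line."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)  # only the name is needed; the descriptor would otherwise leak
    os.replace(path, tmp_path)  # the original content now lives at tmp_path
    try:
        with open(tmp_path) as src, open(path, "w") as dst:
            for line in src:
                dst.write(line.replace(old, new))
    except BaseException:
        os.replace(tmp_path, path)  # put the untouched original back
        raise
    else:
        os.remove(tmp_path)
```

Note that the second function writes into a newly created file, so the original's permissions are not carried over unless you copy them explicitly (for example with `shutil.copymode`).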
Alex Altair
Eli Bendersky
  • 17
    I know this only has two lines in it, however I don't think the code is very expressive in itself. Because if you think for a sec, if you didn't know the function, there are very few clues in what is going on. Printing the line number and the line is not the same as writing it ... if you get my gist... – chutsu May 29 '10 at 19:12
  • 3
    i agree. how would one use fileinput to write to the file? – jml Jan 24 '11 at 04:50
  • 14
    This **DOES** write to the file. It redirects stdout to the file. Have a look at the [docs](http://docs.python.org/library/fileinput.html) – brice Aug 24 '11 at 16:17
  • Has anyone done timing / performance testing to compare these solutions? I cannot imagine that this solution is as quick as the solution above (creating a copy and then overwriting the original). – Jamie Czuy Sep 28 '11 at 21:39
  • 33
    The key bit here is the comma at the end of the print statement: it suppresses the print statement adding another newline (as line already has one). It's not very obvious at all, though (which is why Python 3 changed that syntax, luckily enough). – VPeric Oct 21 '11 at 14:24
  • Brice's comment should be mentioned in the answer. – Chris Morris Mar 20 '12 at 16:21
  • 6
    Please notice this does not work when you provide an opening hook to the file, e.g. when you try to read/write UTF-16 encoded files. – bompf Jul 01 '13 at 12:19
  • 2
    @bompf: here's [NamedTemporaryFile-based version that supports an arbitrary character encoding](http://stackoverflow.com/a/17222971/4279). – jfs Jul 02 '13 at 21:35
  • 1
    You have forgotten about closing the object. It'd be better to use fileinput.close() just after the loop. – 0x6B6F77616C74 Apr 30 '15 at 22:11
  • I found out that this actually replaces the word...not the line . – Arindam Roychowdhury Sep 21 '15 at 12:24
  • I think a better implementation for `fileinput` than returning a string and redirecting `stdout` would have been returning a subclass of string with a method that actually writes to the file... I put this code in a file, and although the code itself is really short, I also had to include 4 lines of comments to explain all the magic involved. The accepted answer is probably better because it doesn't require any comments. – ArtOfWarfare Jan 19 '16 at 15:25
  • I feel like just using `line.strip()` is better than suppressing the newline with that sneaky `,` in the Python 2 version or including `, end=''` in the Python 3 version... – ArtOfWarfare Jan 19 '16 at 15:33
  • Here's the [part of the documentation](https://docs.python.org/3/library/fileinput.html#fileinput.FileInput) referring to why the file gets written to: *if the keyword argument `inplace=True` is passed to `fileinput.input()` or to the `FileInput` constructor, the file is moved to a backup file and standard output is directed to the input file* – icc97 Nov 29 '16 at 07:28
  • since *any* print is added, how can it be switched off after changing a file? – EsseTi Jan 09 '20 at 15:23
  • 1
    it should be pointed out that this solution is NOT thread-safe – GChamon Jan 24 '20 at 15:44
  • I don't like it personally, I feel this goes against Python Zen "Explicit is better than implicit". Who might think that `print` actually writes to a file here without searching through the docs? – Alexandr Zarubkin Nov 30 '22 at 10:01
236

I guess something like this should do it. It basically writes the content to a new file and replaces the old file with the new file:

from tempfile import mkstemp
from shutil import move, copymode
from os import fdopen, remove

def replace(file_path, pattern, subst):
    #Create temp file
    fh, abs_path = mkstemp()
    with fdopen(fh,'w') as new_file:
        with open(file_path) as old_file:
            for line in old_file:
                new_file.write(line.replace(pattern, subst))
    #Copy the file permissions from the old file to the new file
    copymode(file_path, abs_path)
    #Remove original file
    remove(file_path)
    #Move new file
    move(abs_path, file_path)
Thomas Watnedal
  • 8
    Just a minor comment: `file` is shadowing predefined class of the same name. – ezdazuzena Jan 24 '13 at 15:24
  • @ezdazuzena That's a good point. I've replaced file with file_path – Thomas Watnedal Jan 24 '13 at 22:38
  • 4
    This code changes the permissions on the original file. How can I keep the original permissions? – nic Jul 18 '13 at 21:35
  • 1
    what's the point of fh, you use it in the close call but I don't see the point of creating a file just to close it... – Wicelo Sep 12 '14 at 06:24
  • 2
    @Wicelo You need to close it to prevent leaking of the file descriptor. Here is a decent explanation: http://www.logilab.org/17873 – Thomas Watnedal Sep 19 '14 at 11:52
  • 1
    Yes I've discovered that `mkstemp()` is returning a 2-tuple and `(fh, abs_path) = fh, abs_path`, I didn't know that when I asked the question. – Wicelo Sep 20 '14 at 03:31
  • 1
    I don't know why but for me, the code does not enter the for line in old_file loop. Does anyone else have the same problem ? – Ram Sep 23 '15 at 05:30
  • It looks like string.replace() is deprecated. See this in the Python docs: https://docs.python.org/2/library/string.html#deprecated-string-functions – markg Jun 19 '17 at 04:26
  • 1
    Copy the file permissions via `shutil.copymode(file_path, abs_path)` before the `remove()` – udondan Jan 16 '20 at 12:41
  • remove(file_path) fails if file_path is readonly, also, on exception, temp file abs_path forever stays in temp dir – rmflow Dec 27 '22 at 11:16
  • If you don't want the implementation to change line endings from other operating systems (`LF`, `CRLF`) then you'll need to open the file in binary mode: `with fdopen(fh,'wb') as new_file:` and write to the new file as follows: `new_file.write(bytes(line.replace(pattern, subst), "UTF-8"))` – dutoitns Feb 23 '23 at 11:27
97

Here's another example that was tested, and will match search & replace patterns:

import fileinput
import sys

def replaceAll(file,searchExp,replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp,replaceExp)
        sys.stdout.write(line)

Example use:

replaceAll("/fooBar.txt","Hello\sWorld!$","Goodbye\sWorld.")
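Note that `searchExp in line` and `line.replace` compare literal text, so the regex-looking arguments in the example use are not interpreted as patterns. If regular expressions are what you want, a hedged sketch of a regex counterpart could look like this (the name `replace_all_regex` is hypothetical, and Python 3.2+ is assumed for the `with` form of `fileinput.input`):

```python
import fileinput
import re
import sys


def replace_all_regex(path, search_pattern, replacement):
    """Rewrite `path` in place, applying re.sub to every line."""
    pattern = re.compile(search_pattern)
    with fileinput.input(path, inplace=True) as lines:
        for line in lines:
            # stdout is redirected to the file, so this writes the new line.
            sys.stdout.write(pattern.sub(replacement, line))
```

It could be called as `replace_all_regex("/fooBar.txt", r"Hello\sWorld!$", "Goodbye World.")`. The replacement is written as plain text here because `re.sub` rejects unknown escapes such as `\s` in a replacement string on recent Python 3 versions.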
the Tin Man
Jason
  • 26
    The example use provides a regular expression, but neither `searchExp in line` nor `line.replace` are regular expression operations. Surely the example use is wrong. – kojiro Nov 14 '11 at 18:18
  • 1
    Instead of `if searchExp in line: line = line.replace(searchExp, replaceExpr)` you can just write `line = line.replace(searchExp, replaceExpr)`. No exception is generated, the line just remains unchanged. – David Wallace Nov 15 '17 at 16:07
  • Worked perfectly for me as well. I had come across a number of other examples that looked very similar to this, but the trick was the use of the ```sys.stdout.write(line)```. Thanks again! – Sage Jan 16 '18 at 17:23
  • 1
    If I use this, my file gets blank. Any idea? – Javier Lopez Tomas Sep 25 '19 at 16:11
  • I also got a blank file, but only when my implementation crashed at runtime during development. Once I fixed the errors in my code - everything worked fine. – dutoitns Feb 22 '23 at 10:28
  • Another point not related to my preceding comment - I also just added `fileinput.close()` before returning from the function. – dutoitns Feb 22 '23 at 10:29
  • I'm currently on Windows, and the solution did change files containing `LF` line endings to `CRLF`. Not an issue for me, but I just wanted to point it out in case it might be a problem for others. – dutoitns Feb 22 '23 at 13:15
  • As you might end up with a blank file if your system crashes before you execute `sys.stdout.write(line)` I won't suggest this solution for a production use case. – dutoitns Feb 22 '23 at 13:22
70

This should work (in-place editing):

import fileinput

# Iterates over a list of files and
# redirects stdout to the file currently being read
for line in fileinput.input(files, inplace=True):
    print(line.replace("foo", "bar"), end="")  # Python 3
    # print line.replace("foo", "bar"),  # Python 2
Gringo Suave
Kinlan
  • 5
    +1. Also if you receive a RuntimeError: input() already active then call the fileinput.close() – geographika Nov 18 '11 at 09:24
  • 4
    Note that `files` should be a string containing the file name, [not a file object](http://stackoverflow.com/a/18529529/1461850). – Lee Aug 30 '13 at 10:00
  • 11
    print adds a newline that could already be there. to avoid this, add .rstrip() at the end of your replacements – Guillaume Gendre Dec 21 '14 at 14:09
  • Instead of passing the files arg to input(), you can call fileinput.input(inplace=1) and run the script as: python replace.py myfiles*.txt – chespinoza Feb 24 '17 at 17:45
25

Based on the answer by Thomas Watnedal. This does not exactly answer the line-by-line part of the original question, but the function can still replace on a line-by-line basis.

This implementation replaces the file contents without using temporary files; as a consequence, file permissions remain unchanged.

Also, using re.sub instead of replace allows regex replacement rather than plain-text replacement only.

Reading the file as a single string instead of line by line allows for multiline match and replacement.

import re

def replace(file, pattern, subst):
    # Read contents from file as a single string
    with open(file, 'r') as file_handle:
        file_string = file_handle.read()

    # Use RE package to allow for replacement (also allowing for (multiline) REGEX)
    file_string = re.sub(pattern, subst, file_string)

    # Write contents to file.
    # Using mode 'w' truncates the file.
    with open(file, 'w') as file_handle:
        file_handle.write(file_string)
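Line endings come up in the comments on this answer. As a sketch for Python 3 only (the function name and the `encoding` default are assumptions added for illustration), opening the file with `newline=""` turns off newline translation in both directions, so existing `LF` or `CRLF` endings are written back as they were read:

```python
import re


def replace_keep_newlines(path, pattern, subst, encoding="utf-8"):
    """Regex-replace in a file without altering its line endings."""
    # newline="" disables newline translation when reading...
    with open(path, "r", encoding=encoding, newline="") as handle:
        contents = handle.read()

    contents = re.sub(pattern, subst, contents)

    # ...and when writing, so "\r\n" is neither converted nor doubled.
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write(contents)
```

With this approach the pattern sees any `\r\n` sequences verbatim, which matters for patterns that match across line boundaries.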
Thijs
  • 2
    You might want to use `rb` and `wb` attributes when opening files as this will preserve original line endings – Nux Jun 01 '16 at 14:35
  • In Python 3, you can't use 'wb' and 'rb' with 're'. It will give the error "TypeError: cannot use a string pattern on a bytes-like object" –  Oct 24 '17 at 13:22
20

As lassevk suggests, write out the new file as you go. Here is some example code:

with open("a.txt") as fin, open("b.txt", "wt") as fout:
    for line in fin:
        fout.write(line.replace('foo', 'bar'))
hamishmcn
17

fileinput is quite straightforward, as mentioned in previous answers:

import fileinput

def replace_in_file(file_path, search_text, new_text):
    with fileinput.input(file_path, inplace=True) as file:
        for line in file:
            new_line = line.replace(search_text, new_text)
            print(new_line, end='')

Explanation:

  • fileinput can accept multiple files, but I prefer to close each file as soon as it has been processed, so I pass a single file_path in the with statement.
  • The print call does not print anything to the console when inplace=True, because STDOUT is redirected to the original file.
  • end='' in the print call prevents extra blank lines from being inserted between lines.

You can use it as follows:

file_path = '/path/to/my/file'
replace_in_file(file_path, 'old-text', 'new-text')
Akif
  • If the new text has special characters such as Japanese glyphs, the characters don't appear properly. They're written in a form similar to `\xe8`. – Unknow0059 Nov 13 '20 at 11:26
15

A more pythonic way would be to use context managers like the code below:

from tempfile import mkstemp
from shutil import move
from os import fdopen, remove

def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # Wrap the descriptor returned by mkstemp so it is closed along with the file
    with fdopen(fh, 'w') as target_file:
        with open(source_file_path, 'r') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)

You can find the full snippet here.

formatkaka
Kiran
  • 1
    In Python >=3.1 you could open the [two context managers on the same line](https://stackoverflow.com/a/3024953/1075152). – florisla Feb 06 '18 at 11:05
14

If you want a generic function that replaces any text with some other text, this is likely the best way to go, particularly if you're a fan of regexes:

import re
def replace( filePath, text, subs, flags=0 ):
    with open( filePath, "r+" ) as file:
        fileContents = file.read()
        textPattern = re.compile( re.escape( text ), flags )
        fileContents = textPattern.sub( subs, fileContents )
        file.seek( 0 )
        file.truncate()
        file.write( fileContents )
starryknight64
5

Expanding on @Kiran's answer, which I agree is more succinct and Pythonic, this adds codecs to support the reading and writing of UTF-8:

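import codecs

from tempfile import mkstemp
from shutil import move
from os import close, remove


def replace(source_file_path, pattern, substring):
    fh, target_file_path = mkstemp()
    # mkstemp returns an open descriptor; close it, since codecs.open reopens the file by path
    close(fh)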

    with codecs.open(target_file_path, 'w', 'utf-8') as target_file:
        with codecs.open(source_file_path, 'r', 'utf-8') as source_file:
            for line in source_file:
                target_file.write(line.replace(pattern, substring))
    remove(source_file_path)
    move(target_file_path, source_file_path)
igniteflow
5

Create a new file, copy lines from the old to the new, and do the replacing before you write the lines to the new file.

Lasse V. Karlsen
4

Using hamishmcn's answer as a template, I was able to search a file for lines that match my regex and replace the matching text with an empty string.

import re 

fin = open("in.txt", 'r') # in file
fout = open("out.txt", 'w') # out file
for line in fin:
    p = re.compile('[-][0-9]*[.][0-9]*[,]|[-][0-9]*[,]') # pattern
    newline = p.sub('',line) # replace matching strings with empty string
    print(newline)  # Python 2: print newline
    fout.write(newline)
fin.close()
fout.close()
Emmanuel
  • 1
    You should compile the regex OUTSIDE the for loop, otherwise is a performance waste – Axel Feb 04 '16 at 17:49
1

If you remove the indent like below, it will search and replace in multiple lines. See below for an example.

from os import close, remove
from shutil import move
from tempfile import mkstemp

def replace(file, pattern, subst):
    # Create temp file
    fh, abs_path = mkstemp()
    print(fh, abs_path)  # Python 2: print fh, abs_path
    new_file = open(abs_path, 'w')
    old_file = open(file)
    for line in old_file:
        new_file.write(line.replace(pattern, subst))
    # Close temp file
    new_file.close()
    close(fh)
    old_file.close()
    # Remove original file
    remove(file)
    # Move new file
    move(abs_path, file)
rowanthorpe
loi
  • The formatting of this Python code doesn't look quite right... (I tried to fix, but wasn't sure what was intended) – Andy Hayden Sep 30 '12 at 18:18