346

How do I search and replace text in a file using Python 3?

Here is my code:

import os
import sys
import fileinput

print("Text to search for:")
textToSearch = input("> ")

print("Text to replace it with:")
textToReplace = input("> ")

print("File to perform Search-Replace on:")
fileToSearch = input("> ")

tempFile = open(fileToSearch, 'r+')

for line in fileinput.input(fileToSearch):
    if textToSearch in line:
        print('Match Found')
    else:
        print('Match Not Found!!')
    tempFile.write(line.replace(textToSearch, textToReplace))
tempFile.close()

input('\n\n Press Enter to exit...')

Input file:

hi this is abcd hi this is abcd
This is dummy text file.
This is how search and replace works abcd

When I search and replace 'ram' with 'abcd' in the above input file, it works like a charm. But when I do it vice versa, i.e., replacing 'abcd' with 'ram', some junk characters are left at the end.

Replacing 'abcd' by 'ram':

hi this is ram hi this is ram
This is dummy text file.
This is how search and replace works rambcd
Peter Mortensen
Shriram

22 Answers

545

As pointed out by michaelb958, you cannot replace text in place with data of a different length, because the rest of the file does not shift to make room for (or close the gap left by) the new text. I disagree with the other posters suggesting you read from one file and write to another. Instead, I would read the file into memory, fix the data up, and then write it out to the same file in a separate step.

# Read in the file
with open('file.txt', 'r') as file:
  filedata = file.read()

# Replace the target string
filedata = filedata.replace('abcd', 'ram')

# Write the file out again
with open('file.txt', 'w') as file:
  file.write(filedata)

This works well unless you've got a massive file which is too big to load into memory in one go, or you are concerned about potential data loss if the process is interrupted during the second step, in which you write the data back to the file.

Peter Mortensen
Jack Aidley
  • The file should be closed in the end with `file.close()` – Jonas Stein Apr 16 '16 at 13:31
  • @JonasStein: No, it shouldn't. The `with` statement automatically closes the file at the end of the statement block. – Jack Aidley Apr 16 '16 at 21:53
  • @JackAidley Do we need to worry about memory consumption using this method, especially if the file is very large? – user3167654 Apr 04 '18 at 16:45
  • @user3167654 For a _very_ large file, yes, you do. Also, if you have both a large file and need bomb-proof reliability, this is not the right method, since it can leave you without either the original file or the modified version if the write-back step is interrupted. However, for most uses I think it is appropriate. – Jack Aidley Apr 04 '18 at 17:17
  • @JackAidley Can we get the count of "string_to_be_replaced" in a file? Or how many times it has been replaced? – StackGuru Apr 14 '20 at 14:54
  • @StackGuru The easiest way would be to do `count_replaced = filedata.count(string_to_be_replaced)` before doing the replace. It means scanning the whole thing twice though, so I guess it's pretty inefficient. – Jack Aidley Apr 14 '20 at 15:13
  • Maybe easier `with open(inpath, 'r') as inf, open(outpath, 'w') as outf: outf.write(inf.read().replace(find, replace))` – debuti May 06 '20 at 10:54
  • @debuti: That would read from one file and write to another; this reads and writes to the same file. – Jack Aidley May 06 '20 at 11:21
  • It might be dangerous if something goes wrong right before `file.write(filedata)` executes: your old data is lost and the new data is not written yet. – Martin Grůber May 15 '20 at 15:08
  • @MartinGrůber: Correct, hence the final paragraph. – Jack Aidley May 15 '20 at 16:01
  • This allowed me to write to the file only when necessary (writing triggered an environment reload in VS), as opposed to the accepted answer. – jeromej Jun 04 '20 at 15:48
  • I wonder why there's no solution for handling huge files here. Every other question has one saying loading the whole file into memory is not a good idea. – Phani Rithvij Sep 01 '20 at 07:10
  • @JackAidley thank you for this helpful answer. Would you mind explaining why the 'r'/'w' flags are needed? I.e. why it needs `with open('file.txt', 'r') as file:` and `with open('file.txt', 'w') as file:` instead of just `with open('file.txt') as file`? From reading the Python documentation (https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) I had expected that the latter would be sufficient, but it gives me a `FileNotFoundError`. – Aerinmund Fagelson Apr 25 '22 at 11:52
  • @AerinmundFagelson: The 'r' specifies that the file is opened for reading, the 'w' that it is opened for writing. If you omit the flag then it defaults to read only, but I prefer to include it for clarity, especially when - as here - I am doing both variants in quick succession. If you are getting `FileNotFoundError` then it is trying to read a file that doesn't (yet?) exist. – Jack Aidley Apr 25 '22 at 12:34
  • Does this work the way `sed` works? – alper Aug 31 '23 at 19:29
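Following up on the counting question in the comments: a minimal sketch that combines the `count` suggestion with this answer's read/replace/write steps. The function name `replace_and_count` is hypothetical, and it skips the write-back when nothing matched.

```python
def replace_and_count(path, old, new):
    """Replace every occurrence of `old` with `new` in the file at `path`
    and return the number of occurrences replaced."""
    with open(path, 'r') as file:
        filedata = file.read()

    # str.count finds the same non-overlapping occurrences, scanning left
    # to right, that str.replace substitutes
    count_replaced = filedata.count(old)

    if count_replaced:
        # only rewrite the file when something actually changes
        with open(path, 'w') as file:
            file.write(filedata.replace(old, new))
    return count_replaced
```

As noted in the comments, this scans the text twice. For a single pass, `re.subn(re.escape(old), new, filedata)` returns both the new text and the number of substitutions (backslashes in `new` are then interpreted as regex escapes).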
348

fileinput already supports in-place editing. With inplace=True, it redirects stdout to the file, so whatever you print inside the loop is written back to it:

#!/usr/bin/env python3
import fileinput

with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(text_to_search, replacement_text), end='')
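A minimal sketch of the same loop wrapped in a function (the name `replace_in_file` is hypothetical), to show what `inplace=True` and `backup='.bak'` do: the original file is renamed to `<filename>.bak`, and everything printed inside the loop goes into a new file with the original name.

```python
import fileinput


def replace_in_file(filename, text_to_search, replacement_text):
    # inplace=True: the original is moved aside and print() output inside
    # the loop is written to a new file under the original name.
    # backup='.bak': the original is kept as filename + '.bak'.
    with fileinput.FileInput(filename, inplace=True, backup='.bak') as file:
        for line in file:
            # `line` already ends with its newline, so suppress print's own
            print(line.replace(text_to_search, replacement_text), end='')
```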
Jacktose
jfs
  • What is the `end=''` argument supposed to do? – egpbos Apr 01 '14 at 13:40
  • `line` already has a newline. `end` is a newline by default; `end=''` makes the `print()` function not print an additional newline. – jfs Apr 01 '14 at 13:46
  • Don't use fileinput! Consider writing the code to do this yourself instead. Redirecting sys.stdout isn't a great idea, especially if you're doing it without a try..finally like fileinput does. If an exception gets raised, your stdout might never get restored. – craigds Dec 18 '14 at 03:09
  • @craigds: wrong. `fileinput` is not a tool for *all* jobs (*nothing* is) but there are many cases where it **is** the right tool e.g., to implement a `sed`-like filter in Python. Don't use a screwdriver to pound nails. – jfs Dec 18 '14 at 13:16
  • If you *really* want to redirect stdout to your file for some reason, it's not hard to do it better than `fileinput` does (basically, use `try..finally` or a contextmanager to ensure you set stdout back to its original value afterwards). The source code for `fileinput` is pretty eye-bleedingly awful, and it does some really unsafe things under the hood. If it were written today I very much doubt it would have made it into the stdlib. – craigds Dec 18 '14 at 22:06
  • @craigds: I don't see the benefit of reimplementing the diamond operator every time I need it. Don't optimize prematurely. And if you don't like the implementation, submit a patch. – jfs Dec 18 '14 at 22:33
  • There is an alternative to `, end=''`: you could add `.rstrip()` at the end of your replaces to avoid double newlines. – Guillaume Gendre Dec 21 '14 at 14:10
  • @GuillaumeGendre: rstrip() might remove too much e.g., trailing whitespace. `end=""` is a cleaner solution. – jfs Dec 21 '14 at 15:56
  • This solution works great but the only caveat is it rewrites every single line. What I mean is if you run "diff" on old file vs new file, you'll notice that every line appears as modified. This matters a lot if the files are in svn, like in my case. Any workarounds? – Suresh Oct 31 '15 at 19:33
  • @Suresh: it is probably related to the universal newlines mode (if your input file has newlines in non-native format for the system then they are normalized). Create a minimal input that demonstrates the issue e.g., `open('file', 'wb').write(b'\r\n\n\r')`, do search/replace using the code in the answer, and post the unexpected results (if any)(`print(ascii(open('file', 'rb').read()))`) along with the expected results as a new question. – jfs Oct 31 '15 at 20:16
  • Sorry, I didn't see the comment earlier. Here's how I dealt with it: for each `filename` in `__files__`, I set `tmp_name = filename + '.modified'`, opened the input with `codecs.open(filename, 'r', encoding='latin-1')` and the output with `codecs.open(tmp_name, 'w', encoding='latin-1')`, wrote `line.replace(oldv, newv)` for every line, and then called `os.remove(filename)` followed by `os.rename(tmp_name, filename)`. – Suresh Feb 09 '16 at 22:53
  • @Suresh: 1. Comments are not an appropriate place to discuss possible solutions to a new question; ask a new question instead. 2. Don't use `codecs.open`, use `io.open` instead. – jfs Feb 10 '16 at 09:45
  • @J.F.Sebastian Hi Sebastian, we can also use the `sub()` function from the `re` module; check out my answer to this question, it totally works. – R__raki__ Oct 02 '16 at 17:10
  • rstrip() fixes the problems that end="" creates in python 2.7 – answerSeeker Feb 08 '17 at 21:20
  • @answerSeeker: to enable `end=''`, you could use `from __future__ import print_function` or at the very least use `.rstrip('\n')` instead of `.rstrip()`, to avoid removing too much whitespace from the line. – jfs Feb 09 '17 at 00:55
  • good piece of code, here is the 2.7 version ->http://stackoverflow.com/questions/30835090/attributeerror-fileinput-instance-has-no-attribute-exit – Vitaliy Terziev May 10 '17 at 09:35
  • @Christophe Roussy read the question. Notice the names. Don't make such edits without a comment – jfs Jun 19 '17 at 14:53
  • @J.F.Sebastian OK for the question, but then the naming is bad in the question too, as 'textToReplace' reads as "the text to replace" in English. This is very confusing for beginners, but I understand why you kept the original. – Christophe Roussy Jun 19 '17 at 15:07
  • @jfs I use `with open(os.path.join(path,file), "r", encoding = "utf-8") as file:` to open the file and avoid a UnicodeDecodeError, but in the above case of `FileInput(filename, inplace=True, backup='.bak')` how am I supposed to avoid that? Please comment on that. – Nitish Kumar Pal Apr 18 '18 at 06:15
  • @NitishKumarPal if it is not clear from the documentation, ask a separate Stack Overflow question (how to specify the character encoding for FileInput) – jfs Apr 18 '18 at 06:22
  • What is the use of backup='.bak'? I couldn't find it anywhere in the documentation – Ridhuvarshan Jun 06 '18 at 02:09
  • @Ridhuvarshan open the fileinput documentation, search for the word "backup" e.g., [follow the link](https://docs.python.org/3/library/fileinput.html) then press Ctrl+f and start typing the word backup. If that fails, ask a separate Stack Overflow question. – jfs Jun 06 '18 at 05:15
  • Oddly enough, `fileinput` changes the ownership of the file on a Linux system, e.g.: if the file is owned by `X` and I run the Python script as `root`, the ownership changes to `root`. – Pedro Lobito Aug 22 '20 at 03:43
  • @SenhorLucS `file` is not builtin in Python 3 -- look at the shebang (and even on EOLed Python 2 `file` would be more readable than `f`) – jfs Oct 31 '20 at 16:43
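Regarding the diff problem Suresh describes: in text mode Python translates line endings such as `\r\n` to `\n` when reading and writes the platform's native line ending back, so a file whose line endings are not native can come out with every line changed. A minimal sketch that avoids this with plain `open()` instead of `fileinput`, assuming the file fits in memory and is UTF-8 encoded (both assumptions; the function name is hypothetical), passes `newline=''` so line endings go through untranslated:

```python
def replace_preserving_newlines(path, old, new, encoding='utf-8'):
    # newline='' disables newline translation on both read and write,
    # so '\r\n', '\r' and '\n' are kept exactly as they appear in the file
    with open(path, 'r', encoding=encoding, newline='') as f:
        text = f.read()
    with open(path, 'w', encoding=encoding, newline='') as f:
        f.write(text.replace(old, new))
```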
62

As Jack Aidley had posted and jfs pointed out, this code will not work:

# Read in the file
filedata = None
with file = open('file.txt', 'r') :
  filedata = file.read()

# Replace the target string
filedata.replace('ram', 'abcd')

# Write the file out again
with file = open('file.txt', 'w') :
  file.write(filedata)

But this code will work (I've tested it):

f = open(filein,'r')
filedata = f.read()
f.close()

newdata = filedata.replace("old data","new data")

f = open(fileout,'w')
f.write(newdata)
f.close()

Using this method, filein and fileout can be the same file, because the data has already been read and the file closed before it is reopened, and opening a file in 'w' mode truncates it before writing.

Peter Mortensen
Neamerjell
  • I believe the difference is here: `filedata.replace('ram', 'abcd')` compared to `newdata = filedata.replace("old data","new data")`. It has nothing to do with the `with` statement. – Diegomanas Oct 16 '14 at 13:17
  • 1. why would you remove `with`-statement? 2. As stated in my answer, `fileinput` can work inplace -- it can replace data in same file (it uses a temporary file internally). The difference is that `fileinput` does not require to load the whole file into memory. – jfs Jan 30 '15 at 20:05
  • Just to save others revisiting Jack Aidley's answer, it has been corrected since this answer, so this one is now redundant (and inferior due to losing the neater `with` blocks). – Chris Apr 26 '17 at 20:08
  • Not very pythonic. I'd either use a `try`/`finally` to make sure that the file is always closed, or the usual `with` statement, or the `fileinput` option. – SenhorLucas Jan 22 '21 at 14:47
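As Diegomanas's comment points out, the real bug in the first snippet is that `filedata.replace(...)` is called without using its result. Python strings are immutable, so `str.replace` returns a new string and leaves the original unchanged. A short illustration:

```python
filedata = 'This is how search and replace works abcd'

# str.replace returns a new string; the original is left unchanged
filedata.replace('abcd', 'ram')
print(filedata)  # This is how search and replace works abcd

# the result has to be assigned, here back to the same name
filedata = filedata.replace('abcd', 'ram')
print(filedata)  # This is how search and replace works ram
```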
57

You can do the replacement like this:

f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
for line in f1:
    f2.write(line.replace('old_text', 'new_text'))
f1.close()
f2.close()
Jayram
27

You can also use pathlib.

from pathlib import Path  # pathlib is in the standard library; pathlib2 is a third-party backport
path = Path(file_to_search)
text = path.read_text()
text = text.replace(text_to_search, replacement_text)
path.write_text(text)
Georgy
Yuya Takashina
  • Thanks Yuya. The above solution worked well. Note: you need to back up your original file first, since this overwrites the original file itself. If you want to replace text repeatedly, you can keep adding the last two lines: `text = text.replace(text_to_search, replacement_text)` and `path.write_text(text)`. – Nages Mar 01 '20 at 01:22
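Building on Nages's comment, a minimal sketch that makes the backup automatically and applies several replacements before writing the file once. It uses the standard-library `pathlib` and `shutil.copy2`; the function name `replace_all`, the `.bak` suffix and the UTF-8 default are illustrative assumptions.

```python
import shutil
from pathlib import Path


def replace_all(file_to_search, replacements, encoding='utf-8'):
    """Apply each (text_to_search, replacement_text) pair in order,
    keeping a copy of the original as <name>.bak."""
    path = Path(file_to_search)
    # copy first, because write_text below overwrites the original
    shutil.copy2(path, path.with_name(path.name + '.bak'))
    text = path.read_text(encoding=encoding)
    for text_to_search, replacement_text in replacements:
        text = text.replace(text_to_search, replacement_text)
    path.write_text(text, encoding=encoding)
```

The pairs are applied in order, so a later pair can match text produced by an earlier one.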
13

(pip install python-util)

from pyutil import filereplace

filereplace("somefile.txt","abcd","ram")

This will replace all occurrences of "abcd" with "ram".
The function also supports regex by specifying regex=True:

from pyutil import filereplace

filereplace("somefile.txt","\\w+","ram",regex=True)

Disclaimer: I'm the author (https://github.com/MisterL2/python-util)

MisterL2
  • I had some bad experience with this (it added some characters to the end of the file), so I cannot recommend it, even though a one-liner would be nice. – Azrael3000 Jun 16 '20 at 08:31
  • @Azrael3000 It added characters? I have not seen that happen to me. I would highly appreciate it if you opened an issue on GitHub so I can fix it https://github.com/MisterL2/python-util – MisterL2 Jun 19 '20 at 13:23
  • Thanks for the GitHub issue! The problem has been resolved and it is fully working now. – MisterL2 Dec 01 '20 at 12:57
9

Open the file in read mode. Read the file into a string. Replace the text as intended. Close the file. Open the file again in write mode. Finally, write the replaced text to the same file.

file_name = "file_name"  # path to the file to edit

try:
    with open(file_name, "r") as text_file:
        texts = text_file.read()
    texts = texts.replace("to_replace", "replace_string")
    with open(file_name, "w") as text_file:
        text_file.write(texts)
except FileNotFoundError:
    print("Could not find the file you are trying to read.")
mkrieger1
Sanzv
7

Late answer, but this is what I use to find and replace inside a text file:

with open("test.txt") as r:
  text = r.read().replace("THIS", "THAT")
with open("test.txt", "w") as w:
  w.write(text)

Pedro Lobito
4

With a single with block, you can search and replace your text:

with open('file.txt','r+') as f:
    filedata = f.read()
    filedata = filedata.replace('abc','xyz')
    f.truncate(0)
    f.write(filedata)
iknowitwasyoufredo
  • You forgot to `seek` to the beginning of the file before writing it. `truncate` doesn't do that and so you will have garbage in the file. – ur. Jul 25 '19 at 08:55
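With the fix from that comment applied, the single-`with` version could look like this (a sketch; `replace_in_place` is a hypothetical name). Seeking to the start, writing, and then calling `truncate()` cuts off any leftover characters when the new text is shorter than the old.

```python
def replace_in_place(path, old, new):
    with open(path, 'r+') as f:
        filedata = f.read().replace(old, new)
        f.seek(0)          # move back to the start before writing
        f.write(filedata)
        f.truncate()       # cut off leftovers if the new text is shorter
```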
3

Your problem stems from reading from and writing to the same file. Rather than opening fileToSearch for writing, open an actual temporary file and then after you're done and have closed tempFile, use os.rename to move the new file over fileToSearch.

icktoofay
  • Friendly FYI (feel free to edit into the answer): The root cause is not being able to shorten the middle of a file in place. That is, if you search for 5 characters and replace with 3, the first 3 chars of the 5 searched for will be replaced; but the other 2 can't be removed, they'll just stay there. The temporary file solution removes these "leftover" characters by dropping them instead of writing them out to the temporary file. – michaelb958--GoFundMonica Jun 17 '13 at 05:53
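A minimal sketch of the temporary-file approach described in this answer, assuming a text file in the platform's default encoding (the function name is hypothetical). It processes one line at a time, so the whole file is never held in memory, and it uses `os.replace` rather than `os.rename`, since `os.rename` raises an error on Windows when the target already exists. One limitation: because it works line by line, it cannot match search text that spans two lines.

```python
import os
import shutil
import tempfile


def replace_via_temp_file(file_to_search, text_to_search, text_to_replace):
    # Create the temporary file in the same directory, so the final
    # os.replace is a rename within one filesystem.
    directory = os.path.dirname(os.path.abspath(file_to_search))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as temp_file, open(file_to_search, 'r') as source:
            # Stream line by line: only one line is in memory at a time.
            for line in source:
                temp_file.write(line.replace(text_to_search, text_to_replace))
        # mkstemp creates the file with restrictive permissions; copy the
        # original file's permission bits before swapping it in.
        shutil.copymode(file_to_search, temp_path)
        # Swap the new file in (atomic on POSIX within one filesystem).
        # The original stays intact until this point.
        os.replace(temp_path, file_to_search)
    except BaseException:
        # On any failure, clean up the temporary file and re-raise.
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
```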
3

My variant: replace one word at a time across the entire file.

I read it into memory.

import os
import sys

def replace_word(infile, old_word, new_word):
    if not os.path.isfile(infile):
        print("Error on replace_word, not a regular file: " + infile)
        sys.exit(1)

    with open(infile, 'r') as f:
        f1 = f.read()
    m = f1.replace(old_word, new_word)
    with open(infile, 'w') as f2:
        f2.write(m)
LiPi
3

Using re.subn gives you more control over the substitution process, such as matching words split over two lines, or case-(in)sensitive matching. Further, it returns the number of substitutions made, which can be used to avoid rewriting the file if the string is not found.

import re

file = 'file.txt'  # path to file (placeholder)

# they can also be raw strings and regexes
textToSearch = r'Ha.*O'  # here an example with a regex
textToReplace = 'hallo'

# read and replace
with open(file, 'r') as fd:
    # sample case-insensitive find-and-replace; flags must be passed by
    # keyword, because the 4th positional argument of re.subn is count
    text, counter = re.subn(textToSearch, textToReplace, fd.read(), flags=re.I)

# check if there is at least one match
if counter > 0:
    # edit the file
    with open(file, 'w') as fd:
        fd.write(text)

# summary result
print(f'{counter} occurrence(s) of "{textToSearch}" were replaced with "{textToReplace}".')

Some regex tips:

  • add the re.I flag (short for re.IGNORECASE) for a case-insensitive match
  • for multi-line replacement use re.subn(r'\n*'.join(textToSearch), textToReplace, fd.read()) or, depending on the data, '\n{,1}' as the separator. Notice that in this case textToSearch must be a plain string, not a regex!
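A small sketch of the multi-line idea from the second tip. As a variation not in the original answer, each character is passed through `re.escape` first, so the search text is matched literally even if it contains characters such as `.` or `*`; `flexible_pattern` is a hypothetical helper name.

```python
import re


def flexible_pattern(text_to_search):
    # Escape each character so it is matched literally, then allow any
    # number of newlines between consecutive characters.
    return r'\n*'.join(re.escape(ch) for ch in text_to_search)


text = 'Say hel\nlo to everyone'
new_text, counter = re.subn(flexible_pattern('hello'), 'bye', text, flags=re.I)
print(counter, repr(new_text))  # 1 'Say bye to everyone'
```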
cards
2

Besides the answers already mentioned, here is an explanation of why you have some random characters at the end:
You are opening the file in r+ mode, not w mode. The key difference is that w mode clears the contents of the file as soon as you open it, whereas r+ doesn't.
This means that if your file content is "123456789" and you write "www" to it, you get "www456789". It overwrites the characters with the new input, but leaves any remaining content untouched.
You can cut off everything after a given position with truncate(<size>), but you are probably best off saving the updated file content to a string first, then calling seek(0) and truncate(0), and writing it all at once.
Or you can use my library :D
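The overwrite behaviour described above is easy to reproduce. A minimal sketch, using a throwaway file in a temporary directory (the name `demo.txt` is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('123456789')

# 'r+' opens for reading and writing without truncating the file
with open(path, 'r+') as f:
    f.write('www')  # overwrites only the first three characters

with open(path) as f:
    print(f.read())  # www456789
```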

MisterL2
2

I got the same issue. The problem is that when you load a .txt file into a variable, you use it like an array of strings, while it's actually an array of characters.

swapString = []
with open(filepath) as f: 
    s = f.read()
for each in s:
    swapString.append(str(each).replace('this','that'))
s = swapString
print(s)

2

I tried this and used readlines instead of read

with open('dummy.txt','r') as file:
    list = file.readlines()
print(f'before removal {list}')
for i in list[:]:
        list.remove(i)

print(f'After removal {list}')
with open('dummy.txt','w+') as f:
    for i in list:
        f.write(i)
2

You can call sed, AWK or grep from Python (with some restrictions). Here is a very simple example. It changes "Banana" to " BananaToothpaste" in the file. You can edit and use it. (I tested it and it worked. Note: if you are testing under Windows, you should install the "sed" command and add it to your PATH first.)

import os

file = "a.txt"
oldtext = "Banana"
newtext = " BananaToothpaste"
os.system('sed -i "s/{}/{}/g" {}'.format(oldtext, newtext, file))
#print(f'sed -i "s/{oldtext}/{newtext}/g" {file}')
print('This command was applied:  sed -i "s/{}/{}/g" {}'.format(oldtext, newtext, file))

To see the result in the file directly, use "type" on Windows or "cat" on Linux:

#### For Windows:
os.popen("type " + file).read()

#### For Linux:
os.popen("cat " + file).read()
Peter Mortensen
BARIS KURT
1

I have done this:

#!/usr/bin/env python3

import fileinput
import os

Dir = input ("Source directory: ")
os.chdir(Dir)

Filelist = os.listdir()
print('File list: ',Filelist)

NomeFile = input ("Insert file name: ")

CarOr = input ("Text to search: ")

CarNew = input ("New text: ")

with fileinput.FileInput(NomeFile, inplace=True, backup='.bak') as file:
    for line in file:
        print(line.replace(CarOr, CarNew), end='')

Zelmik
1

I modified Jayram's post slightly in order to replace every instance of a '!' character with a number that increments with each instance. I thought it might be helpful to someone who wants to modify a character that occurs more than once per line and wants to iterate. This worked for me.

f1 = open('file1.txt', 'r')
f2 = open('file2.txt', 'w')
n = 1

# if word=='!'replace w/ [n] & increment n; else append same word to
# file2

for line in f1:
    for word in line:
        if word == '!':
            f2.write(word.replace('!', f'[{n}]'))
            n += 1
        else:
            f2.write(word)
f1.close()
f2.close()
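As an alternative technique (not the approach above), `re.sub` accepts a function as the replacement and calls it once per match, left to right, which makes numbering each `!` a one-liner. A sketch, with `number_exclamations` as a hypothetical name; like the loop above, the count keeps running across lines:

```python
import itertools
import re


def number_exclamations(text, start=1):
    # itertools.count yields start, start + 1, ... one value per next() call
    counter = itertools.count(start)
    # re.sub calls the lambda once for each '!', in order of appearance
    return re.sub('!', lambda match: f'[{next(counter)}]', text)


print(number_exclamations('Hi! Bye!\nAgain!'))
```

To apply it to files, read file1.txt into a string, pass it through `number_exclamations`, and write the result to file2.txt.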
Peter Mortensen
Doc5506
1

Use:

def word_replace(filename, old, new):
    c = 0
    with open(filename, 'r+', encoding ='utf-8') as f:
        a = f.read()
        b = a.split()
        for i in range(0, len(b)):
            if b[i] == old:
                c = c + 1
        old = old.center(len(old) + 2)
        new = new.center(len(new) + 2)
        d = a.replace(old, new, c)
        f.truncate(0)
        f.seek(0)
        f.write(d)

    print('All words have been replaced!!!')
Peter Mortensen
Vinit Pillai
  • This code will replace the word you intend. The only problem is that it rewrites the whole file, so it might struggle if the file is too large to fit in memory. – Vinit Pillai Jan 23 '18 at 18:47
0

I worked this out as an exercise for a course: open a file, find and replace a string, and write the result to a new file.

class Letter:

    def __init__(self):

        with open("./Input/Names/invited_names.txt", "r") as file:
            # read the list of names
            list_names = [line.rstrip() for line in file]
            with open("./Input/Letters/starting_letter.docx", "r") as f:
                # read letter
                file_source = f.read()
            for name in list_names:
                with open(f"./Output/ReadyToSend/LetterTo{name}.docx", "w") as f:
                    # replace [name] with name of the list in the file
                    replace_string = file_source.replace('[name]', name)
                    # write to a new file
                    f.write(replace_string)


brief = Letter()
Ruli
xoradin
-1

Like so:

def find_and_replace(file, word, replacement):
  with open(file, 'r+') as f:
    text = f.read()
    f.write(text.replace(word, replacement))
  • Please ensure that your answer improves upon other answers already present in this question. – hongsy Jan 17 '20 at 07:27
  • This will append the replaced text to the end of the file. In my opinion, @Jack Aidley's answer is just what the OP meant https://stackoverflow.com/a/17141572/6875391 – klapshin Mar 12 '20 at 14:57
-2
def findReplace(find, replace):

    import os

    src = os.path.join(os.getcwd(), os.pardir)

    for path, dirs, files in os.walk(os.path.abspath(src)):
        for name in files:
            if name.endswith('.py'):
                filepath = os.path.join(path, name)
                with open(filepath) as f:
                    s = f.read()
                s = s.replace(find, replace)
                with open(filepath, "w") as f:
                    f.write(s)
Deepak G