26

How can I replace strings in a file with using regular expressions in Python?

I want to open a file in which I should replace strings for other strings and we need to use regular expressions (search and replace). What would be some example of opening a file and using it with a search and replace method?

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Michal Vasko
  • 321
  • 1
  • 3
  • 4

2 Answers2

44
# The following code will search 'MM/DD/YYYY' (e.g. 11/30/2016 or NOV/30/2016, etc ),
# and replace with 'MM-DD-YYYY' in multi-line mode.
import re
with open ('input.txt', 'r' ) as f:
    content = f.read()
    content_new = re.sub('(\d{2}|[a-yA-Y]{3})\/(\d{2})\/(\d{4})', r'\1-\2-\3', content, flags = re.M)
Quinn
  • 4,394
  • 2
  • 21
  • 19
  • 2
    By 'mm' you mean 11, not NOV (which has 3 characters), right? But then you're using \w to match it as a character word instead of a \d digit like the others, so 'mm' would need to be matched as \d as well. – R. Navega Mar 17 '18 at 05:57
  • You're right @R.Navega ; either `\d{2}` (for 11) or `\w{3}` (for nov) – gildux Feb 14 '21 at 20:22
  • Thanks @R.Navega and @gildux for your comments. I have updated the regex to include date format such as `11/30/2016`. – Quinn Feb 16 '21 at 04:50
  • why is your second string `r`aw but the first one is not? – lucidbrot Feb 21 '21 at 15:56
  • 1
    @lucidbrot: To the first one, with or without `r`, in this specific scenario, there is no difference. There is a thread, you can check it out: https://stackoverflow.com/questions/8157267/handling-backreferences-to-capturing-groups-in-re-sub-replacement-pattern – Quinn Feb 22 '21 at 18:36
  • Can you use `content=..` instead of `newvar=...`? – Timo Jun 29 '21 at 19:37
  • 1
    @Timo: Yes, of course. – Quinn Jul 01 '21 at 23:56
  • 1
    uh.. does this do inline replacement? don't you have to write it somewhere? – john k Dec 14 '21 at 22:35
  • 4
    @johnktejik: `re.sub` is not inline replacement, so additional steps are required if you'd like to save new content to a file. – Quinn Dec 19 '21 at 03:42
1

Here is a general format. You can either use re.sub or re.match, based on your requirement. Below is a general pattern for opening a file and doing it:

import re

input_file = open("input.h", "r")
output_file = open("output.h.h", "w")
br = 0
ot = 0

for line in input_file:
    match_br = re.match(r'\s*#define .*_BR (0x[a-zA-Z_0-9]{8})', line) # Should be your regular expression
    match_ot = re.match(r'\s*#define (.*)_OT (0x[a-zA-Z_0-9]+)', line) # Second regular expression

if match_br:
    br = match_br.group(1)
    # Do something

elif match_ot:
    ot = match_ot.group(2)
    # Do your replacement

else:
    output_file.write(line)
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
user2532296
  • 828
  • 1
  • 10
  • 27
  • 2
    Will not work with multiple line regular expressions. – Alyssa Haroldsen Feb 28 '16 at 21:04
  • 1
    thank you , I am just begginer in python and our project is to create script in Python which in line containing string xkcd (http://norvig.com/ipython/xkcd1313.ipynb) will replace bu.*ls with string [gikuj]..n|a.[alt]|[pivo].l|i..o|[jocy]e|sh|di|oo and also in line where are characters = a [ it will replace it with our name...so this is why i was asking this question because I am completely lost.. – Michal Vasko Feb 28 '16 at 21:11
  • 1
    I think you should formulate your requirements clearly and start trying some example here to start with https://pymotw.com/2/re/ – user2532296 Feb 28 '16 at 21:16