How to do sed like text replace with python?

Question

I would like to enable all apt repositories in this file

cat /etc/apt/sources.list
## Note, this file is written by cloud-init on first boot of an instance                                                                                                            
## modifications made here will not survive a re-bundle.                                                                                                                            
## if you wish to make changes you can:                                                                                                                                             
## a.) add 'apt_preserve_sources_list: true' to /etc/cloud/cloud.cfg                                                                                                                
##     or do the same in user-data
## b.) add sources in /etc/apt/sources.list.d                                                                                                                                       
#                                                                                                                                                                                   

# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to                                                                                                           
# newer versions of the distribution.                                                                                                                                               
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                                   
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick main                                                                                                               

## Major bug fix updates produced after the final release of the                                                                                                                    
## distribution.                                                                                                                                                                    
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                           
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates main                                                                                                       

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu                                                                                                         
## team. Also, please note that software in universe WILL NOT receive any                                                                                                           
## review or updates from the Ubuntu security team.                                                                                                                                 
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                               
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick universe                                                                                                           
deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe
deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates universe

## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu 
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in 
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick multiverse
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-updates multiverse

## Uncomment the following two lines to add software from the 'backports'
## repository.
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
# deb http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse
# deb-src http://us-east-1.ec2.archive.ubuntu.com/ubuntu/ maverick-backports main restricted universe multiverse

## Uncomment the following two lines to add software from Canonical's
## 'partner' repository.
## This software is not part of Ubuntu, but is offered by Canonical and the
## respective vendors as a service to Ubuntu users.
# deb http://archive.canonical.com/ubuntu maverick partner
# deb-src http://archive.canonical.com/ubuntu maverick partner

deb http://security.ubuntu.com/ubuntu maverick-security main
deb-src http://security.ubuntu.com/ubuntu maverick-security main
deb http://security.ubuntu.com/ubuntu maverick-security universe
deb-src http://security.ubuntu.com/ubuntu maverick-security universe
# deb http://security.ubuntu.com/ubuntu maverick-security multiverse
# deb-src http://security.ubuntu.com/ubuntu maverick-security multiverse

With sed this is a simple `sed -i 's/^# deb/deb/' /etc/apt/sources.list`. What's the most elegant ("pythonic") way to do this?

pythonpy (https://github.com/russell91/pythonpy) gives you a nice way to interact with the command line: `cat /etc/apt/sources.list | py -x 're.sub(r"^# deb", "deb", x)'` — RussellStewart, Sep 27 '14 at 05:09

score 71 · Answer 1 · edited Jul 23 '13 at 15:59

You can do that like this:

import re

with open("/etc/apt/sources.list", "r") as sources:
    lines = sources.readlines()
with open("/etc/apt/sources.list", "w") as sources:
    for line in lines:
        sources.write(re.sub(r'^# deb', 'deb', line))

The with statement ensures that the file is closed correctly, and re-opening the file in "w" mode empties the file before you write to it. re.sub(pattern, replace, string) is the equivalent of s/pattern/replace/g in sed/perl: it replaces every match unless you pass count=1.
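A minimal sketch of how that substitution treats individual lines, using a few sample lines in place of the real file (the helper name `uncomment_deb` is made up for illustration):

```python
import re

def uncomment_deb(line):
    # Drop the leading "# " only from lines that start with "# deb".
    return re.sub(r"^# deb", "deb", line)

sample = [
    "## a real comment\n",
    "# deb http://archive.canonical.com/ubuntu maverick partner\n",
    "# deb-src http://archive.canonical.com/ubuntu maverick partner\n",
    "deb http://security.ubuntu.com/ubuntu maverick-security main\n",
]
result = [uncomment_deb(line) for line in sample]
```

Only the second and third lines change; the "##" comment and the already-enabled entry pass through untouched.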

Edit: fixed syntax in example

edited Jul 23 '13 at 15:59

dpb

answered Dec 13 '10 at 10:22

David Miller

1

Good idea using `with`, but with this you will just append the new sources.list to the old one. – plundra Dec 13 '10 at 10:33
This looks great (in syntax) but it duplicated the file. Do I need to do a truncate? Also, does this load the whole file into memory or is it "streaming" approach of line by line operation? – Maxim Veksler Dec 13 '10 at 10:40
3

As plundra notes, your solution writes non-atomically and hence invites race conditions (e.g., with other processes and/or threads attempting to concurrently read such file while it's being rewritten). That's a problem. But it's still elegant and rad. – Cecil Curry Jul 20 '15 at 04:13
1

It might be slightly safer to copy/move the original file and then nest `with open(copied_or_moved_original, "r") as source: with open(original_name, "w") as destination` inside a `try...except`. Then you can easily restore the original file in case of errors, and this works for files too large to be completely stored in memory... (In contrast to writing to a temporary file and then replacing the original, overwriting the original from its copy has the advantage of probably working better with filesystem versioning such as Shadow Copy and NTFS Streams) – Tobias Kienzler Jan 12 '16 at 07:22
...[`fileinput.input(..., inplace=True)`](https://docs.python.org/2/library/fileinput.html) seems to basically do this for you – Tobias Kienzler Jan 12 '16 at 07:22
This fails in the following case `s/^Q(.*)/"&"/`: you'll replace the match with the literal `"&"` instead of the intended task of surrounding the match with quotes – Bob Oct 13 '17 at 14:56
Very nice! I used this approach in a script ("hn.py", available on GitHub) where I use a dictionary to invoke multiple changes to lines in a file, and save those results (in situ). https://github.com/victoriastuart/hacker_news_scraper – Victoria Stuart May 05 '20 at 17:51
import? readers should not need to google twice – IceFire Jan 28 '21 at 07:10
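The `fileinput` route mentioned in the comments above can be sketched roughly as follows. This assumes Python 3.2 or later (where `fileinput.input` works as a context manager); the function name `uncomment_in_place` is hypothetical:

```python
import fileinput
import re

def uncomment_in_place(path):
    # With inplace=True, fileinput moves the original aside and redirects
    # print() output into a new file at the same path.
    with fileinput.input(files=[path], inplace=True) as lines:
        for line in lines:
            # Each line keeps its own newline, so suppress print's.
            print(re.sub(r"^# deb", "deb", line), end="")
```

Like the answer above, this is not atomic: other readers may see a partially written file while the loop runs.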

Cecil Curry · Answer 2 · 2015-07-20T03:56:42.967

Authoring a homegrown sed replacement in pure Python with no external commands or additional dependencies is a noble task laden with noble landmines. Who would have thought?

Nonetheless, it is feasible. It's also desirable. We've all been there, people: "I need to munge some plaintext files, but I only have Python, two plastic shoelaces, and a moldy can of bunker-grade Maraschino cherries. Help."

In this answer, we offer a best-of-breed solution cobbling together the awesomeness of prior answers without all of that unpleasant not-awesomeness. As plundra notes, David Miller's otherwise top-notch answer writes the desired file non-atomically and hence invites race conditions (e.g., from other threads and/or processes attempting to concurrently read that file). That's bad. Plundra's otherwise excellent answer solves that issue while introducing yet more – including numerous fatal encoding errors, a critical security vulnerability (failing to preserve the permissions and other metadata of the original file), and premature optimization replacing regular expressions with low-level character indexing. That's also bad.

Awesomeness, unite!

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            for line in src_file:
                tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

# Do it for Johnny.
sed_inplace('/etc/apt/sources.list', r'^\# deb', 'deb')

This fails in the following case `s/^Q(.*)/"&"/`: you'll replace the match with the literal `"&"` instead of the intended task of surrounding the match with quotes — Bob, Oct 13 '17 at 14:52
@Adrian this is not a failure per se: it fails only because Python's regular expression dialect differs from sed's and does not interpret & as the matched text. In Python you should use \g<0> instead (\0 is read as a null character, not as the whole match). Simply telling users that they should use Python's re dialect should be fine. One probably shouldn't have 'sed' in the name in this case though. — Keithel, Aug 02 '19 at 19:45
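To make the dialect difference in these comments concrete, here is a small sketch: in Python's `re` module the whole match is referenced as `\g<0>` in the replacement, while `&` has no special meaning. The sample line is invented for illustration:

```python
import re

line = "Quick brown fox"

# sed's s/^Q.*/"&"/ becomes \g<0> in Python's replacement syntax.
quoted = re.sub(r"^Q.*", r'"\g<0>"', line)

# "&" is just a literal character to Python's re module.
literal = re.sub(r"^Q.*", '"&"', line)
```

Here `quoted` wraps the original line in double quotes, whereas `literal` ends up as the three characters `"&"`.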

elmotec · Accepted Answer · 2012-07-07T02:59:16.867

27

massedit.py (http://github.com/elmotec/massedit) does the scaffolding for you leaving just the regex to write. It's still in beta but we are looking for feedback.

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" /etc/apt/sources.list

will show the differences (before/after) in diff format.

Add the -w option to write the changes to the original file:

python -m massedit -e "re.sub(r'^# deb', 'deb', line)" -w /etc/apt/sources.list

Alternatively, you can now use the api:

>>> import massedit
>>> filenames = ['/etc/apt/sources.list']
>>> massedit.edit_files(filenames, ["re.sub(r'^# deb', 'deb', line)"], dry_run=True)

edited Jul 07 '12 at 02:59

answered Jul 04 '12 at 15:50

elmotec

@MaximVeksler This fails in the following case `s/^Q(.*)/"&"/`: you'll replace the match with the literal `"&"` instead of the intended task of surrounding the match with quotes – Bob Oct 13 '17 at 14:57
Is it possible to replace multiple regular expressions at once? – aaragon Apr 20 '18 at 14:32
Yes, check out the -g (generate) and -f (function or file) options which respectively allow you to create a template Python file to be modified and use it on every input file that will be processed by your source-to-source tool. If I am not clear, just generate the file with -g and check it out. It should make more sense. – elmotec Apr 22 '18 at 12:43
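Outside massedit, one way to apply several substitutions in a single pass over the file is to keep them in a list and run them in order on each line. This is a plain `re` sketch, not massedit's API; the `rules` list and the `apply_rules` name are assumptions made for illustration:

```python
import re

# Hypothetical rule list: (compiled pattern, replacement) pairs, applied in order.
rules = [
    (re.compile(r"^# deb"), "deb"),   # enable commented-out entries
    (re.compile(r"[ \t]+$"), ""),     # strip trailing spaces and tabs
]

def apply_rules(line, rules):
    # Feed the output of each substitution into the next one.
    for pattern, repl in rules:
        line = pattern.sub(repl, line)
    return line

fixed = apply_rules("# deb http://x maverick partner   \n", rules)
```

Because the rules run sequentially, a later rule sees the text as already modified by the earlier ones, so order can matter.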

score 12 · Answer 4 · answered Dec 13 '10 at 11:31

This is such a different approach that I don't want to edit my other answer. The with statements are nested because I don't use Python 3.1 (where `with A() as a, B() as b:` works).

Might be a bit overkill to change sources.list, but I want to put it out there for future searches.

#!/usr/bin/env python
from shutil   import move
from tempfile import NamedTemporaryFile

with NamedTemporaryFile(delete=False) as tmp_sources:
    with open("sources.list") as sources_file:
        for line in sources_file:
            if line.startswith("# deb"):
                tmp_sources.write(line[2:])
            else:
                tmp_sources.write(line)

move(tmp_sources.name, sources_file.name)

This should ensure no race conditions of other people reading the file. Oh, and I prefer str.startswith(...) when you can do without a regexp.

answered Dec 13 '10 at 11:31

plundra

I totally get the desire to not involve regex wherever possible:) Meanwhile: with, str.startswith() and NamedTemporaryFile show the kind of batteries-included approach of python that make it so useful lots of the time for simple tasks like this. – David Miller Dec 13 '10 at 16:12
Out of interest, why did you use `shutil.move` rather than `os.rename`? – Mark Longair Feb 08 '11 at 15:31
2

@Mark Longair: `os.rename` doesn't work between filesystems. If `/tmp` were on `tmpfs` for example, it would fail. – plundra Feb 09 '11 at 10:54
As of 2015, this is probably the best answer. In fact, it's a great answer. **Unfortunately, it's also painfully wrong.** Since `NamedTemporaryFile()` defaults to `mode='w+b'`, an encoding _must_ be explicitly specified when writing text strings. Likewise, all metadata (e.g., permissions) of the original file _must_ be preserved across the move. – Cecil Curry Jul 20 '15 at 04:09
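One way to address both points raised in these comments (the cross-filesystem move and the lost metadata) is to create the temporary file in the same directory as the target, so the final rename stays on one filesystem. The following is a sketch under that assumption, for Python 3.3+ where `os.replace` is available; the function name `replace_atomically` is hypothetical:

```python
import os
import re
import shutil
import tempfile

def replace_atomically(path, pattern, repl):
    # Create the temp file next to the target so os.replace() never has
    # to cross a filesystem boundary.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                tmp.write(re.sub(pattern, repl, line))
        # Carry over permissions and timestamps before swapping files.
        shutil.copystat(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        os.unlink(tmp_path)
        raise
```

It would be called as `replace_atomically("sources.list", r"^# deb", "deb")`. Note that `copystat` does not copy file ownership, so this is still only an approximation of what `sed -i` does.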

score 6 · Answer 5 · answered Apr 29 '14 at 11:35

If you are using Python3 the following module will help you: https://github.com/mahmoudadel2/pysed

wget https://raw.githubusercontent.com/mahmoudadel2/pysed/master/pysed.py

Place the module file into your Python3 modules path, then:

import pysed
pysed.replace(<Old string>, <Replacement String>, <Text File>)
pysed.rmlinematch(<Unwanted string>, <Text File>)
pysed.rmlinenumber(<Unwanted Line Number>, <Text File>)

MatrixManAtYrService · Answer 6 · 2020-01-30T21:12:43.050

6

If I want something like sed, then I usually just call sed itself using the sh library.

from sh import sed

sed(['-i', 's/^# deb/deb/', '/etc/apt/sources.list'])

Sure, there are downsides. Like maybe the locally installed version of sed isn't the same as the one you tested with. In my cases, this kind of thing can be easily handled at another layer (like by examining the target environment beforehand, or deploying in a docker image with a known version of sed).

edited Jan 30 '20 at 21:12

answered Jan 30 '20 at 21:02

MatrixManAtYrService

I think this is the most Pythonic way as "sh is a full-fledged subprocess replacement for Python 2, Python 3, PyPy and PyPy3 that allows you to call any program as if it were a function:" per https://pypi.org/project/sh/ – 300 Jan 03 '23 at 22:13

score 4 · Answer 7 · edited Sep 24 '20 at 05:58

Try pysed:

pysed -r '# deb' 'deb' /etc/apt/sources.list

edited Sep 24 '20 at 05:58

hoijui

answered Jun 23 '14 at 07:35

dslackw

score 3 · Answer 8 · edited Jun 23 '14 at 23:05

If you really want to use a sed command without installing a new Python module, you could simply do the following:

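import subprocess
# Pass the command as a list of arguments; a single string would need shell=True.
subprocess.call(["sed", "-i", "s/^# deb/deb/", "/etc/apt/sources.list"])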

edited Jun 23 '14 at 23:05

brasofilo

answered Jun 23 '14 at 20:29

Brad Jasperson

score 3 · Answer 9 · answered Dec 13 '10 at 10:51

You could do something like:

import re

p = re.compile(r"^# *deb", re.MULTILINE)
with open("sources.list", "r") as f:
    text = f.read()
with open("sources.list", "w") as f:
    f.write(p.sub("deb", text))

Alternatively (imho, this is better from organizational standpoint) you could split your sources.list into pieces (one entry/one repository) and place them under /etc/apt/sources.list.d/

plundra · Answer 10 · 2010-12-13T10:43:35.997

2

Not sure about elegant, but this ought to be pretty readable at least. For a sources.list it's fine to read all the lines beforehand; for something larger you might want to change it "in place" while looping through it.

#!/usr/bin/env python
# Open file for reading and writing
with open("sources.list", "r+") as sources_file:
    # Read all the lines
    lines = sources_file.readlines()

    # Rewind and truncate
    sources_file.seek(0)
    sources_file.truncate()

    # Loop through the lines, adding them back to the file.
    for line in lines:
        if line.startswith("# deb"):
            sources_file.write(line[2:])
        else:
            sources_file.write(line)

EDIT: Use with-statement for better file-handling. Also forgot to rewind before truncate before.

edited Dec 13 '10 at 10:43

answered Dec 13 '10 at 10:11

plundra

I'd just read the file, close it, reopen it in write mode, and write the modified version. That saves worrying about seek and truncate. – Thomas K Dec 13 '10 at 10:57
@Thomas, yeah. Doesn't feel that pythonic :-P Thought of doing it with a tempfile and then move it in place too, to be atomic(-ish). – plundra Dec 13 '10 at 11:01
I don't know that there is a Pythonic way to modify a file in place. The tempfile idea has some merit, though. – Thomas K Dec 13 '10 at 11:05

score 2 · Answer 11 · answered Aug 02 '19 at 20:12

Cecil Curry has a great answer; however, it only works for single-line regular expressions, because it applies the pattern to one line at a time. Multiline regular expressions are used more rarely, but they are handy sometimes.

Here is an improvement upon his sed_inplace function that allows it to function with multiline regular expressions if asked to do so.

WARNING: In multiline mode, it will read the entire file in, and then perform the regular expression substitution, so you'll only want to use this mode on small-ish files - don't try to run this on gigabyte-sized files when running in multiline mode.

import re, shutil, tempfile

def sed_inplace(filename, pattern, repl, multiline = False):
    '''
    Perform the pure-Python equivalent of in-place `sed` substitution: e.g.,
    `sed -i -e "s/${pattern}/${repl}/" "${filename}"`.
    '''
    re_flags = 0
    if multiline:
        re_flags = re.M

    # For efficiency, precompile the passed regular expression.
    pattern_compiled = re.compile(pattern, re_flags)

    # For portability, NamedTemporaryFile() defaults to mode "w+b" (i.e., binary
    # writing with updating). This is usually a good thing. In this case,
    # however, binary writing imposes non-trivial encoding constraints trivially
    # resolved by switching to text writing. Let's do that.
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
        with open(filename) as src_file:
            if multiline:
                content = src_file.read()
                tmp_file.write(pattern_compiled.sub(repl, content))
            else:
                for line in src_file:
                    tmp_file.write(pattern_compiled.sub(repl, line))

    # Overwrite the original file with the munged temporary file in a
    # manner preserving file attributes (e.g., permissions).
    shutil.copystat(filename, tmp_file.name)
    shutil.move(tmp_file.name, filename)

from os.path import expanduser
sed_inplace('%s/.gitconfig' % expanduser("~"), r'^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*', r'\1John Doe\2jdoe@example.com', multiline=True)
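A small sketch of why the two modes differ, using an in-memory string instead of a real `.gitconfig` (the sample content is invented): applied line by line, the pattern never sees the newline it needs to span, so nothing changes; applied to the whole text with `re.M`, it matches.

```python
import re

text = "[user]\n\tname = old\n\temail = old@example.com\n"
pattern = re.compile(
    r"^(\[user\]$\n[ \t]*name = ).*$(\n[ \t]*email = ).*", re.M)
repl = r"\1John Doe\2jdoe@example.com"

# Line-by-line: each line is matched in isolation, so the pattern fails.
per_line = "".join(
    pattern.sub(repl, line) for line in text.splitlines(keepends=True))

# Whole-text: the pattern can cross the newlines between the three lines.
whole = pattern.sub(repl, text)
```

`per_line` comes back identical to `text`, while `whole` has both the name and the email replaced.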

score 1 · Answer 12 · answered Jun 19 '13 at 17:31

Here's a one-module Python replacement for perl -p:

# Provide compatibility with `perl -p`

# Usage:
#
#     python -mloop_over_stdin_lines '<program>'

# In `<program>`, use the variable `line` to read and change the current line.

# Example:
#
#         python -mloop_over_stdin_lines 'line = re.sub("pattern", "replacement", line)'

# From the perlrun documentation:
#
#        -p   causes Perl to assume the following loop around your
#             program, which makes it iterate over filename arguments
#             somewhat like sed:
# 
#               LINE:
#                 while (<>) {
#                     ...             # your program goes here
#                 } continue {
#                     print or die "-p destination: $!\n";
#                 }
# 
#             If a file named by an argument cannot be opened for some
#             reason, Perl warns you about it, and moves on to the next
#             file. Note that the lines are printed automatically. An
#             error occurring during printing is treated as fatal. To
#             suppress printing use the -n switch. A -p overrides a -n
#             switch.
# 
#             "BEGIN" and "END" blocks may be used to capture control
#             before or after the implicit loop, just as in awk.
# 

import re
import sys

for line in sys.stdin:
    exec(sys.argv[1], globals(), locals())
    try:
        sys.stdout.write(line)
    except OSError as err:
        # Mirror perl's fatal "-p destination" error on a failed write.
        sys.exit('-p destination: %s' % err)

score 1 · Answer 13 · answered Feb 02 '16 at 15:43

I wanted to be able to find and replace text but also include matched groups in the content I insert. I wrote this short script to do that:

https://gist.github.com/turtlemonvh/0743a1c63d1d27df3f17

The key component of that is something that looks like this:

print(re.sub(pattern, template, text).rstrip("\n"))

Here's an example of how that works:

# Find everything that looks like 'dog' or 'cat' followed by a space and a number
pattern = r"((cat|dog) (\d+))"

# Replace with 'turtle' and the number. '3' because the number is the 3rd matched group.
# The raw string (r"...") keeps the backslash literal, so no doubling is needed.
template = r"turtle \3"

# The text to operate on
text = "cat 976 is my favorite"

Calling the above function with this yields:

turtle 976 is my favorite

score 1 · Answer 14 · edited Jun 20 '20 at 09:12

[None of the answers above works properly!]

My case is replacing multiple key-value pairs in one file of around 1000 lines. After the replacement, the file structure should stay the same. For example:

key1=value_tobe_replaced1
key2=value_tobe_replaced2
.     .
.     .
key1000=value_tobe_replaced1000

I've tried:

the voted answer from @elmotec for massedit.
answer from @Cecil Curry.
answer from @Keithel.

All three answers helped me a lot, but in my tests the first and second took nearly 40-50 s. The third is not suited to multiple replacements, so I adapted it.

Note: read those answers before going on.

Here's my code:

Line replacement mode:

import datetime, re, shutil, tempfile

# abs_keypair_file (a path) and tuple_list (a list of (key, value) pairs)
# are assumed to be defined by the caller.
start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        for line in kf:
            line_to_write = ''
            match_flag = False
            for (key, value) in tuple_list:
                # Assumption: the undefined `patten` in the original stood
                # for the $(key) placeholder pattern substituted below.
                pattern = r'\$\({}\)'.format(key)
                if not re.search(pattern, line, flags=re.I):
                    continue
                line_to_write = re.sub(pattern, value, line, flags=re.I)
                match_flag = True

            if not match_flag:
                line_to_write = line
            tmp_file.write(line_to_write)

shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)

time costs: 0:00:42.533879

file replacement mode:

import datetime, re, shutil, tempfile

start_time = datetime.datetime.now()
with tempfile.NamedTemporaryFile(mode='w', delete=False) as tmp_file:
    with open(abs_keypair_file) as kf:
        text = kf.read()
        for (key, value) in tuple_list:
            # Assumption: the undefined `patten` in the original stood for
            # this $(key) placeholder pattern.
            pattern = r'\$\({}\)'.format(key)
            text = re.sub(pattern, value, text, flags=re.M|re.I)
        tmp_file.write(text)
shutil.copystat(abs_keypair_file, tmp_file.name)
shutil.move(tmp_file.name, abs_keypair_file)

time_costs = datetime.datetime.now() - start_time
print('time costs: %s' % time_costs)

time costs: 0:00:00.348458

So if your case matches mine and your file is not too large, I suggest using file replacement mode.

How to replace if file size is huge? I have no idea.
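One possible approach for large files, offered as a sketch rather than a measured result: compile all keys into a single alternation once, then stream the file line by line and look up each match in a dictionary. The `$(key)` placeholder syntax is taken from the code above; the names `make_replacer` and `replace_line` are hypothetical:

```python
import re

def make_replacer(pairs):
    # Case-insensitive lookup table, mirroring the re.I flag used above.
    lookup = {key.lower(): value for key, value in pairs}
    # One compiled regex that matches $(key) for any known key.
    combined = re.compile(
        r"\$\((" + "|".join(re.escape(k) for k in lookup) + r")\)", re.I)

    def replace_line(line):
        # A function replacement picks the value for whichever key matched.
        return combined.sub(lambda m: lookup[m.group(1).lower()], line)

    return replace_line

replace_line = make_replacer([("host", "example.org"), ("port", "8080")])
converted = replace_line("url=$(HOST):$(port)\n")
```

Each line is then scanned once regardless of how many keys there are, and only one line is held in memory at a time when `replace_line` is used inside a `for line in file` loop.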

Hope this helps.
