replacing certain expressions in file but only one time

Question

I extracted some expressions from a file and I want to insert them back into the same file in a different format, for example between brackets. My problem is that each expression should be replaced only once. The file looks like this:

file = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

and the expressions are like this

ex = """ good, clever, beautiful, bad,"""

the code used is

adj =  ex.split(",") 
for a in adj:
  if a in file:
     file = file.replace(a, ' ' +'[[' + a + ']]')
print file

this gives the following output:

he is a  [[good]] man [[
]]she is a [[ beautiful]] woman [[
]]this is a [[ clever]] student [[
]]he is a [[ bad]] neighbour [[
]]they are [[ bad]] men [[
]]She is very [[ beautiful]] [[
]] [[
]]

while the expected output is

he is a  [[good]] man 
she is a [[ beautiful]] woman 
this is a [[ clever]] student 
he is a [[ bad]] neighbour 
they are bad men # so here "bad" will not be replaced because there is another 'bad' replaced 
She is very beautiful # and here 'beautiful' will not be replaced like 'bad'

Simply `strip()` the line before you `split()` it, and you're free to go. Also check for empty `a`'s in `adj` while iterating, so you may skip them if they show up. — Rubens, Dec 15 '14 at 12:22

biobirdman · Accepted Answer · 2014-12-15T12:45:56.207

If the file content is stored as a string

The replace method of a string also takes an optional third argument, named count in the Python documentation (the tutorial linked below calls it max).

http://www.tutorialspoint.com/python/string_replace.htm

It limits how many occurrences are replaced, counting from the start of the string, so passing 1 replaces only the first occurrence.

for instance,

>>> "he is a good man, and a good husband".replace('good', '[[ good ]]', 1)
'he is a [[ good ]] man, and a good husband'
>>>
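Applied to the string from the question, the same idea could look like the sketch below. The function name `mark_first_occurrences` and the variable `text` are made up for illustration, and the terms are stripped first so that the leading spaces and the trailing empty entry do not interfere.

```python
text = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

ex = """ good, clever, beautiful, bad,"""


def mark_first_occurrences(text, terms):
    """Wrap only the first occurrence of each term in [[...]]."""
    for term in terms:
        # the third argument limits the number of replacements to one
        text = text.replace(term, '[[' + term + ']]', 1)
    return text


# split on commas, strip whitespace, drop empty entries
terms = [t.strip() for t in ex.split(',') if t.strip()]
result = mark_first_occurrences(text, terms)
print(result)
```

Note that replace works on substrings, so a term such as bad would also match inside a longer word like badly.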


Example 2: Read from a file, one line at a time.

The method above assumes that you have read the file and stored its content as a single string. The example below shows how to solve the problem while reading the file line by line.

Assuming you have a file `testfile.txt` with the following content :

he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful

Here is the Python code

#!/usr/bin/env python

# the expressions to mark up
ex = """ good, clever, beautiful, bad,"""

# clean up the expression list: split on commas, strip whitespace,
# and drop anything that ends up empty
wanted_terms = [x.strip() for x in ex.split(',') if x.strip()]

# read the file using a with statement, one line at a time
with open('testfile.txt') as f:
    for line in f:
        line = line.strip()
        # check each wanted term; iterate over a copy of the list,
        # because terms are removed from it inside the loop
        for x in list(wanted_terms):
            if x in line:
                # string formatting would work here too:
                # line = line.replace(x, "[[ %s ]]" % x, 1)

                # passing 1 as the third argument replaces only the first instance in the line
                line = line.replace(x, '[[' + x + ']]', 1)
                # remove the term from the list so that later occurrences are not replaced
                wanted_terms.remove(x)
        print(line)
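The script above works on the lines in memory only. Since the question asks for the change to end up in the same file, here is a minimal sketch of writing the result back. The function name `mark_file` is hypothetical, and the sketch assumes the file is small enough to hold in memory and that overwriting it in place is acceptable.

```python
def mark_file(path, terms):
    """Wrap the first occurrence of each term in [[...]] and rewrite the file."""
    remaining = list(terms)  # copy, so the caller's list is not modified
    out_lines = []
    with open(path) as f:
        for line in f:
            line = line.rstrip('\n')
            # iterate over a copy because terms are removed inside the loop
            for term in list(remaining):
                if term in line:
                    line = line.replace(term, '[[' + term + ']]', 1)
                    remaining.remove(term)
            out_lines.append(line)
    # overwrite the original file with the modified lines
    with open(path, 'w') as f:
        f.write('\n'.join(out_lines) + '\n')
```

A call such as `mark_file('testfile.txt', ['good', 'clever', 'beautiful', 'bad'])` would then update the file in place; keeping a backup copy first is a sensible precaution.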

Let me know if you find this useful or if you have any other comments,

Cheers, Biobirdman

score 0 · Answer 2 · answered Dec 15 '14 at 12:36

biobirdman seems to have a good solution, so use that to get the correct result. My post here just explains what went wrong. When you did:

ex = """ good, clever, beautiful, bad,"""
adj =  ex.split(",")

You got something other than what you expected:

print adj
[' good', ' clever', ' beautiful', ' bad', '']

I don't know if you meant to have a space before each string, but you almost certainly didn't mean to have a '' at the end. In fact, I think you didn't have this in your actual code, otherwise you'd get a different bad behavior. What I think you had was a newline character at the end of ex, so the '' that shows up here was actually a newline in your attempt.

So it matched all the ones you expected, plus all the newlines. Anyone running the code exactly as you posted it will instead get a match between every pair of characters, because the empty string matches at every position:

[[]]h [[]]e [[]]  [[]]i [[]]s [[]]  [[]]a  ........

To fix: get rid of the newline and eliminate the extra spaces. How? Take a look at strip.
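As a small sketch of that cleanup (the variable names `raw` and `demo` are only for illustration), stripping each piece and skipping the empty ones gives the list you probably wanted. The last line shows why an empty search string is harmful:

```python
ex = """ good, clever, beautiful, bad,"""

raw = ex.split(",")
# strip surrounding whitespace and drop entries that end up empty
adj = [a.strip() for a in raw if a.strip()]

# an empty search string matches before every character and at the end
demo = "he".replace("", "[[]]")

print(raw)
print(adj)
print(demo)
```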

score 0 · Answer 3 · answered Dec 15 '14 at 12:59

Two changes to your code: avoid the empty string in adj, and strip the leading whitespace when you replace a word with [[word]]. In your code the words have values like " beautiful" and " clever".

file = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

ex = """ good, clever, beautiful, bad,"""

adj = filter(None, ex.split(","))    # removing empty strings from list
# SO ref: http://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings

for a in adj:
    if a in file:
        file = file.replace(a, ' ' +'[[' + a.strip() + ']]')    # strip() removes leading or trailing whitespaces

print file
