I extracted some expressions from a file and I want to insert them back into the same file in a different format, for example wrapped in brackets. My problem is that I want each expression to be replaced only once. The file looks like this:

file = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

and the expressions are like this

ex = """ good, clever, beautiful, bad,"""

the code used is

adj =  ex.split(",") 
for a in adj:
  if a in file:
     file = file.replace(a, ' ' +'[[' + a + ']]')
print file

this gives the following output:

he is a  [[good]] man [[
]]she is a [[ beautiful]] woman [[
]]this is a [[ clever]] student [[
]]he is a [[ bad]] neighbour [[
]]they are [[ bad]] men [[
]]She is very [[ beautiful]] [[
]] [[
]]

while the expected output is

he is a  [[good]] man 
she is a [[ beautiful]] woman 
this is a [[ clever]] student 
he is a [[ bad]] neighbour 
they are bad men # so here "bad" will not be replaced because there is another 'bad' replaced 
She is very beautiful # and here 'beautiful' will not be replaced like 'bad'
Marc
  • Simply `strip()` the line before you `split()` it, and you're free to go. Also check for empty `a`'s in `adj` while iterating, so you may skip them if they show up. – Rubens Dec 15 '14 at 12:22
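
A minimal sketch of what this comment suggests, assuming the expressions arrive as one comma-separated string that may end with a newline. The function name `bracket_terms` is made up for illustration. Note that this only removes the stray `[[` / `]]` pairs; every occurrence of a term is still replaced.

```python
def bracket_terms(text, ex):
    """Wrap every occurrence of each comma-separated term of `ex` in [[ ]]."""
    for raw in ex.split(","):
        term = raw.strip()   # drop the leading space and any trailing newline
        if not term:         # skip the empty entry left by the trailing comma
            continue
        text = text.replace(term, "[[" + term + "]]")
    return text


text = "he is a good man\nthey are bad men"
result = bracket_terms(text, " good, clever, beautiful, bad,\n")
print(result)
```

The `print()` call with parentheses works on both Python 2 and Python 3 for a single argument.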

3 Answers

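If the file content is stored as a string:

The replace method of a string also takes an optional third argument, the maximum number of replacements to make (the page linked below calls it max; the Python documentation calls it count).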

http://www.tutorialspoint.com/python/string_replace.htm

This lets you limit how many occurrences of a word are replaced, counting from the start of the string.

For instance:

>>> "he is a good man, and a good husband".replace('good', '[[ good ]]', 1)
'he is a [[ good ]] man, and a good husband'
>>>

Hang on, I'm working on your example now.

Example 2: Read from a file, one line at a time.

The method above assumes that you have read the file and stored its content as a single string. In this second example, I will show how you might implement your code to solve your problem.

Assuming you have a file testfile.txt with the following content:

he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful

Here is the Python code:

#!/usr/bin/env python

# your expressions
ex = """ good, clever, beautiful, bad,"""

# list comprehension to clean up the expressions:
# split on commas, strip whitespace, and drop anything that is empty
wanted_terms = [x.strip() for x in ex.split(',') if x.strip() != '']

## read the file using a with statement
with open('testfile.txt') as f:
    for line in f:
        line = line.strip()
        ## for each wanted term, check whether it exists in the line;
        ## iterate over a copy, because terms are removed from the list inside the loop
        for x in wanted_terms[:]:
            if x in line:
                ## string formatting would also work here:
                #replacement = "[[ %s ]]" % x
                #line = line.replace(x, replacement, 1)

                ## if the term exists, replace it. Use max = 1 so that only the first instance is replaced.
                line = line.replace(x, '[[' + x + ']]', 1)
                ## remove it from the term list so that later occurrences are not replaced
                wanted_terms.remove(x)
        ## show the (possibly modified) line
        print(line)
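
Since the question already holds the text in a single string, the same idea can probably be applied without the file loop. The following is only a sketch under that assumption; `bracket_first` is a hypothetical helper name, and it relies on the third argument of `str.replace` to touch just the first occurrence of each term in the whole text.

```python
def bracket_first(text, ex):
    """Wrap only the first occurrence of each comma-separated term in [[ ]]."""
    # split on commas, strip whitespace, drop empty entries
    terms = [t.strip() for t in ex.split(",") if t.strip()]
    for term in terms:
        # third argument 1: replace at most one occurrence, the leftmost
        text = text.replace(term, "[[" + term + "]]", 1)
    return text


sample = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

result = bracket_first(sample, " good, clever, beautiful, bad,")
print(result)
```

With this sample, the first "bad" and the first "beautiful" get brackets, while the later lines "they are bad men" and "She is very beautiful" stay unchanged.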

Let me know if you find this useful or if you have any other comments.

Cheers, Biobirdman

biobirdman

biobirdman seems to have a good solution, so use that to get the correct result. My post here is just to explain what went wrong. When you did:

ex = """ good, clever, beautiful, bad,"""
adj =  ex.split(",") 

You got something other than what you expected:

print adj
[' good', ' clever', ' beautiful', ' bad', '']

I don't know whether you meant to have a space before each string, but you almost certainly didn't mean to have a '' at the end. In fact, I think you didn't have this in your actual code, otherwise you'd have gotten a different bad behavior. What I think you had was a newline character at the end of ex, so the '' that shows up here was actually a newline in your attempt.

So it matched all the ones you expected, plus all the newlines. Anyone using the code exactly as you posted it will get a match between every pair of characters, because the empty string is found at every position:

[[]]h [[]]e [[]]  [[]]i [[]]s [[]]  [[]]a  ........

To fix: get rid of the newline and eliminate the extra spaces. How? Take a look at strip.
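
To make the explanation above concrete, here is a small sketch of the three behaviors involved. The variable names are only for illustration, and the trailing newline is my guess about the original input, not something the question shows.

```python
ex = """ good, clever, beautiful, bad,"""

# The trailing comma leaves an empty string as the last element.
parts = ex.split(",")
print(parts)

# The empty string is "in" every string, and replacing it inserts
# the new text at every position, including the start and the end.
found = "" in "he is"
spread = "he".replace("", "[[]]")
print(found, spread)

# If ex ended with a newline (assumed), the last element is "\n" instead.
parts_nl = (ex + "\n").split(",")
print(repr(parts_nl[-1]))

# strip() removes the spaces and the newline; the filter drops empty entries.
cleaned = [p.strip() for p in parts_nl if p.strip()]
print(cleaned)
```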

Joel

Two changes to your code: avoid the empty string in adj, and strip the leading whitespace when you replace word with [[word]]. In your code, word has values like " beautiful" and " clever".

file = """he is a good man
she is a beautiful woman
this is a clever student
he is a bad neighbour
they are bad men
She is very beautiful"""

ex = """ good, clever, beautiful, bad,"""

adj = filter(None, ex.split(","))    # removing empty strings from list
# SO ref: http://stackoverflow.com/questions/3845423/remove-empty-strings-from-a-list-of-strings

for a in adj:
    if a in file:
        file = file.replace(a, ' ' +'[[' + a.strip() + ']]')    # strip() removes leading or trailing whitespaces

print file
Prashanth