I am writing a script that introduces misspellings into a sentence. I am using Python's re module to replace the original word with the misspelling. The script looks like this:

# replacing original word by error
pattern = re.compile(r'%s' % original_word)    
replace_by = r'\1' + err
modified_sentence = re.sub(pattern, replace_by, sentence, count=1)

The problem is that this also replaces original_word when it is part of another word. For example:

If I had

original_word = 'in'
err = 'il'
sentence = 'eating food in'

it would replace the occurrence of 'in' inside 'eating', like this:

> 'eatilg food in'

I checked the re documentation, but it doesn't give any examples of how to include regex options in a pattern built this way. For example:

If my pattern is:

regex_pattern = '\b%s\b' % original_word

this should solve the problem, since \b represents a word boundary. But it doesn't seem to work.

I tried to work around it by doing:

pattern = re.compile(r'([^\w])%s' % original_word)

but that does not work either. For example:

original_word = 'to'
err = 'vo'
sentence = 'I will go tomorrow to the'

the sentence gets changed to:

> I will go vomorrow to the

Thank you, any help appreciated

Codious-JR
  • Note: use [`re.escape()`](https://docs.python.org/3/library/re.html#re.escape) on the `original_word` in case the string ever contains any characters that have special meaning in the regular expression language. – dsh Aug 20 '15 at 16:15
  • @dsh thank you. useful tip – Codious-JR Aug 20 '15 at 21:13
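
To illustrate the re.escape() note above, here is a minimal sketch of what can go wrong without it. The word 'a.b', the sentence, and the placeholder 'ERR' are made up for illustration and are not from the question; the point is only that an unescaped '.' matches any character.

```python
import re

# Hypothetical inputs: '.' is a regex metacharacter that matches any character.
original_word = 'a.b'
sentence = 'axb and a.b'

# Without escaping, the '.' in the word also matches the 'x' in 'axb'.
unescaped = re.sub(r'\b%s\b' % original_word, 'ERR', sentence, count=1)

# With re.escape, the '.' only matches a literal dot.
escaped = re.sub(r'\b%s\b' % re.escape(original_word), 'ERR', sentence, count=1)

print(unescaped)  # ERR and a.b
print(escaped)    # axb and ERR
```

For plain alphabetic words the two patterns behave the same, so escaping only matters once the word list can contain punctuation.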

1 Answer

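Python's re module supports word boundaries with \b. It looks like you were close and just need to put it all together: write the pattern as a raw string so that \b reaches the regex engine as a word boundary, and escape the word with re.escape(). The following script gives you the output you want: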

import re

original_word = 'to'
err = 'vo'
sentence = 'I will go tomorrow to the'

# Raw string so \b is a word boundary; re.escape guards against special characters
pattern = re.compile(r'\b%s\b' % re.escape(original_word))
modified_sentence = re.sub(pattern, err, sentence, count=1)

print(modified_sentence)

Output --> I will go tomorrow vo the
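
As for why the '\b%s\b' attempt in the question did not work: in a normal (non-raw) Python string literal, '\b' is the backspace character, so the regex engine never sees a word boundary. The sketch below shows the difference and wraps the fix in a small helper. The name misspell_word is hypothetical, chosen here for illustration, and the inputs are the ones from the question.

```python
import re

# In a normal string literal '\b' is a single backspace character (0x08);
# in a raw string r'\b' is a backslash followed by 'b', which re reads as a word boundary.
assert '\b' == '\x08'
assert r'\b' == '\\b'

# The non-raw pattern looks for literal backspaces, so nothing is replaced.
unchanged = re.sub('\b%s\b' % 'in', 'il', 'eating food in', count=1)
print(unchanged)  # eating food in


def misspell_word(sentence, original_word, err):
    """Hypothetical helper: replace the first whole-word occurrence of original_word."""
    pattern = re.compile(r'\b%s\b' % re.escape(original_word))
    # A function replacement inserts err literally, so backslashes in err are not interpreted.
    return pattern.sub(lambda match: err, sentence, count=1)


print(misspell_word('eating food in', 'in', 'il'))             # eating food il
print(misspell_word('I will go tomorrow to the', 'to', 'vo'))  # I will go tomorrow vo the
```

The lambda replacement is a design choice rather than a requirement: passing err directly as the replacement string works the same way as long as err contains no backslashes.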

abaldwin99
  • Thanks a lot, worked brilliantly. Yeah, I was almost there! It was probably the combination of the %s syntax with the \b in the pattern that I had not tried. – Codious-JR Aug 21 '15 at 08:36