Multiple Replacements Honoring Capitalization and Punctuation

Question

I'm attempting to create a script that replaces keywords found in body with <strong> tags. <strong>keyword</strong>

The problem is words can be capitalized or have punctuation like dog's toy (whereas I'd be aiming for dog, the strong tag would cut off 's)

I'm trying to create an efficient and dynamic replacer that works as described.

So far this is what I have:

def strongwords(text, dict):
    rc = re.compile('|'.join(map(re.escape, dict)))
    def translate(match):
        return dict[match.group(0)]
    return rc.sub(translate, text)

Unless I create a HUGE dict with every possibility (as described above) and expend resources on find/replace, I do not see this working as desired.

Is there a pythonic way to do this with a proven regex recipe? Perhaps a package or module has been designed for this?

[Do not try to parse HTML with regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — Cory Kramer, Aug 13 '15 at 23:27
So your keyword is "dog", but you want it to be smart enough to match "dog's"? — Cyphase, Aug 13 '15 at 23:33

score 1 · Answer 1 · answered Aug 13 '15 at 23:51

The first problem of case can be resolved with the re.IGNORECASE flag. The second is a bit more complicated but an optional match group could work.

rc = re.compile("(%s)('s|re)?" % ('|'.join(map(re.escape, dict)), re.IGNORECASE)

That pattern would ignore case and allow any of the keywords to be suffixed with 's or 're (add more as necessary). The optional match group would need to be joined back by with the keyword match but that should be easy enough

Multiple Replacements Honoring Capitalization and Punctuation

1 Answers1