I am trying to wrap each word from a list in the HTML <b> element wherever it appears in a sentence. After some searching I got it almost working, except for ignoring case.

import re

bolds = ['test', 'tested']  # I want to bold these words, ignoring-case
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}
text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text)
print(text_bolded)

Output:

Test lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur TEST adipiscing elit <b>test</b>.

This output misses the <b> element for Test and TEST. In other words, I would like the output to be:

<b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.

One hack is to explicitly add the capitalized and upper-case variants, like so:

bolds = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]

But I am thinking there must be a better way to do this. Besides, the above hack will miss words like tesT, etc.

Thank you!

tikka
  • I think you can probably just add `re.I` as the last parameter in your `re.sub` function – Alexander Mar 07 '23 at 00:09
  • @Alexander Actually I had tried `re.I` and `re.IGNORECASE`, but unfortunately they do not give the desired result. In fact, doing that makes it worse: it misses the last `test` in `... elit test.`! I am not too savvy with `re`; I have tried a lot of things, but no luck :( – tikka Mar 07 '23 at 00:17
  • This is a common mistake. You need `flags=re.I`. If you don't use `flags=`, you're setting the next positional argument, which is the max number of replacements. – Barmar Mar 07 '23 at 00:29
  • @Barmar great point, but unfortunately even that did not work. Just to clarify I did this `text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text, flags=re.I)` and it still did not give the desired result. Wondering if I am still missing something? – tikka Mar 07 '23 at 00:38
  • `dict_repl.get(m.group().lower(), m.group())` – Barmar Mar 07 '23 at 00:42
  • @Barmar The issue with this is that it also lowercases the original text, which is not ideal. Although, I could do this and then explicitly get the placement indices of the `<b>` element and copy it over to the original sentence (to maintain the original casing). – tikka Mar 07 '23 at 00:47
  • Why are you using a dictionary? You're replacing everything with the same thing. – Barmar Mar 07 '23 at 00:49
  • @Barmar I am just adding the html `<b>` element, like `k: f'<b>{k}</b>'` to basically bold those words. – tikka Mar 07 '23 at 00:51
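
The two pitfalls raised in the comments above can be reproduced with a short sketch. It reuses the question's `bolds`, `text`, `pattern`, and `dict_repl`; the helper `repl` and the result names `positional`, `same_as_count`, and `keyword` are illustrative only. Recent Python versions may emit a DeprecationWarning for the positional call, so the sketch silences it.

```python
import re
import warnings

bolds = ['test', 'tested']
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."
pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}

def repl(m):
    # Case-sensitive dictionary lookup; unknown keys are returned unchanged.
    return dict_repl.get(m.group(), m.group())

# Pitfall 1: the fourth positional argument of re.sub is count, not flags.
# re.I has the integer value 2, so this limits the call to 2 replacements
# and the match stays case-sensitive.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    positional = re.sub(pattern, repl, text, re.I)
same_as_count = re.sub(pattern, repl, text, count=2)

# Pitfall 2: with flags=re.I the regex does match "Test" and "TEST", but
# dict_repl only has lower-case keys, so those matches fall back to m.group().
keyword = re.sub(pattern, repl, text, flags=re.I)

print(positional)
print(keyword)
```

This would explain both observations: the positional call drops the final `test`, and the keyword call still leaves `Test` and `TEST` unwrapped.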

1 Answer

There's no need for the dictionary or the function. Every replacement is just a fixed string wrapped around the original match, which you can get with a back-reference.

Use flags=re.I to make the match case-insensitive.

text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)

\g<0> is a back-reference to the entire match, so each matched word keeps its original casing.
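
Put together with the question's data, a minimal end-to-end sketch might look like this. The function name `bold_words` is hypothetical, and wrapping each word in `re.escape` is an added precaution (not part of the original pattern) in case a word contains regex metacharacters.

```python
import re

def bold_words(text, words):
    # Build \b(?:word1|word2|...)\b, escaping each word defensively.
    pattern = r'\b(?:' + "|".join(map(re.escape, words)) + r')\b'
    # \g<0> re-inserts the matched text exactly as it appeared in the input.
    return re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)

bolds = ['test', 'tested']
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

print(bold_words(text, bolds))
# <b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.
```

Because the match itself is reused, mixed-case variants such as `tesT` are wrapped without being altered.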

Barmar