Python challenge of the day. I am reading an input file containing formatted text (spaces, newlines, punctuation). I would like to preserve the text exactly as it is, while highlighting certain words based on some condition.
Then I want to print the text to the console with the highlighted words in color.
Here is the code. First, the set of words that should be highlighted:
diff=set(g_word_counts.keys()).difference(set(t_word_index.keys()))
To compare the words in the text with this set, I lowercase the text with lower(), which gives:
colored_text = ""
for t in generated_text.lower().split():
    if t in diff:
        colored_text += colored(t, 'green')
    else:
        colored_text += t
    colored_text += " "
print(colored_text)
The result is, of course, entirely lowercase, which is not very nice. Additionally, I would like to split not only on whitespace but also on any punctuation character, which I tried following "Splitting the sentences in python":
import re
def to_words(text):
    return re.findall(r'\w+', text)
but here again I lose the original casing, and the text is reconstructed without its punctuation.
Is there an elegant, efficient way to keep the formatting unchanged, color the words, and print the result?
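One direction that might work, as a rough sketch rather than a definitive solution: let re.sub walk over the original text and only rewrite the word matches, so that whitespace, newlines, punctuation and casing are copied through untouched. The name highlight and the raw ANSI codes below are assumptions made for illustration (the codes stand in for the colored(t, 'green') call), and the sample set stands in for diff.

```python
import re

GREEN = "\033[32m"  # ANSI escape sequence: green foreground
RESET = "\033[0m"   # ANSI escape sequence: reset all attributes


def highlight(text, words, start=GREEN, end=RESET):
    """Wrap each word of `text` whose lowercase form is in `words`.

    `highlight` is a hypothetical helper, not part of any library.
    """
    def repl(match):
        token = match.group(0)
        # Compare case-insensitively, but emit the original spelling.
        return f"{start}{token}{end}" if token.lower() in words else token

    # Only the \w+ runs are passed to repl; everything between them
    # (spaces, newlines, punctuation) is left exactly as it was.
    return re.sub(r"\w+", repl, text)


sample = "Hello, World!\nhello again."
print(highlight(sample, {"hello"}))
```

With a str pattern in Python 3, \w also matches accented letters, so a word like démoniais is treated as one token. Words that contain an apostrophe would be split into two matches by \w+, so the pattern would probably need widening (for instance to a character class that also includes the apostrophe) if such words are in the set.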
Bonus: is there a way to write the text to a file with the words nicely highlighted? For now, this gives the not-so-nice:
[32mdémoniais[0m en la foule ou la couronnement à [32ml’enchance[0m de la piété,
pour cette fois de ce qui a simule comme le capital et départ du [32mdépour[0m de la [32msubissement[0m
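The [32m and [0m fragments above are the ANSI color codes, which only a terminal interprets; a plain text file just stores them as characters. One possible workaround, sketched here under the assumption that an HTML file is an acceptable output format, is to wrap the words in a colored span instead. The function name highlight_html is hypothetical, and the sample set stands in for diff.

```python
import html
import re


def highlight_html(text, words, color="green"):
    """Return an HTML fragment with the words in `words` colored.

    `highlight_html` is a hypothetical helper written for illustration.
    """
    # A capturing group makes re.split keep the matched words, so the
    # list alternates: non-word text (even index), word (odd index).
    parts = re.split(r"(\w+)", text)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)  # protect &, < and > in the text
        if i % 2 == 1 and part.lower() in words:
            out.append(f'<span style="color:{color}">{escaped}</span>')
        else:
            out.append(escaped)
    # <pre> preserves the spaces and line breaks of the original text.
    return "<pre>" + "".join(out) + "</pre>"


print(highlight_html("Hi, you & me\n", {"you"}))
```

The returned string could then be written to a file opened with encoding="utf-8" and viewed in a browser. If the file has to stay plain text, the same loop could emit simple markers around the words instead of a span.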