
Python challenge of the day. I am reading an input file containing formatted text (spaces, newlines, punctuation). I would like to preserve the text exactly as it is, while highlighting certain words based on a condition.

I then want to print the text to the console with those words highlighted in color.

Code here, first the set of words which should be highlighted.

diff=set(g_word_counts.keys()).difference(set(t_word_index.keys()))

To compare the words in the text with this set, I lowercase the text with lower(), which gives:

colored_text=""
for t in generated_text.lower().split():
    if t in diff:
        colored_text+=colored(t, 'green')
    else:
        colored_text+=t
    colored_text+=" "
    
print(colored_text) 

The result is, obviously, entirely lowercase, which is not what I want. Additionally, I would like to split not only on whitespace but also on any punctuation character. I tried the following, based on "Splitting the sentences in python":

import re

def to_words(text):
    return re.findall(r'\w+', text)

but here again I have to lowercase everything for the comparison, and the text is reconstructed without its punctuation.

Is there an elegant, efficient way to keep the formatting unchanged, color the words, and print the result?

Bonus: is there a way to write the text to a file with the words nicely highlighted? For now, this gives the following, which is not nice:

 [32mdémoniais[0m en la foule ou la couronnement à [32ml’enchance[0m de la piété, 
 pour cette fois de ce qui a simule comme le capital et départ du [32mdépour[0m de la [32msubissement[0m
kiriloff
  • What is "colored"? Can't you just do `if t.lower() in diff`? The text file will show the ANSI escape codes if viewed outside of an ANSI-supporting terminal, there's nothing you can do about that. – Luatic Feb 28 '22 at 09:51
  • Check `blessings`: https://github.com/erikrose/blessings It does the right thing, in the correct way, unlike nearly all other libraries: it doesn't hard-code colour codes (they vary between terminals), it handles non-terminals natively (e.g. if you redirect the output to a file), etc. And it is a short file which uses `curses`, so you can copy the code or just check how it does things (using only the standard library). – Giacomo Catenazzi Feb 28 '22 at 10:45
  • @LMD please see updated question – kiriloff Feb 28 '22 at 11:21
  • @GiacomoCatenazzi please see update – kiriloff Feb 28 '22 at 11:21
  • Note: your "un-nice" output is just the control codes being read as characters instead of being fed to a terminal, which would interpret the ESC sequences (really: control sequences) as colours. For the real question: I'll check later if I find something elegant; otherwise just apply lower() to a copy but print the original. So remove `lower` in the for, but add `t_low = t.lower()` (...) as the first line of the loop. – Giacomo Catenazzi Feb 28 '22 at 12:50
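
The point made in the comments about escape codes showing up in files can be sketched with the standard library alone. This is only an illustration, not what `blessings` does internally: the helper name `mark` and the asterisk fallback are hypothetical choices, and an `io.StringIO` buffer stands in for a real text file.

```python
import io

GREEN = "\033[32m"  # ANSI escape sequence for a green foreground
RESET = "\033[0m"   # ANSI escape sequence that resets all attributes


def mark(word, stream):
    """Return `word` highlighted in a way that suits `stream` (hypothetical helper)."""
    # isatty() is True only when the stream is attached to a terminal.
    if hasattr(stream, "isatty") and stream.isatty():
        return f"{GREEN}{word}{RESET}"
    # Assumed plain-text fallback for files: asterisks instead of colour codes.
    return f"*{word}*"


buffer = io.StringIO()  # stands in for a text file opened for writing
buffer.write(f"pour cette fois {mark('dépour', buffer)}\n")
file_output = buffer.getvalue()
print(file_output, end="")
```

Because the buffer is not a terminal, the word is written with plain markers rather than `[32m…[0m` sequences. Passing `sys.stdout` instead would produce the coloured form when the script runs in an interactive terminal.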

2 Answers


You can use colorama. Install it with:

pip install colorama

This is how you import it into your project:

from colorama import Fore, Back

And this is how you use it:

print(f"{Fore.GREEN}Hello world!{Fore.RESET}")

Tom

First, I clean the text by adding spaces around the punctuation:

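import re

def clean_input_text(text):
    # Put a space on each side of every punctuation mark so that split() isolates it
    w = re.sub(r"([?.!,;¿’])", r" \1 ", text)
    # Collapse runs of spaces into a single space
    w = re.sub(r" +", " ", w)
    return w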

Then I work on the tokens returned by .split() on the cleaned text:

# color new words
colored_text = ""
for t in generated_text.split():
    # compare a lowercased copy, but keep the original spelling in the output
    if t.lower() in diff:
        colored_text += colored(t, 'green')
    else:
        colored_text += t
    colored_text += " "

Finally, I reattach the punctuation to the preceding words:

colored_text = colored_text.replace(" . ", ". ")
colored_text = colored_text.replace(" , ", ", ")
colored_text = colored_text.replace(" ! ", "! ")
colored_text = colored_text.replace(" ? ", "? ")
colored_text = colored_text.replace(" ’ ", "’")
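
As an alternative to the clean, split, and re-join steps above, the same highlighting can be sketched in a single pass with a `re.sub` callback, which leaves spaces, newlines, and punctuation untouched. This is a minimal sketch under some assumptions: the set of words is lowercase, raw ANSI codes are used in place of `colored`, and the name `highlight_words` is hypothetical.

```python
import re

GREEN = "\033[32m"  # ANSI escape sequence for a green foreground
RESET = "\033[0m"   # ANSI escape sequence that resets all attributes


def highlight_words(text, words, start=GREEN, end=RESET):
    """Wrap each word of `text` whose lowercase form is in `words` (hypothetical helper)."""
    def wrap(match):
        word = match.group(0)
        # Compare a lowercased copy, but emit the original spelling.
        return f"{start}{word}{end}" if word.lower() in words else word

    # \w+ matches runs of letters, digits, and underscores; everything else
    # (spaces, newlines, punctuation) is copied through unchanged.
    return re.sub(r"\w+", wrap, text)


sample = "Hello, World!\nThe foule is here."
result = highlight_words(sample, {"world", "foule"})
print(result)
```

Note that `\w+` stops at the apostrophe, so a token such as `l’enchance` is checked as `l` and `enchance` separately, much as the punctuation-splitting step above does.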
kiriloff