
I'm trying to replace specific characters that occur in a string. I've looked through many related posts about similar issues, but the ones I've found only want to remove the characters entirely.

The full code is supposed to find the most common n-grams and word frequencies; however, the accented characters are throwing it off.

This is the code I've written to handle it, but it's currently not working:

from unidecode import unidecode

words = ['vãhãn', 'chairã', 'â']

def ASCII_fix(words):
    """
    :param words: list of unfiltered words that have different encoding
    :return: formatted utf-8 list
    """
    for x in words:
        word = x
        for a in word:
            unidecode(a)
    return words

words = ASCII_fix(words)

Output: [('â', 'â', 'â'), ('ã', 'â', 'ã'), ('ã', 'ã', 'ã')] 

If this is a duplicate, just let me know. If there is a handy package that can help with this, that would be great too!
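A likely reason the function above has no effect: Python strings are immutable, so a transliteration call returns a new string, and the loop discards that return value before returning the original list unchanged. Below is a minimal sketch of one possible fix, assuming the goal is to map accented letters to their plain ASCII base letters. It swaps in the standard library's `unicodedata` module instead of `unidecode`, and the helper names `strip_accents` and `ascii_fix` are hypothetical, chosen only for illustration.

```python
import unicodedata


def strip_accents(word):
    """Return `word` with combining accent marks removed (hypothetical helper)."""
    # NFKD splits a character such as 'ã' into the base letter 'a'
    # followed by a separate combining tilde.
    decomposed = unicodedata.normalize("NFKD", word)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_fix(words):
    """Build and return a NEW list; the input list is left untouched."""
    return [strip_accents(word) for word in words]


words = ["vãhãn", "chairã", "â"]
words = ascii_fix(words)
print(words)  # ['vahan', 'chaira', 'a']
```

The same pattern should carry over to `unidecode`: call it on each whole word and collect the results, for example `[unidecode(w) for w in words]`, rather than calling it per character and ignoring what it returns. Note that the accent-stripping approach only handles characters that decompose into a base letter plus marks; letters such as 'ø' or 'ß' pass through unchanged.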

Sebastian Goslin