I'm trying to replace specific characters that occur in a string. I've looked through many related posts about similar issues, but the ones I've found only want to remove the characters entirely.
The full code is supposed to find the most common n-grams and word frequencies, but the special characters are throwing it off.
This is the code I've written to handle it, but it's currently not working:
from unidecode import unidecode

words = ['vãhãn', 'chairã', 'â']

def ASCII_fix(words):
    """
    :param words: list of unfiltered words that have different encoding
    :return: formatted utf-8 list
    """
    for x in words:
        word = x
        for a in word:
            unidecode(a)
    return words

words = ASCII_fix(words)
Output: [('â', 'â', 'â'), ('ã', 'â', 'ã'), ('ã', 'ã', 'ã')]
If this is a dupe, just let me know. If there is a handy package that can help with this, that'd be great!
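For reference, one likely reason the function has no effect: the result of `unidecode(a)` is never stored, and Python strings are immutable, so `words` is returned unchanged. Below is a minimal sketch of one way to build a new list instead. It swaps in the standard-library `unicodedata` module rather than `unidecode` so that it runs without extra packages. The names `strip_accents` and `ascii_fix` are made up for illustration, and this approach only handles characters that decompose into a base letter plus combining marks.

```python
import unicodedata


def strip_accents(word):
    """Return `word` with combining accent marks removed (e.g. 'ã' -> 'a')."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_fix(words):
    """Build and return a new list; the input list is left untouched."""
    # With the unidecode package installed, unidecode(w) could be used
    # here in place of strip_accents(w) (assumption: same intent, wider coverage).
    return [strip_accents(w) for w in words]


words = ['vãhãn', 'chairã', 'â']
print(ascii_fix(words))  # ['vahan', 'chaira', 'a']
```

The key difference from the original loop is that each converted word is collected into the returned list, instead of the conversion result being thrown away.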