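```python
import re

data2 = ''
# No encoding is given, so Python decodes the file with the platform default
with open('twitter.txt', 'r') as file:
    for line in file:  # each iteration yields one line as a str
        # Drop every character outside the ASCII range 0x00-0x7F
        thing = re.sub(r'[^\x00-\x7f]', '', line)
        print(thing)
```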
Hi, I'm very new to Python. After scraping a bunch of data from Twitter with Python, I saved it to a text file. The file ends up with a lot of emojis and other non-ASCII characters, and I can't get it read into a string. The code above is my attempt to remove the non-ASCII characters and turn the file into a string, but it gives me this error:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1607: character maps to <undefined>
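The word `'charmap'` in that message suggests the file is being decoded with a Windows code page (cp1252 is the usual default) rather than UTF-8. That is an assumption, since the post does not say which OS or encoding is involved. As one example of where byte `0x9d` can come from, the right curly quote `”` is stored in UTF-8 as `e2 80 9d`, and cp1252 has no character for `0x9d`. The sketch below reproduces the same kind of error on that single character:

```python
# Hypothetical example: assumes twitter.txt holds UTF-8 text and that open()
# without an encoding fell back to cp1252 (the usual Windows default).
quote = '\u201d'                 # RIGHT DOUBLE QUOTATION MARK, common in tweets
raw = quote.encode('utf-8')      # its UTF-8 bytes: b'\xe2\x80\x9d'

try:
    raw.decode('cp1252')         # 0x9d has no mapping in cp1252
    message = 'decoded without error'
except UnicodeDecodeError as err:
    message = str(err)

print(message)
# 'charmap' codec can't decode byte 0x9d in position 2: character maps to <undefined>
print(raw.decode('utf-8') == quote)  # True: the same bytes decode fine as UTF-8
```

Because the failure happens while Python decodes the file, before the regex ever runs, stripping characters afterwards cannot prevent it.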
How can I remove the non-ASCII characters and then turn the remaining text into a string?
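A minimal sketch of one approach, assuming the scraper wrote `twitter.txt` as UTF-8: name the encoding explicitly when opening the file, read the whole file into one string, and then strip the non-ASCII characters. The function name `read_ascii_only` and the sample text are made up for illustration.

```python
import re
import tempfile
from pathlib import Path


def read_ascii_only(path):
    """Read a UTF-8 text file and return its contents with non-ASCII removed."""
    # Assumption: the file is UTF-8. Naming the encoding stops Python from
    # falling back to the platform default (e.g. cp1252 on Windows).
    with open(path, 'r', encoding='utf-8') as f:
        text = f.read()  # the whole file as one str
    # Remove every character outside U+0000..U+007F (emojis, curly quotes, ...).
    return re.sub(r'[^\x00-\x7f]', '', text)


# Demo on a throwaway file; the sample content is invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / 'sample.txt'
    sample.write_text('Great game \U0001F600 \u201cwow\u201d\n', encoding='utf-8')
    cleaned = read_ascii_only(sample)

print(repr(cleaned))  # 'Great game  wow\n'
```

Usage with the original file would be `data2 = read_ascii_only('twitter.txt')`. An equivalent way to drop non-ASCII characters without `re` is `text.encode('ascii', 'ignore').decode('ascii')`. If the file turns out not to be UTF-8, the `encoding` argument has to match whatever the scraper actually used.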