Is there any way to preprocess text files and skip these characters?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa1 in position 1395: invalid start byte
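To see why UTF-8 rejects this particular byte, here is a small Python 3 sketch. first_bad_byte and the sample bytes are hypothetical, used only for illustration. It reproduces the same kind of error and checks the bit pattern of 0xa1:

```python
def first_bad_byte(data: bytes):
    """Return (position, byte value, reason) for the first byte UTF-8 rejects, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start], exc.reason
    return None


sample = b"abc\xa1def"  # hypothetical sample bytes
print(first_bad_byte(sample))  # (3, 161, 'invalid start byte')

# 0xa1 is 0b10100001: its top two bits are "10", which marks a
# continuation byte, valid only inside a multi-byte sequence.
print(sample[3] >> 6 == 0b10)  # True
```

A byte in the range 0x80-0xBF can only continue a multi-byte UTF-8 character. When one shows up where a new character should start, the codec reports "invalid start byte".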
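Try this, where raw holds the bytes read from the file (in Python 3, decode is a method of bytes; str.decode only exists in Python 2):
raw.decode('utf-8', errors='ignore')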
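To preprocess a whole file rather than a single string, you can pass the same errors='ignore' handler to Python 3's built-in open(). This is a minimal sketch. The helper name read_text_skipping_invalid and the demo file are hypothetical:

```python
import tempfile
from pathlib import Path


def read_text_skipping_invalid(path) -> str:
    """Read a file as UTF-8, silently dropping any bytes that are not valid UTF-8."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return f.read()


# Demo: a throwaway file with one stray 0xa1 byte ("sample.txt" is hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"Hello \xa1world")
    text = read_text_skipping_invalid(sample)

print(text)  # Hello world
```

Be aware that errors='ignore' discards data without telling you. errors='replace' substitutes U+FFFD for each undecodable byte instead, which makes it visible where bytes were lost.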
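I think your text file contains bytes that are not valid UTF-8. It may have been saved in another encoding such as ISO-8859-1 (Latin-1), where 0xA1 is '¡'. Try decoding it as 'ISO-8859-1' instead of 'utf-8', like this:
# Python 3: pass the encoding to open() instead of changing the
# interpreter-wide default ('input.txt' is a placeholder path)
with open('input.txt', encoding='ISO-8859-1') as f:
    text = f.read()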
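ISO-8859-1 maps each of the 256 byte values to a character, so decoding with it never raises. The catch is that if the file is actually UTF-8 with a few stray bytes, its valid multi-byte characters turn into mojibake. The sketch below tries strict UTF-8 first and falls back to ISO-8859-1 only when that fails. It combines the two suggestions here and is not taken from either answer. decode_with_fallback is a hypothetical name:

```python
def decode_with_fallback(raw: bytes) -> str:
    """Decode as strict UTF-8; if that fails, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to U+0000-U+00FF,
        # so this decode cannot raise.
        return raw.decode("ISO-8859-1")


ok = decode_with_fallback(b"caf\xc3\xa9")            # valid UTF-8 -> 'café'
latin = decode_with_fallback(b"\xa1hola")            # not UTF-8   -> '¡hola'
mixed = decode_with_fallback(b"caf\xc3\xa9 \xa1ok")  # -> 'cafÃ© ¡ok' (mojibake)
```

The third case shows the trade-off. One bad byte sends the whole input down the Latin-1 path, and the two UTF-8 bytes of 'é' come out as 'Ã©'.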