I have a comma-separated .txt file with French characters such as Vétérinaire and Désinfectant.
import pandas as pd
df = pd.read_csv('somefile.txt', sep=',', header=None, encoding='utf-8')
[Decode error - output not utf-8]
I have read many Q&A posts (including this) and tried many different encodings such as 'latin1' and 'utf-16', but none of them worked. However, when I ran the exact same script on a different Windows 10 computer with a similar Python setup (both Python 3.6), it worked perfectly fine.
Edit: I tried this. Using encoding='cp1252' helps for some of the .txt files I want to import, but for a few .txt files it gives the following error:
File "C:\Program_Files_Extra\Anaconda3\lib\encodings\cp1252.py", line 15, in decode
return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 25: character maps to <undefined>
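The cp1252 error above can be reproduced without pandas or the original file. This is a minimal sketch, assuming the failing files contain a raw 0x8f byte, which Python's cp1252 codec leaves unmapped. The `sample` bytes and the `try_decode` helper are hypothetical and only for illustration; they are not taken from my real data.

```python
def try_decode(raw, encoding):
    """Return (text, None) if raw decodes cleanly, else (None, the error)."""
    try:
        return raw.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc

# Hypothetical sample: "Vétérinaire" encoded as cp1252, plus a stray 0x8f byte.
sample = "Vétérinaire".encode("cp1252") + b"\x8f"

# cp1252 has no character assigned to 0x8f, so decoding fails at that byte.
text, err = try_decode(sample, "cp1252")
print(err)

# latin1 maps every byte value to a character, so it never raises,
# although the result is not necessarily the intended text.
text_latin1, err_latin1 = try_decode(sample, "latin1")
print(repr(text_latin1))
```

Because latin1 accepts any byte sequence, a clean decode with latin1 does not prove that the file is actually latin1.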
Edit: I tried to identify the encoding with chardet:
import chardet
import pandas as pd
test_txt = 'somefile.txt'
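# Read the raw bytes so chardet can guess the encoding
with open(test_txt, 'rb') as f:
    rawdata = f.read()
result = chardet.detect(rawdata)
charenc = result['encoding']
print(charenc)
# Pass the detected encoding to pandas
df = pd.read_csv(test_txt, sep=',', header=None, encoding=charenc)
print(df.head())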
utf-8
[Decode error - output not utf-8]
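Since chardet reports utf-8 yet the error persists, one further check is to test the raw bytes against several candidate encodings directly, with no pandas involved. The following is a rough sketch only: `decodable_encodings` is a hypothetical helper name, the candidate list is an assumption, and the sample bytes stand in for the real file contents.

```python
def decodable_encodings(raw, candidates=("utf-8", "cp1252", "latin1")):
    """Return the candidate encodings that decode raw without an error."""
    ok = []
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes
        ok.append(enc)
    return ok

# Hypothetical samples standing in for the contents of two different files.
utf8_bytes = "Désinfectant".encode("utf-8")
cp1252_bytes_with_stray = "Désinfectant".encode("cp1252") + b"\x8f"

utf8_result = decodable_encodings(utf8_bytes)
stray_result = decodable_encodings(cp1252_bytes_with_stray)
print(utf8_result)
print(stray_result)
```

For a real file, the bytes would come from open(test_txt, 'rb').read(). An encoding appearing in the result only means the bytes decode without raising; the decoded text still has to be inspected to confirm that the accented characters look right.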