Which encoding to use while reading a csv file with pandas?

Asked Dec 22 '20 at 15:28

Active Dec 22 '20 at 16:05

Viewed 69 times

How do we know which encoding (latin1, utf-8, etc.) to use for a given csv file? I have a csv file that contains numbers and English column titles, and when I used latin1 as the encoding:

pandas.read_csv("Data.csv", encoding='latin1')

the file was read correctly. I just want to know how to determine which encoding to use, since I only got my code working by trial and error.

edited Dec 22 '20 at 16:05

desertnaut

asked Dec 22 '20 at 15:28

Ethan

See https://stackoverflow.com/questions/269060/is-there-a-python-library-function-which-attempts-to-guess-the-character-encodin and https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text – Stef Dec 22 '20 at 15:31
TL;DR: *You should know.* It should be UTF-8, the de facto modern standard. If it's not UTF-8, it should preferably be indicated in the file name. If it's not indicated in the filename, somebody should tell you in some way. If nobody told you: good luck. – deceze Dec 22 '20 at 16:10
What if I am creating my own csv file by saving an Excel sheet in csv format on my Windows 10 laptop? Is there a set encoding for it, or do I have to find it using Python? – Ethan Dec 22 '20 at 16:24
Can/can’t you choose the encoding during export…? – deceze Dec 22 '20 at 16:56
I checked. When I chose to save my Excel file as csv, it offered a "CSV UTF-8" option, but there was no option for exporting it as latin1. – Ethan Dec 22 '20 at 22:17
Any reason why you’d *need* latin1 in particular? – deceze Dec 23 '20 at 05:39
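
To make the trial and error from the question a bit more systematic, here is a minimal sketch that tries a few candidate encodings in order and returns the first one that decodes the raw bytes without an error. The function names `guess_encoding` and `guess_file_encoding` and the candidate order are assumptions for illustration, not anything pandas provides. `utf-8-sig` is tried first because it accepts UTF-8 with or without a byte order mark (Excel's "CSV UTF-8" export writes one). `latin1` goes last because it maps every possible byte to a character and therefore never fails, which is also why it "worked" in the question: a successful read with latin1 does not prove the file is actually latin1.

```python
from pathlib import Path

# Hypothetical candidate order: strictest encodings first, latin1 as a
# last resort because it accepts any byte sequence.
CANDIDATES = ("utf-8-sig", "cp1252", "latin1")


def guess_encoding(raw: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate encoding that decodes `raw` without error."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return enc
    raise ValueError("none of the candidate encodings could decode the data")


def guess_file_encoding(path) -> str:
    """Read the whole file as bytes and guess its encoding."""
    return guess_encoding(Path(path).read_bytes())
```

The result can then be passed along, e.g. `pandas.read_csv("Data.csv", encoding=guess_file_encoding("Data.csv"))`. This is only a guess: a file that decodes cleanly may still show wrong characters, so it is worth checking a few non-ASCII values by eye.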
