
How do we know which encoding (latin1, UTF-8, etc.) to use for a given CSV file? I have a CSV file that contains numbers and English column titles, and when I used latin1 as the encoding:

import pandas

df = pandas.read_csv("Data.csv", encoding="latin1")

the file was read correctly. I just want to know how to determine which encoding to use, since I only got my code to work by trial and error.
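A minimal sketch of the usual fallback approach, which is a heuristic and not a reliable detector: the encoding cannot be proven from the bytes alone, but UTF-8 is strict, so bytes that are not UTF-8 usually fail to decode, while latin-1 maps every byte 0x00 to 0xFF to some character and therefore never raises. That is why latin1 "works" on almost any file, even when it produces wrong characters. For a file with only numbers and English column titles (pure ASCII), ASCII, UTF-8 and latin1 all decode it identically, so any of them reads it correctly. The function name `guess_encoding` and the candidate order (BOM check, then UTF-8, then cp1252, then latin-1) are assumptions chosen for illustration.

```python
import codecs


def guess_encoding(raw: bytes) -> str:
    """Return a plausible encoding for `raw`; a heuristic, not a proof."""
    # A UTF-8 byte-order mark (EF BB BF) is a strong hint; "utf-8-sig"
    # decodes UTF-8 and strips the mark.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Try strict decoders first. UTF-8 rejects most non-UTF-8 byte sequences;
    # cp1252 (Windows "ANSI" on Western systems) rejects its 5 undefined bytes.
    for enc in ("utf-8", "cp1252"):
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    # latin-1 maps every byte 0x00-0xFF to a character, so it cannot fail.
    return "latin-1"
```

To use it with pandas, read the file's raw bytes first (for example `open("Data.csv", "rb").read()`) and pass the result as `encoding=` to `pandas.read_csv`. A file that decodes as cp1252 or latin-1 may still be in some other single-byte encoding; the libraries linked in the comments can only make a statistical guess as well.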

edited by desertnaut; asked by Ethan
    see https://stackoverflow.com/questions/269060/is-there-a-python-library-function-which-attempts-to-guess-the-character-encodin and https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text – Stef Dec 22 '20 at 15:31
  • TL;DR: *You should know.* It should be UTF-8, the de facto modern standard. If it's not UTF-8, it should preferably be indicated in the file name. If it's not indicated in the filename, somebody should tell you in some way. If nobody told you: good luck. – deceze Dec 22 '20 at 16:10
  • What if I am creating my own CSV file by saving an Excel sheet in CSV format on my Windows 10 laptop? Is there a set encoding for it, or do I have to detect it using Python? – Ethan Dec 22 '20 at 16:24
  • Can/can’t you choose the encoding during export…? – deceze Dec 22 '20 at 16:56
  • I checked. When I saved my Excel file as CSV, it offered a "CSV UTF-8" option, but there was no option to export it as latin1. – Ethan Dec 22 '20 at 22:17
  • Any reason why you’d *need* latin1 in particular? – deceze Dec 23 '20 at 05:39
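On the Excel question: as far as I know, Excel's "CSV UTF-8" export writes a UTF-8 byte-order mark (BOM) at the start of the file, while the plain "CSV" export uses the Windows ANSI code page (typically Windows-1252, a close relative of latin1, on English-language systems), which would explain why latin1 worked. So latin1 is not needed: export as CSV UTF-8 and read the file as UTF-8. The sketch below uses only the standard library `csv` module to show why `utf-8-sig` is the safer codec for such files: plain `utf-8` keeps the BOM as a `\ufeff` character glued to the first header. The helper `read_header` and the sample bytes are hypothetical.

```python
import codecs
import csv
import io


def read_header(raw: bytes, encoding: str) -> list:
    """Decode raw CSV bytes with `encoding` and return the header row."""
    text = raw.decode(encoding)
    return next(csv.reader(io.StringIO(text)))


# Hypothetical bytes of an Excel "CSV UTF-8" export: a UTF-8 BOM, then the data.
excel_utf8 = codecs.BOM_UTF8 + "id,price\n1,9.99\n".encode("utf-8")

print(read_header(excel_utf8, "utf-8"))      # ['\ufeffid', 'price']
print(read_header(excel_utf8, "utf-8-sig"))  # ['id', 'price']
```

With pandas, the equivalent is `pandas.read_csv("Data.csv", encoding="utf-8-sig")`; `utf-8-sig` also reads UTF-8 files that have no BOM.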
