I have a CSV file with various accented characters and other symbols. When I open it with Mac's TextEdit, they display correctly, but when I open it in Excel or read it in Python with pandas, those characters come out garbled.

As an example, 4º Centenário becomes 4¼ Centenrio.

Which encoding should I use (in pandas specifically) to read these special characters properly?

Pedro Carvalho
  • whichever encoding you want, just make sure it's the **SAME** encoding on both sides. – Marc B Feb 03 '16 at 20:37
  • Right but I want to know what the encoding actually _is_, because I need to be able to read those characters from the file and I don't seem to be able to. – Pedro Carvalho Feb 03 '16 at 20:40
  • start with the basics: assume utf8 – Marc B Feb 03 '16 at 20:41
  • I've assumed utf8, various parts of iso-8859, and cp860 (the document is in Portuguese) to no avail. – Pedro Carvalho Feb 03 '16 at 20:44
  • win-1252? it's extremely hard to reliably detect char encodings, because any given file could be completely valid in multiple charsets, yet have totally different renderings. – Marc B Feb 03 '16 at 20:45
  • It's really just a regular mac os file, using the mac ways of accenting things (e.g. option+e then vowel for an acute accent). – Pedro Carvalho Feb 03 '16 at 20:48
  • Have you tried chardet? It's a python module that tries to detect encoding. https://pypi.python.org/pypi/chardet – Håken Lid Feb 03 '16 at 20:48
  • relevant? https://discussions.apple.com/thread/146874?start=0&tstart=0 – Marc B Feb 03 '16 at 20:49
  • Possible duplicate of [Python: Is there a way to determine the encoding of text file?](http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file) – Håken Lid Feb 03 '16 at 20:52
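
One way to narrow this down without guessing blindly is to decode the same bytes with several candidate codecs and compare the results against the text you expect. The sketch below is only an illustration: the `CANDIDATES` list and the `decode_with_each` helper are hypothetical names, and the sample bytes are built by encoding the expected string as Mac OS Roman rather than read from the actual file.

```python
# Candidate codecs to try; this list is an assumption for illustration,
# taken from the encodings mentioned in the comments plus Mac OS Roman.
CANDIDATES = ["utf-8", "mac_roman", "cp1252", "latin-1", "cp860"]


def decode_with_each(raw, encodings=CANDIDATES):
    """Return {encoding: text} for every codec that decodes `raw` without error."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            # This codec cannot represent the byte sequence; skip it.
            continue
    return results


# Stand-in for a few bytes of the real file: the expected text, encoded
# as Mac OS Roman (an assumption about how the file was written).
sample = "4º Centenário".encode("mac_roman")

for enc, text in decode_with_each(sample).items():
    print(f"{enc:10} {text!r}")
```

With these sample bytes, the `latin-1` result shows `¼` in place of `º`, and the `á` turns into a non-printing control character, which matches the garbled `4¼ Centenrio` from the question. That suggests, though it does not prove, that the file is Mac OS Roman. If that holds for the real file, `pd.read_csv(path, encoding="mac_roman")` should read it correctly, where `path` is a placeholder for the CSV's location.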
