
I am reading a CSV file that has the following format: ID,Name,Description

Some of my descriptions contain foreign-language characters, such as those in these two screenshots: [image], [image]

I would like to read this file into a DataFrame if possible. When I use read_csv and similar functions, I get encoding errors. I tried

import pandas as pd
csv = pd.read_csv('foo.csv', encoding='utf-8')

but it throws an encoding error:

'charmap' codec can't decode byte 0x9d

Is there any way to read the file while keeping its character set, so that I can continue analysing it? If not, what would be the way to get such a file into a DataFrame or an array-like structure?

If keeping the characters is not possible, can the offending lines or words be ignored so that the rest of the data is still read? Help appreciated.

EdChum
Yantraguru
  • Is it literally like this `Mâ€lnda, Sâ€dert„lje,`? If so, either your editor (where you are viewing the file) or the file itself is not encoded correctly. – Burhan Khalid Apr 14 '15 at 05:55
  • Yeah, I realized that. I was using EditPlus to view it. I have now taken a screenshot after opening it in Excel, as I did for the Eastern characters. – Yantraguru Apr 14 '15 at 05:59
  • Try this: http://stackoverflow.com/questions/904041/reading-a-utf8-csv-file-with-python?rq=1 – Phuc Tran Apr 14 '15 at 06:09
  • Are you sure it's UTF-8? It seems like it isn't. You need to figure out what encoding is being used. – Burhan Khalid Apr 14 '15 at 06:36
  • @BurhanKhalid I am not sure, and since it's erroring out, it's not UTF-8. I am trying to understand how I can figure out the encoding. – Yantraguru Apr 14 '15 at 06:40
  • Post raw input from your CSV into the question (not as a comment), or a link to the file. If you're lucky, the file will have a [BOM](http://en.wikipedia.org/wiki/Byte_order_mark), which will indicate the encoding. – EdChum Apr 14 '15 at 09:56
  • @EdChum, will check that. Unfortunately, I won't be able to post a link to the file as it is because of some legal stuff. But I'll check if I can do something to make it postable. – Yantraguru Apr 14 '15 at 10:01
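
The suggestions in the comments (look for a BOM, otherwise work out which encoding decodes the file) could be sketched roughly as follows. This is only an illustration, not a confirmed fix for this file: the helper name `guess_encoding` and the candidate list `("utf-8", "cp1252")` are assumptions, and a clean decode does not prove that the guessed encoding is the right one. The word 'charmap' in the error message hints that a single-byte codec such as cp1252 was involved, which is why it is tried here.

```python
import codecs

# Byte order marks mapped to a Python codec that consumes them.
# The UTF-32 marks come first because the UTF-32 LE mark begins
# with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(raw, candidates=("utf-8", "cp1252")):
    """Return the encoding indicated by a BOM if there is one, otherwise
    the first candidate that decodes `raw` without errors, otherwise None.

    `guess_encoding` is a hypothetical helper, not part of pandas.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

To use it, read the file in binary mode (`open('foo.csv', 'rb').read()`), pass the bytes to `guess_encoding`, and hand the result to `pd.read_csv('foo.csv', encoding=...)`. If it returns `None`, two lossy fallbacks exist: `encoding='latin-1'` decodes every byte but may show the wrong characters, and pandas 1.3 or later accepts `encoding_errors='replace'` to substitute undecodable bytes instead of raising.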
