I used Python 2.7.10 before and recently switched to Python 3.6. Since then, importing CSV files fails. My code is simple, and I believe it worked fine in Python 2:

data = pd.read_csv('data.csv')

It raises this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What does this mean and how can I solve this problem? Thanks.

Update

I've solved it by passing extra arguments like this:

data = pd.read_csv('data.csv', sep='\t', encoding='utf-16')

Although I still don't know why it works, thanks for your help anyway.
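A likely explanation, assuming the file begins with the bytes `FF FE` (the UTF-16 little-endian byte order mark): `0xff` is never valid in UTF-8, so the default decoder fails at position 0, while the `utf-16` codec recognises those two bytes as a BOM and decodes the rest. The sample data below is made up for illustration; it is not the actual CSMAR file.

```python
# Hypothetical tab-separated sample, encoded as UTF-16 LE with a leading BOM.
text = "id\tname\n1\tfoo\n"
raw = b"\xff\xfe" + text.encode("utf-16-le")

# Decoding as UTF-8 fails on the very first byte (0xff).
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# The utf-16 codec consumes the BOM and returns the original text.
decoded = raw.decode("utf-16")

print(utf8_error)
print(repr(decoded))
```

The same reasoning would apply to `pd.read_csv`, which decodes the file with the given `encoding` before parsing it.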

Truefan
  • What is the encoding of your CSV file? Can you show a small sample CSV demonstrating the problem? – BrenBarn Feb 03 '18 at 08:06
  • try passing `encoding = "ISO-8859-1"` as a parameter to `read_csv` – Vivek Kalyanarangan Feb 03 '18 at 08:13
  • This is the list of encodings. Use the one that suits your data: [standard encodings](https://docs.python.org/3/library/codecs.html#standard-encodings) – A_emperio Feb 03 '18 at 08:18
  • Sorry. Actually I have no idea what my CSV file encoding is. How can I check the encoding type? In fact, the data is downloaded from CSMAR, if you guys know. – Truefan Feb 03 '18 at 08:37
  • Possible duplicate of ['utf-8' codec can't decode byte 0x92 in position 18: invalid start byte](https://stackoverflow.com/questions/46000191/utf-8-codec-cant-decode-byte-0x92-in-position-18-invalid-start-byte) – Sociopath Feb 03 '18 at 09:32
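On the question of how to check the encoding: one rough approach is to read the first few bytes of the file in binary mode and look for a byte order mark. This is only a sketch; the helper name `guess_encoding_from_bom` and the temporary demo file are made up for illustration, and a file without a BOM simply yields `None` (it says nothing about encodings such as cp1252 or ISO-8859-1).

```python
import codecs
import os
import tempfile

# UTF-32 is checked first because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(path):
    """Return a codec name implied by a leading BOM, or None if there is none."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


# Demo on a throwaway file that starts with FF FE, like the error in the question.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xff\xfe" + "id\tname\n".encode("utf-16-le"))
    demo_path = tmp.name

guessed = guess_encoding_from_bom(demo_path)
os.remove(demo_path)
print(guessed)
```

If this returns a name, passing it as `encoding=` to `pd.read_csv` is a reasonable first try.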

1 Answer

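I just had this problem. This post helped me:

['utf-8' codec can't decode byte 0x92 in position 18: invalid start byte](https://stackoverflow.com/questions/46000191/utf-8-codec-cant-decode-byte-0x92-in-position-18-invalid-start-byte)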

and my encoding ended up being Windows codepage 1252

read utf-8 CSV file into dataframe

but your encoding could be anything...
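Since the encoding could be anything, one pragmatic sketch is to try a short list of candidates in order and keep the first one that decodes without error. The helper name `first_working_encoding` and the candidate list are assumptions for illustration. Note that a successful decode does not prove the encoding is right: single-byte codecs such as cp1252 accept almost any byte sequence, so they belong at the end of the list and the result should be checked by eye.

```python
def first_working_encoding(raw, candidates):
    """Return the first codec in `candidates` that decodes `raw`, or None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample: cp1252 bytes containing 0xe9 and 0x92, which are not valid UTF-8 here.
cp1252_bytes = "caf\u00e9 \u2019".encode("cp1252")
picked = first_working_encoding(cp1252_bytes, ["utf-8", "cp1252"])

# A UTF-16 sample with a BOM, as in the question.
utf16_bytes = b"\xff\xfe" + "id\tname\n".encode("utf-16-le")
picked_utf16 = first_working_encoding(utf16_bytes, ["utf-8", "utf-16", "cp1252"])

print(picked, picked_utf16)
```

The chosen name can then be passed on, for example as `pd.read_csv(path, encoding=picked)`.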

MissBleu