Importing a file from a subfolder with read_csv : how to get it to work with engine='c' ? (UnicodeDecodeError)

Question

I am trying to use pandas to read a CSV file that is in a subfolder of the current folder. I am on a Windows PC.

If I run:

df=pd.read_csv("subfolder//file.csv")

I get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 16: invalid start byte

If I run:

df=pd.read_csv("subfolder//file.csv", engine='python')

It works.

Why?
Is there a way to use the C engine? It's meant to be faster.

Could your CSV file contain a SUPERSCRIPT TWO character U+00B2 `²`? If the answer is yes, it is probably Latin1 or cp1252 encoded... – Serge Ballesta, Mar 20 '19 at 11:40
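To illustrate the comment above, here is a minimal sketch of why a single `²` can trigger this exact error. The string `"m²"` is only an assumed example value, not taken from the asker's file.

```python
# "m²" is a hypothetical cell value; the real file contents are unknown.
raw = "m²".encode("latin-1")  # in Latin-1 and cp1252, "²" is the single byte 0xB2

# Both single-byte encodings map 0xB2 back to "²".
decoded_latin1 = raw.decode("latin-1")
decoded_cp1252 = raw.decode("cp1252")
print(raw, decoded_latin1, decoded_cp1252)

# In UTF-8, 0xB2 can only be a continuation byte, so it cannot start a character.
try:
    raw.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The message printed at the end has the same shape as the one in the question; only the byte position differs.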

Farhood ET · Accepted Answer · 2019-03-20T17:50:26.900


This is probably because read_csv is trying to decode the file as UTF-8 while your file uses a different encoding. To detect the encoding on Windows, see the question "Get encoding of a file in Windows".
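If no external tool is at hand, one rough way to narrow the encoding down is to try a few candidates in Python. This is only a sketch: the helper name `first_working_encoding`, the candidate list, and the sample bytes are all assumptions made for illustration.

```python
def first_working_encoding(data: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes `data` without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample standing in for the first bytes of the real file.
sample = "area (m²),value\n1,2\n".encode("cp1252")
guess = first_working_encoding(sample)
print(guess)  # cp1252
```

A successful decode does not prove the guess is right: latin-1 accepts every possible byte, so it never fails. Check that the decoded text actually looks correct before relying on the result.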

Once you know the file's encoding, pass it to read_csv through the encoding argument, e.g. (replace "latin1" with whatever encoding your file actually uses):

df=pd.read_csv("subfolder//file.csv", encoding="latin1")
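A self-contained sketch of the same idea, combined with `engine='c'` as the question asks. It writes a small Latin-1 file into a temporary subfolder first, so the file contents and column names are made up; only the `encoding` and `engine` arguments come from this thread.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for "subfolder/file.csv", written as Latin-1 bytes.
    csv_path = Path(tmp) / "subfolder" / "file.csv"
    csv_path.parent.mkdir()
    csv_path.write_bytes("name,area (m²)\nroom,12\n".encode("latin-1"))

    # An explicit encoding lets the C engine decode the 0xB2 byte.
    df = pd.read_csv(csv_path, encoding="latin1", engine="c")

print(df.columns.tolist())  # ['name', 'area (m²)']
```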

edited Mar 20 '19 at 17:50

answered Mar 20 '19 at 11:13


So this means that engine='c' causes encoding to default to 'utf-8', while engine='python' means a different encoding? I double checked https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html and none of this seems to be documented explicitly - as is unfortunately all too common in the beautiful world of Python... – Pythonista anonymous Mar 20 '19 at 15:33
@Pythonistaanonymous I don't know about the current situation explicitly, but the error you are getting is an error of encoding conflict, and I suspect this might be the case. Have you checked your file's encoding yet? – Farhood ET Mar 20 '19 at 17:50

Yes, if I set encoding='latin1' it works. Thanks for the help. PS: still frustrated at how much the documentation sucks in the world of Python! – Pythonista anonymous Mar 20 '19 at 17:52
