Standard character encodings for pandas.read_csv

Question

read_csv raises this error with each of the encodings latin-1, utf-8 and ISO-8859-1: Initializing from file failed.

from pandas import DataFrame, read_csv

# Build a small example frame
data = dict(A=[1, 2, 3], B=['abc', 'efg', 'hig'], C=[100, 200, 300])
df = DataFrame(data)
df

    A    B    C
0   1   abc  100
1   2   efg  200
2   3   hig  300

I wrote it to a file whose name contains a German character, as follows:

df.to_csv('Lück.txt', sep='\t', encoding='utf-8', index=False)

and reading it back in this way fails:

read_csv('Lück.txt', sep='\t', encoding='utf-8')

Failing that, if there is any method to detect the special German characters, I would replace them.

P.S. I have seen a number of posts on this issue, but none of them matches my question, and I'm not good with character encoding/decoding. Thanks.

Actually the name comes from a list of names, and the problem arises when the variable reaches one with special German characters; for the example I wrote it directly as a string. — Shivid, Nov 07 '17 at 08:27
Exporting the data is no problem: `Lück.txt` gets created. The error is raised when importing. — Shivid, Nov 07 '17 at 08:37
Again, it writes the file correctly, but reading it gives me this error: `'utf-8' codec can't decode byte 0xe0 in position 4: unexpected end of data` — Shivid, Nov 07 '17 at 08:45
In that case, write with `latin-1` encoding and see if it works. — cs95, Nov 07 '17 at 08:46
Do I have to use it for both writing and reading? By the way, as I wrote above, I tried that as well and got the same error: `Initializing from file failed`. — Shivid, Nov 07 '17 at 08:50

score 1 · Answer 1 · answered Nov 07 '17 at 09:25


Chances are that the root cause is not the German umlaut but one or more "weird" (non-standard) whitespace characters inside the .csv file. These tend to appear when the file's content has passed through some kind of copy/paste operation involving Excel.

First, begin your Python script like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Second, make sure that your .csv files do not contain any kind of weird whitespace characters, as summarized here.
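One way to check for such characters is to scan the decoded text and report anything that counts as whitespace or an invisible format character but is not a plain space, tab or line break. The following is only a rough sketch: the helper name `find_odd_whitespace` and the set of "ordinary" characters are my own assumptions, not anything provided by pandas.

```python
import unicodedata

# Characters treated as ordinary in a tab-separated file (an assumption).
ORDINARY = {" ", "\t", "\n", "\r"}


def find_odd_whitespace(text):
    """Return (index, code point, Unicode name) for each unusual
    whitespace or invisible format character in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch in ORDINARY:
            continue
        # Zs = space separators (e.g. no-break space),
        # Cf = invisible format characters (e.g. zero width space, BOM).
        if ch.isspace() or unicodedata.category(ch) in ("Zs", "Cf"):
            name = unicodedata.name(ch, "UNNAMED")
            hits.append((i, "U+%04X" % ord(ch), name))
    return hits


# Hypothetical sample with a no-break space and a zero width space.
sample = "A\tB\u00a0C\n1\tab\u200bc\t100\n"
for hit in find_odd_whitespace(sample):
    print(hit)
```

To run this against a real file, read its contents first, for example with `open(path, encoding='utf-8').read()` (where `path` is whatever file you are checking), and pass the resulting string to the helper. If decoding itself fails, the problem lies in the file's encoding rather than in its whitespace.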


sudonym


Thanks, even though the problem isn't resolved. As you say, it could be caused by the read/write operations, because this file was originally created from data extracted from an `xlsx` file. – Shivid Nov 07 '17 at 09:41
