
read_csv fails with the error "Initializing from file failed" whether I use latin-1, utf-8 or ISO-8859-1 as the encoding.

from pandas import DataFrame, read_csv

df = dict(A=[1,2,3],B=['abc','efg','hig'],C=[100,200,300])
df = DataFrame(df)   
df

    A    B    C
0   1   abc  100
1   2   efg  200
2   3   hig  300

I wrote it to a file whose name contains a German character, as follows:

df.to_csv('Lück.txt', sep='\t', encoding='utf-8', index=False)

but reading it back this way fails:

read_csv('Lück.txt', sep='\t', encoding='utf-8')

Failing that, if there is any method to detect special German characters, I would replace them.

P.S. I have seen a number of posts on this issue, but none of them matches my question, and I'm not good with character encoding/decoding. Thanks.

Shivid

1 Answer


Chances are that the root cause is not the German umlaut but one or more "weird" whitespace characters within the .csv file. These tend to appear when the .csv file has been modified beforehand by some kind of copy/paste operation into Excel.

First, begin your Python script like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

Second, make sure that your .csv files do not contain any kind of weird whitespace characters, as summarized here.
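
To check a file for such characters, something along these lines might help. It is only a rough sketch using the standard library: the function name `find_suspicious_chars` and the choice of what counts as "ordinary" whitespace (space, tab, newline, carriage return) are my own assumptions, and it also reports every non-ASCII character, so umlauts such as ü show up alongside things like a no-break space.

```python
import unicodedata

# Assumption: these are the only whitespace characters expected in a tab-separated file.
ORDINARY_WHITESPACE = {" ", "\t", "\n", "\r"}


def find_suspicious_chars(text):
    """Return (line, column, codepoint, name) for each control or non-ASCII character."""
    findings = []
    # Split on "\n" only, so that exotic line separators are reported rather than consumed.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in ORDINARY_WHITESPACE:
                continue
            # Flag ASCII control characters (including DEL) and anything outside ASCII.
            if ord(ch) < 32 or ord(ch) > 126:
                name = unicodedata.name(ch, "UNNAMED")
                findings.append((line_no, col_no, "U+%04X" % ord(ch), name))
    return findings


# Made-up sample data: a no-break space after 'abc' and an umlaut in 'Lück'.
sample = "A\tB\tC\n1\tabc\u00a0\t100\n2\tL\u00fcck\t200\n"
for finding in find_suspicious_chars(sample):
    print(finding)
```

For a real file you would pass in its decoded contents, for example the result of `open(path, encoding='utf-8').read()`, and then decide per reported character whether to replace it or leave it alone.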

sudonym
  • Thanks, even though the problem isn't resolved. As you say, it could be caused by read/write operations, because this file was originally created from data extracted from an `xlsx` file. – Shivid Nov 07 '17 at 09:41