1

I am trying to use pandas.read_csv to get data from some .csv files. This works fine as long as there is no accented character (e.g. ä, é, ü) in the file name or file path. As soon as I use a file name such as düm1.csv, I get the following error: OSError: Initializing from file failed. My code is:

dum1 = pd.read_csv(r"C:\Users\MyName\Desktop\dumm12\düm1.csv", sep = ";", decimal = ",", encoding = "utf-8")

I am using pandas 0.20.1 and python 3.6.0. I have found that this has been an issue in previous versions but I thought it had been resolved. Any ideas on how to fix this? I also found this: https://github.com/pandas-dev/pandas/issues/15086

output of pd.show_versions():

INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: 5.2.2
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.5.3
html5lib: 0.999
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
pandas_gbq: None
pandas_datareader: None

Cactus
  • I can reproduce this error with Python 3.6.1, Pandas 0.20.1; however I did not have it until yesterday while working with Python 3.4.4 and Pandas 0.18.1. – elzell May 31 '17 at 09:37
  • That is weird. Might it be a bug in new versions and I should downgrade? – Cactus May 31 '17 at 12:19

3 Answers

4

I had a similar problem. It looks like the problem occurs when pandas.read_csv is used with Python 3.6 on Windows.

Python 3.6 changed the Windows file system encoding from "mbcs" to "UTF-8" (see PEP 529). You can call sys.getfilesystemencoding() to check the current file system encoding.

I found two ways around this:

1.- Use this code to make the whole application use the legacy file system encoding ("mbcs") of Python <= 3.5:
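As a quick check before trying either workaround, the following minimal sketch prints which file system encoding the running interpreter uses. The helper name describe_fs_encoding is made up for illustration. On Windows with Python 3.6 or later I would expect "utf-8" here, and "mbcs" only if the legacy mode has been switched on.

```python
import sys


def describe_fs_encoding():
    """Return the platform name and the encoding Python uses for file names."""
    # describe_fs_encoding is a hypothetical helper, not part of any library.
    return sys.platform, sys.getfilesystemencoding()


platform_name, fs_encoding = describe_fs_encoding()
print(f"platform={platform_name}, file system encoding={fs_encoding}")
```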

import sys
sys._enablelegacywindowsfsencoding()

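2.- Pass an open file object to pandas.read_csv instead of the path:

# Raw string so that "\U" is not parsed as a Unicode escape; the encoding is set where the file is opened.
with open(r"C:\Users\MyName\Desktop\dumm12\düm1.csv", 'r', encoding = "utf-8") as fp:
        dum1 = pd.read_csv(fp, sep = ";", decimal = ",")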
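If you need the second workaround in several places, it could be wrapped in a small helper. This is only a sketch: read_csv_via_handle and its file_encoding parameter are names I made up, and it assumes the files are text files in a known encoding (UTF-8 by default).

```python
import pandas as pd


def read_csv_via_handle(path, file_encoding="utf-8", **read_csv_kwargs):
    """Open `path` with Python's own open() and let pandas parse the handle.

    Hypothetical helper: Python opens the file, so pandas never has to
    resolve the (possibly accented) path itself.
    """
    with open(path, "r", encoding=file_encoding) as fp:
        # All remaining keyword arguments go straight to pandas.read_csv.
        return pd.read_csv(fp, **read_csv_kwargs)
```

With the path from the question, the call would be read_csv_via_handle(r"C:\Users\MyName\Desktop\dumm12\düm1.csv", sep=";", decimal=",").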

You can see this post: pandas.read_csv can't import file with accent mark in path
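For the first workaround, a guarded call may be safer if the same script also has to run on other platforms, since sys._enablelegacywindowsfsencoding() only exists on Windows builds of Python 3.6 and later. The wrapper name below is made up for illustration; treat this as a sketch, not a recommended pattern.

```python
import sys


def enable_legacy_windows_fs_encoding():
    """Switch to the pre-3.6 "mbcs" file system encoding where possible.

    Hypothetical wrapper. Returns True if the switch was made, False otherwise.
    """
    enable = getattr(sys, "_enablelegacywindowsfsencoding", None)
    if sys.platform == "win32" and enable is not None:
        enable()
        return True
    # Not on Windows, or the interpreter does not provide the function.
    return False


switched = enable_legacy_windows_fs_encoding()
print("legacy encoding enabled:", switched, "->", sys.getfilesystemencoding())
```

Call it once at program start, before any file is opened, because it changes the encoding for the whole process.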

jmejias
1

I tested this by creating a dummy file named 'düm1.csv'.

When I run:

df = pd.read_csv('düm1.csv',sep=';')

I do not get an OSError, and the file opens in my IPython session:

   Unnamed: 0  test1  test2  test3  tes4
0         NaN    1.0    2.0    3.0   4.0
1         NaN    NaN    NaN    NaN   NaN
2         NaN    NaN    NaN    NaN   NaN
3         NaN    NaN    NaN    NaN   NaN
4         NaN    NaN    NaN    NaN   NaN
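To repeat this test on another machine, something like the following sketch could be used. It writes a tiny dummy file named 'düm1.csv' into a temporary directory and tries to read it by path. The function name and the sample contents are made up for illustration; on an affected Windows setup I would expect the OSError from the question instead of a DataFrame.

```python
import os
import tempfile

import pandas as pd


def try_read_accented_path(directory):
    """Write a small CSV named 'düm1.csv' into `directory` and read it by path.

    Hypothetical helper: returns the DataFrame on success, or the OSError
    that pandas raised.
    """
    path = os.path.join(directory, "düm1.csv")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("test1;test2;test3;tes4\n1;2;3;4\n")
    try:
        return pd.read_csv(path, sep=";")
    except OSError as exc:
        return exc


with tempfile.TemporaryDirectory() as tmp:
    result = try_read_accented_path(tmp)
    print(type(result).__name__)
    print(result)
```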

Have you tried without the encoding argument? Or without the accent?

C.

Chris PERE
  • Thanks, it is weird that it works for you. I have tried without encoding but it does not work either. However, it works without accents. Any other ideas? Thank you for testing, this way I know it is not a general issue but rather related to my code, machine or package version. – Cactus May 30 '17 at 08:32
  • You're welcome. I'm using python 3.6.1 and ipython 5.3.0. You can try with encoding='ISO-8859-1' – Chris PERE May 30 '17 at 08:37
  • I tried with the encoding ISO-8859-1 but it did not work either. I will update python and ipython and hope it does help. – Cactus May 30 '17 at 08:39
  • I updated Python and IPython. It still does not work with accents. It seems I have to live with the status quo. – Cactus May 30 '17 at 09:26
  • Maybe it depends on the language of your computer. I don't know, but if you change the name and it works, you can at least use your data; the problem does not come from your data or from pandas. Good luck! – Chris PERE May 30 '17 at 09:34
  • Yes, but sometimes I need to use files in a directory whose name I cannot change. But thank you :) – Cactus May 30 '17 at 09:38
0

This issue has not been resolved yet; you will have to wait for a PR that fixes it. Alternatively, try Python 2.7, which I guess might work.

vutsuak