3

I am developing an application with Python and a Qt GUI. I need to import a file into a DataFrame. I use QFileDialog.getOpenFileName to get the path and filename, then open it with the pandas.read_csv method. Everything works well until the path contains special characters like "ó". Then pandas.read_csv fails and crashes the app.

I tried to reproduce the error in the console and got the following results:

In[2]: import pandas as pd
Backend Qt5Agg is interactive backend. Turning interactive mode on.

In[3]: path1 = 'F:/Software_Proyects/Python/Proyectos/test_read_csv/FlowData.txt'
In[4]: df1 = pd.read_csv(path1, delim_whitespace=True, dtype=object)

In[5]: path2 = 'F:/Software_Proyects/Python/Proyectos/test_read_csv_with_ó/FlowData.txt'
In[6]: df2 = pd.read_csv(path2, delim_whitespace=True, dtype=object)
Traceback (most recent call last):
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-feba8e024d43>", line 1, in <module>
    df2 = pd.read_csv(path2, delim_whitespace=True, dtype=object)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas\parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:4184)
  File "pandas\parser.pyx", line 669, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:8471)
OSError: Initializing from file failed

The output of pd.show_versions() is:

In[7]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 32
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

According to the post "Encoding with pandas.read_csv when file name has accents", the problem was fixed in pandas 0.14.0.

Any recommendations for solving this problem?

chowsai
jmejias
  • Try maybe `pd.read_csv(path1, delim_whitespace=True, dtype=object,encoding='utf-8')` or some other from this list: https://docs.python.org/3/library/codecs.html#standard-encodings – Protostome Sep 10 '17 at 11:50
  • you might want to check [this issue](https://github.com/pandas-dev/pandas/issues/15086) – MaxU - stand with Ukraine Sep 11 '17 at 21:57
  • @Protostome, thanks for your response. I tried it but it doesn't work, I think because the encoding option of read_csv applies to the file content and not to the path of the file. The file itself imports without problems; the problem appears when the path to the file contains special characters. – jmejias Sep 13 '17 at 11:03
  • Thanks @MaxU. Your comment pointed me toward a solution, which I will post below. – jmejias Sep 14 '17 at 11:00

2 Answers

3

Looking deeper, this behavior comes from the combination of Python 3.6 and pandas.read_csv, and only on Windows systems.

Python 3.6 changed the Windows filesystem encoding from "mbcs" to "UTF-8" (see PEP 529). Use sys.getfilesystemencoding() to get the current filesystem encoding.
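As a quick check, a minimal sketch like the following reports which filesystem encoding the running interpreter uses. The helper name `describe_fs_encoding` is made up for illustration; only `sys.getfilesystemencoding()` and `sys.platform` are standard library calls.

```python
import sys


def describe_fs_encoding():
    """Return the filesystem encoding and whether we are running on Windows.

    Hypothetical helper for illustration; not part of pandas or Qt.
    """
    encoding = sys.getfilesystemencoding()
    on_windows = sys.platform == "win32"
    return encoding, on_windows


encoding, on_windows = describe_fs_encoding()
print(f"filesystem encoding: {encoding!r}, Windows: {on_windows}")
```

On Windows with Python 3.6 or later this should report a UTF-8 encoding, while Python 3.5 and earlier reported "mbcs" there.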

I found two workarounds:

1.- Use this code to make the whole app work with the filesystem encoding used by Python <= 3.5 ("mbcs"):

import sys
sys._enablelegacywindowsfsencoding()

2.- Pass an open file object to pandas.read_csv instead of the path:

with open(path2, 'r') as fp:
    df2 = pd.read_csv(fp, delim_whitespace=True, dtype=object)
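The second workaround can be wrapped in a small helper so the GUI code has one place that opens files. This is only a sketch: the names `read_via_handle` and `split_rows` are hypothetical, the `encoding="utf-8"` default is an assumption about the file contents, and a stand-in parser is used so the example runs without pandas installed.

```python
import os
import tempfile


def read_via_handle(path, parser, encoding="utf-8", **parser_kwargs):
    """Open `path` with Python's own open() and pass the handle to `parser`.

    Hypothetical helper: Python opens the file, so the parser never has
    to interpret the path string itself.
    """
    # newline="" leaves line-ending handling to the parser.
    with open(path, "r", encoding=encoding, newline="") as fp:
        return parser(fp, **parser_kwargs)


def split_rows(fp):
    """Stand-in parser (assumption): split each non-empty line on whitespace."""
    return [line.split() for line in fp if line.strip()]


# Build a throwaway folder whose name contains "ó", as in the question.
with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "test_read_csv_with_ó")
    os.mkdir(folder)
    path = os.path.join(folder, "FlowData.txt")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write("time flow\n0 1.5\n1 2.5\n")
    rows = read_via_handle(path, split_rows)

print(rows)
```

With pandas, the equivalent call would presumably be `read_via_handle(path2, pd.read_csv, delim_whitespace=True, dtype=object)`, since read_csv accepts an open file object as its first argument.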
jmejias
0

You can try these lines of code in your notebook/IPython before reading the file with utf-8 encoding:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

and then, when reading your file, use this line as suggested in the comment:

pd.read_csv(path1, delim_whitespace=True, dtype=object,encoding='utf-8')
Espoir Murhabazi
  • Hello @Espoir. I tried these commands but they don't work. Looking into why, I found that they are for Python 2. In Python 3 the default encoding is 'utf-8' and the setdefaultencoding() method was removed from the sys module. To use reload(sys) on Python >= 3.4 I followed this example: https://stackoverflow.com/questions/961162/reloading-module-giving-nameerror-name-reload-is-not-defined – jmejias Sep 14 '17 at 10:52
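
The point made in that comment can be checked directly on Python 3. A minimal sketch, using only standard `sys` calls (the variable names are just for illustration):

```python
import sys

# On Python 3 the default string encoding is fixed.
default_encoding = sys.getdefaultencoding()

# setdefaultencoding() only existed in Python 2; Python 3's sys has no such attribute.
has_setter = hasattr(sys, "setdefaultencoding")

print(default_encoding, has_setter)
```

So the Python 2 recipe above has nothing to change on Python 3, and it would not affect how a file path is encoded in any case.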