I am trying to read a file containing data for different dates using numpy.genfromtxt() in python3. The file basically looks like

Date,Open,High,Low,Close,Volume
1-Apr-15,108.33,108.66,108.33,108.66,290

but may contain missing values marked as -.

The following code works fine in python2

from datetime import datetime
import numpy as np
str2date = lambda x: datetime.strptime(x, '%d-%b-%y').strftime('%Y-%m-%d')
data = np.genfromtxt('test.dat', dtype="S9,f8,f8,f8,f8,f8", delimiter=',', names=True, missing_values='-', converters={0: str2date})

but fails in python3 with

UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

locale.getpreferredencoding(False) returns UTF-8 as the default encoding, and the solution suggested elsewhere (for example here), setting the encoding for the input stream, is a bit tricky. I also tried setting the encoding of the terminal, without success. I also have to admit that I do not see a solution to my problem in this answer, as the file contains no special characters, or at least none that I can see.

How can I solve this issue without stepping back to python2?

Martin
  • It seems that genfromtxt falls back to ASCII mode for some unknown reason. Have you tried genfromtxt(open('test.dat', encoding='utf-8'), ...)? Or, more efficiently, pandas.read_csv? – B. M. Dec 03 '17 at 15:11
  • `genfromtxt(open('test.dat', encoding='utf-8'))` complains about bytes being provided instead of a string. But pandas works like a charm. Thanks :). If you put that in an answer I'll accept it. – Martin Dec 03 '17 at 16:34
  • `genfromtxt` opens the file in binary mode, and works with bytestrings (Py3). The `converters` solution in https://stackoverflow.com/questions/33001373/loading-utf-8-file-in-python-3-using-numpy-genfromtxt doesn't help? – hpaulj Dec 03 '17 at 17:17
  • I understood that as a workaround for a problematic file name. Which I do not have. – Martin Dec 03 '17 at 17:49

1 Answer

When I try to reproduce your code I get problems with the date conversion:

Out[405]: b'1-Apr-15'
In [406]: str2date(_)
---------------------------------------------------------------------------
...
----> 1 str2date = lambda x: datetime.strptime(x, '%d-%b-%y').strftime('%Y-%m-%d')

TypeError: strptime() argument 1 must be str, not bytes

If I add a decode:

def foo(x):
    return str2date(x.decode())

the converter handles the byte string that genfromtxt insists on providing.
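
For what it's worth, a converter that accepts either bytes or str does not depend on which of the two `genfromtxt` hands over. This is only a minimal sketch, assuming the date format from the question; `to_iso_date` is a name made up for this illustration:

```python
from datetime import datetime

def to_iso_date(value):
    """Turn '1-Apr-15' (bytes or str) into '2015-04-01'."""
    # Decode only when a byte string was passed in
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return datetime.strptime(value.strip(), "%d-%b-%y").strftime("%Y-%m-%d")

print(to_iso_date(b"1-Apr-15"))
print(to_iso_date("1-Apr-15"))
```

It would be passed the same way as `foo`, that is `converters={0: to_iso_date}`.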

In [410]: data = np.genfromtxt('stack47619155.txt', dtype="S9,f8,f8,f8,f8,f8", 
     ...: delimiter=',', names=True,  missing_values='-', converters={0: foo})
In [411]: data
Out[411]: 
array([(b'2015-04-0',  108.33,  108.66,  108.33,  108.66,  290.),
       (b'2015-04-0',     nan,  108.66,     nan,  108.66,  290.),
       (b'2015-04-0',  108.33,  108.66,  108.33,  108.66,   nan)],
      dtype=[('Date', 'S9'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Volume', '<f8')])
In [412]: data = np.genfromtxt('stack47619155.txt', dtype="U9,f8,f8,f8,f8,f8", 
     ...: delimiter=',', names=True,  missing_values='-', converters={0: foo})
In [413]: data
Out[413]: 
array([('2015-04-0',  108.33,  108.66,  108.33,  108.66,  290.),
       ('2015-04-0',     nan,  108.66,     nan,  108.66,  290.),
       ('2015-04-0',  108.33,  108.66,  108.33,  108.66,   nan)],
      dtype=[('Date', '<U9'), ('Open', '<f8'), ('High', '<f8'), ('Low', '<f8'), ('Close', '<f8'), ('Volume', '<f8')])
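
The dates in both outputs end in `2015-04-0` because the converted string `2015-04-01` has 10 characters while the `S9`/`U9` field holds only 9. A small sketch of that truncation, using a hand-written value rather than the real file:

```python
import numpy as np

iso = "2015-04-01"  # what the converter returns for '1-Apr-15'

# A fixed-width string dtype silently cuts off anything that does not fit
nine = np.array([iso], dtype="U9")
ten = np.array([iso], dtype="U10")

print(nine[0])
print(ten[0])
```

Using `S10` or `U10` as the first field in the `dtype` string of the `genfromtxt` call should keep the full date.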

This is a different error from the one you report, so my test file may differ from yours; for instance, I may or may not have used a different - as the missing field marker.

You found my post from a couple of years ago with a decode in the converters:

Loading UTF-8 file in Python 3 using numpy.genfromtxt
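
One thing worth checking, although nothing in this thread confirms it for this file: 0xef is the first byte of the UTF-8 byte order mark (EF BB BF), which some editors and export tools prepend invisibly. That would fit an error at position 0 in a file with no visible special characters. A sketch using only the standard library, with made-up in-memory data standing in for the real file:

```python
import codecs

# Hypothetical file content: a UTF-8 BOM followed by the CSV from the question
raw = codecs.BOM_UTF8 + (
    b"Date,Open,High,Low,Close,Volume\n"
    b"1-Apr-15,108.33,108.66,108.33,108.66,290\n"
)

# Decoding as ASCII fails on the very first byte
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)
print(ascii_error)

# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if there is one
text = raw.decode("utf-8-sig")
header = text.splitlines()[0]
print(header)
```

If the file does start with a BOM, opening it with `encoding='utf-8-sig'` strips it. NumPy 1.14 and later also accept an `encoding` argument in `genfromtxt`; whether that resolves this particular case is untested here.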

hpaulj
  • The `decode()` is required in python2. For python3 it throws an error (on my system). The conversion itself runs fine there without `decode()`: `print(str2date('1-Apr-15'))`. – Martin Dec 03 '17 at 17:46
  • How about `print(str2date(b'1-Apr-15'))`? – hpaulj Dec 03 '17 at 17:50
  • Then I need the `decode`, true. But `genfromtxt` still fails with the `ascii` problem. – Martin Dec 03 '17 at 17:55