I am trying to understand how the numpy.genfromtxt function and io.StringIO work. On the official website (https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt) I found some examples. Here is one of them:

from io import StringIO
import numpy as np

s = StringIO("1,1.3,abcde")
data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),('mystring','S5')], delimiter=",")

But when I run this code on my computer I get: TypeError: must be str or None, not bytes

How can I fix this?

Alex Rozhnov
  • I can even write something like this: s = StringIO("1,1.3,1.4"); data = np.genfromtxt(s, delimiter=","), and it still does not work. – Alex Rozhnov Feb 10 '18 at 17:09

2 Answers

Consider upgrading numpy: with the current version, your code works as written. See the mention in the 1.14.0 release note highlights and the section "Encoding argument for text IO functions" for the relevant changes in np.genfromtxt.

On older numpy, you are passing a string object as the input, but the docs you linked say:

Note that generators must return byte strings in Python 3k. 

So do what the docs say and give it a byte string:

import io
s = io.BytesIO(b"1,1.3,abcde")
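
As a minimal sketch of the complete call (the variable name `text` is only for illustration and is not from the question), the snippet below encodes an existing str to bytes before wrapping it in BytesIO. Byte input is accepted by both older and newer numpy versions, so this should run on either.

```python
import io

import numpy as np

# Hypothetical input: text you already hold as a Python 3 str.
text = "1,1.3,abcde"

# Encode to bytes so that older numpy (before 1.14) accepts it.
s = io.BytesIO(text.encode("ascii"))

data = np.genfromtxt(
    s,
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)

# 'S5' is a byte-string dtype, so the last field comes back as bytes.
print(data["mystring"].item())
```

Because the dtype asks for 'S5', the string field is still a byte string; only the input handling changes here.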
wim
  • I was a bit surprised that the `StringIO` worked, which is why I included the version number. I had a vague memory that previously I had to use `BytesIO`. – hpaulj Feb 10 '18 at 17:17
  • It's all a bit confusing with the misnomer genfrom**txt** and the symbols in the dtype 'S5' vs 'U5' – wim Feb 10 '18 at 17:18
  • Though the use of 'U5' vs 'S5' has nothing to do with the nature of the input string. If given a filename, `genfromtxt` opens the file in `rb` mode and tries to work with bytestrings throughout (for consistency with Py2 behavior). – hpaulj Feb 10 '18 at 17:22
  • Recently I was seeing the `VisibleDeprecationWarning`, but was ignoring it. This is a welcome upgrade. – hpaulj Feb 10 '18 at 17:40
In [200]: np.__version__
Out[200]: '1.14.0'

The example works for me:

In [201]: s = io.StringIO("1,1.3,abcde")
In [202]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[202]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

It also works for a byte string:

In [204]: s = io.BytesIO(b"1,1.3,abcde")
In [205]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[205]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

genfromtxt works with anything that feeds it lines, so I usually use a list of bytestrings directly (when testing questions):

In [206]: s = [b"1,1.3,abcde"]
In [207]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[207]: 
array((1, 1.3, b'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])

Or with several lines:

In [208]: s = b"""1,1.3,abcde
     ...: 4,1.3,two""".splitlines()
In [209]: s
Out[209]: [b'1,1.3,abcde', b'4,1.3,two']
In [210]: np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
     ...:     ('mystring','S5')], delimiter=",")
Out[210]: 
array([(1, 1.3, b'abcde'), (4, 1.3, b'two')],
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', 'S5')])
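
Since genfromtxt only needs something that yields lines, a generator should work as well. The following is a small sketch under that assumption; `byte_lines` is a made-up helper, and it yields byte strings so that it also suits older numpy versions.

```python
import numpy as np


def byte_lines():
    """Hypothetical line source: yields one byte-string row at a time."""
    yield b"1,1.3,abcde"
    yield b"4,1.3,two"


data = np.genfromtxt(
    byte_lines(),
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)

# Two input rows give a 1-d structured array with two records.
print(data["myint"])
```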

It used to be that with dtype=None, genfromtxt created S strings.

See the related question "NumPy dtype issues in genfromtxt(), reads string in as bytestring".

With 1.14, we can control the default string dtype:

In [219]: s = io.StringIO("1,1.3,abcde")
In [220]: np.genfromtxt(s, dtype=None, delimiter=",")
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
  #!/usr/bin/python3
Out[220]: 
array((1, 1.3, b'abcde'),
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', 'S5')])
In [221]: s = io.StringIO("1,1.3,abcde")
In [222]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[222]: 
array((1, 1.3, 'abcde'),
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])

https://docs.scipy.org/doc/numpy/release.html#encoding-argument-for-text-io-functions

Now I can generate examples with Py3 strings without producing all those ugly b'string' results (though I have to remember that not everyone has upgraded to 1.14):

In [223]: s = """1,1.3,abcde
     ...: 4,1.3,two""".splitlines()
In [224]: np.genfromtxt(s, dtype=None, delimiter=",",encoding=None)
Out[224]: 
array([(1, 1.3, 'abcde'), (4, 1.3, 'two')],
      dtype=[('f0', '<i4'), ('f1', '<f8'), ('f2', '<U5')])
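
For anyone who cannot upgrade, one possible workaround is to read the field as a byte string and decode it afterwards. This is only a sketch: the names `rows` and `decoded` are illustrative, and it assumes the data is plain ASCII.

```python
import numpy as np

# Byte-string rows, as accepted by older numpy versions.
rows = [b"1,1.3,abcde", b"4,1.3,two"]

data = np.genfromtxt(
    rows,
    dtype=[("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")],
    delimiter=",",
)

# Decode the 'S5' byte-string field into a unicode ('U') array.
decoded = np.char.decode(data["mystring"], "ascii")
print(decoded.tolist())
```

The structured array itself is left unchanged; `decoded` is a separate array holding ordinary Python 3 strings.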
hpaulj