
I am a beginner and have been teaching myself over the last 12 months to use Terminal (on Mac OS X 10.10.5), Unix basics, R, Python, and Python-associated modules and applications. I am using Python 3.4.3 |Anaconda 2.1.0 (x86_64).

I am working through numpy-user-1.9.1.pdf (https://docs.scipy.org/doc/numpy/numpy-user-1.9.1.pdf). This appears to be the key resource for teaching oneself more about NumPy.

In the NumPy User Guide, section 2.3.1, "Importing data with genfromtxt", the initial setup is:

import numpy as np

from StringIO import StringIO

My attempts to reproduce the examples that follow in this section failed because 'StringIO' is not recognised.

Web searching has confirmed my suspicion that, since the user guide was written, StringIO has been dropped in later versions of Python (here v3.4) and replaced by something else: io? io.string?

Question: What line should I use in its place in Python 3.4 so that I can proceed with this user guide and its examples?

My attempts to use this other function in place of StringIO in various ways have not worked, so I remain stuck on this section of the self-training exercise. It does not help that I do not fully understand what the line "from StringIO import StringIO" is doing, and hence why it is required. (A very basic understanding I suspect I should have, so red face here, likely.)

As an alternative, there is a NumPy Tutorial listed on the site http://www.numpy.org/, but clicking that link gives me a Forbidden page stating: "You don't have permission to access NumPy_Tutorial on this server."

I have looked for other resources as a workaround, and I am open to an alternative, more up-to-date document if one is known. After 24 hours, though, I have decided to post this as a question.

I am seeking to learn NumPy so that I can import various data files and review data from a project in various plots using matplotlib. Most files are .csv files. I am aware of a more specific Python module just for .csv files, but I feel a need to be more informed and flexible for different data files in future. Understanding how to use NumPy, and being able to tailor the specifics for each imported data file, seems the right way to achieve generalised competency.

skytux
Cam_Aust

3 Answers


You're right that the StringIO module is no longer available in Python 3. It has been replaced by the io module, which provides io.StringIO for text and io.BytesIO for bytes.

Instead of this in Python 2:

import numpy as np
from StringIO import StringIO
data = "1, 2, 3\n4, 5, 6"
np.genfromtxt(StringIO(data), delimiter=",")

Use io.BytesIO in Python 3, and make sure your data is converted to the bytes type. We use encode() to convert from a str to a bytes object:

import numpy as np
import io
data = "1, 2, 3\n4, 5, 6"
np.genfromtxt(io.BytesIO(data.encode()), delimiter=",")

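The reason we have to do it this way is that Python 3 overhauled how str works. In Python 2, str was itself a byte string (bytes was only an alias for str), and Unicode text had a separate unicode type. In Python 3, str holds Unicode text and bytes is a distinct type, so the conversion has to be explicit. See this post for more information.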
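To make the str/bytes distinction concrete, here is a minimal sketch for Python 3 using only built-ins. The variable names are just for illustration, and UTF-8 is an assumed encoding (it is what encode() uses by default in Python 3).

```python
# Python 3: str holds Unicode text, bytes holds raw 8-bit values.
text = "1, 2, 3\n4, 5, 6"          # a str
raw = text.encode("utf-8")         # str -> bytes
round_trip = raw.decode("utf-8")   # bytes -> str

print(type(text).__name__, type(raw).__name__)
print(raw == b"1, 2, 3\n4, 5, 6")
print(round_trip == text)

# The two types are never mixed implicitly; trying to do so raises TypeError.
try:
    text + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```

This is why the Python 3 version above calls data.encode() before handing the data to io.BytesIO.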

Anarosa PM
  • Not sure how I acknowledge that the responses have been most helpful and appreciated. Two posts, hpaulj and Anarosa PM solve my issue. As a relative beginner Anarosa PM's response presenting python 2 and python 3 equivalent code was particularly clear. The links useful, all followed and read. All valued. – Cam_Aust Sep 30 '15 at 03:22

The StringIO class moved from the "StringIO" module (confusingly, they are named the same) to the "io" module. So, just replace "from StringIO import StringIO" with

from io import StringIO
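As for what that import line is for: StringIO gives you an in-memory object that behaves like an open file, so the guide's examples can feed text to a file-reading function without creating a file on disk. A small sketch using only the standard library (variable names are illustrative):

```python
from io import BytesIO, StringIO

# StringIO wraps a str and behaves like a file opened in text mode.
text_file = StringIO("1, 2, 3\n4, 5, 6")
first_line = text_file.readline()   # reads up to and including the newline
rest = text_file.read()             # reads whatever is left

# BytesIO does the same for bytes, like a file opened in binary mode.
bytes_file = BytesIO(b"1, 2, 3\n4, 5, 6")
byte_lines = bytes_file.readlines()

print(repr(first_line), repr(rest))
print(byte_lines)
```

Whether genfromtxt accepts the text-mode object or needs the bytes-mode one depends on how it reads its input; the other answers here describe it as expecting bytes on Python 3, in which case BytesIO is the one to use.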
DanHickstein

You don't need to use the StringIO route; genfromtxt works just as well from a list of strings. The first example in that pdf:

(IPython and Python3.4)

In [92]: data=b"""1,2,3
4,5,6"""
In [93]: data=data.splitlines()
In [94]: data
Out[94]: [b'1,2,3', b'4,5,6']
In [95]: np.genfromtxt(data,delimiter=',')
Out[95]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In Python 3 you need to use byte strings, because genfromtxt assumes it is reading a file (or iterable) in bytes mode, as opposed to Unicode (text) mode.

(import io)
In [108]: data=b"1,2,3\n4,5,6"
In [109]: np.genfromtxt(io.BytesIO(data),delimiter=',')
Out[109]: 
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
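Since most of the files in question are .csv files, here is a hedged sketch of the same BytesIO pattern applied to data with a header row. The sample data and the column names a, b, c are made up for illustration; skip_header is a genfromtxt parameter that skips the given number of lines at the start.

```python
import io

import numpy as np

# Made-up CSV content: one header line followed by two data rows.
csv_bytes = b"a,b,c\n1,2,3\n4,5,6"

# skip_header=1 drops the header line, so only the numeric rows are parsed.
arr = np.genfromtxt(io.BytesIO(csv_bytes), delimiter=",", skip_header=1)

print(arr.shape)
print(arr)
```

For a real file on disk you would normally pass the file name to genfromtxt directly instead of wrapping bytes in BytesIO.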

There are lots of SO questions about genfromtxt and loading csv files in numpy. You may learn as much from those as from any tutorial.

hpaulj