With your sample as a list of lines:
In [1]: txt=b"""
...: M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
...: F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
...: """
In [2]: txt=txt.splitlines()
genfromtxt can load it with dtype=None:
In [16]: data = np.genfromtxt(txt, delimiter=',', dtype=None)
In [17]: data
Out[17]:
array([(b'M', 0.475, 0.37, 0.125, 0.5095, 0.2165, 0.1125, 0.165, 9),
(b'F', 0.55, 0.44, 0.15, 0.8945, 0.3145, 0.151, 0.32, 19)],
dtype=[('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', '<f8'), ('f8', '<i4')])
In [18]: data['f0']
Out[18]:
array([b'M', b'F'],
dtype='|S1')
In [19]: data['f3']
Out[19]: array([ 0.125, 0.15 ])
The result is a 1d array (here with 2 elements) with many fields, which are accessed by name. Here the first field is deduced to be a string and the rest floats, except the last, which is an integer.
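If the default f0, f1, ... names are awkward, genfromtxt also takes a names argument. A minimal sketch, assuming a NumPy recent enough to accept encoding (1.14 or later); the column names are placeholders I made up, not anything taken from your data:

```python
import numpy as np

lines = [
    "M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9",
    "F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19",
]

# Placeholder field names (hypothetical): one label, seven measurements, one count.
names = ["label"] + ["m%d" % i for i in range(1, 8)] + ["count"]

# encoding="utf-8" makes the text column a str field rather than bytes.
data = np.genfromtxt(lines, delimiter=",", dtype=None,
                     names=names, encoding="utf-8")

print(data.dtype.names)   # the nine field names given above
print(data["label"])      # one column, selected by field name
print(data[0])            # one record (a row), selected by position
print(data[0]["count"])   # a single value: record first, then field
```

Indexing with a field name gives a column across all records, while indexing with an integer gives one record; the two can be chained in either order.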
I could be more specific about the dtype, and define a field that spans multiple columns:
In [21]: data=np.genfromtxt(txt,delimiter=',',dtype=['S3','8float'])
In [22]: data
Out[22]:
array([(b'M', [0.475, 0.37, 0.125, 0.5095, 0.2165, 0.1125, 0.165, 9.0]),
(b'F', [0.55, 0.44, 0.15, 0.8945, 0.3145, 0.151, 0.32, 19.0])],
dtype=[('f0', 'S3'), ('f1', '<f8', (8,))])
In [23]: data['f1']
Out[23]:
array([[ 0.475 , 0.37 , 0.125 , 0.5095, 0.2165, 0.1125,
0.165 , 9. ],
[ 0.55 , 0.44 , 0.15 , 0.8945, 0.3145, 0.151 ,
0.32 , 19. ]])
The f1 field is a 2d array of shape (2,8).
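One side effect of '8float' is that the last column is stored as 9.0 and 19.0 rather than as integers. A variation, sketched here on the assumption that genfromtxt treats a three-item format list the same way as the two-item one above, keeps that column in its own integer field:

```python
import numpy as np

lines = [
    "M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9",
    "F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19",
]

# Three fields: a short string, seven float columns grouped together,
# and the final column on its own as an integer.
data = np.genfromtxt(lines, delimiter=",",
                     dtype=["U3", "7float", "int"], encoding="utf-8")

print(data.dtype)         # fields f0, f1 (seven floats per record), f2
print(data["f1"].shape)   # (2, 7)
print(data["f2"])         # the last column, kept as integers
```

Grouping only the columns that really share a type keeps data['f1'] usable as an ordinary 2d float array without an extra slicing step.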
np.loadtxt will also work, but its dtype interpretation isn't as flexible. Copying the dtype from the genfromtxt example produces the same thing:
datal=np.loadtxt(txt,delimiter=',',dtype=data.dtype)
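loadtxt has no dtype=None mode that guesses the column types, so without a dtype copied from elsewhere you have to write it out yourself. A sketch of that, with field names chosen to match the genfromtxt defaults (my choice, any names would do):

```python
import numpy as np

lines = [
    "M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9",
    "F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19",
]

# One (name, type) pair per column: a 1-character string,
# seven floats, and a trailing integer.
fields = [("f0", "U1")] + [("f%d" % i, float) for i in range(1, 8)] + [("f8", int)]

datal = np.loadtxt(lines, delimiter=",", dtype=fields)

print(datal["f0"])   # string column
print(datal["f8"])   # integer column
```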
pandas also has a good csv reader, with more speed and flexibility. It's a good choice if you are already working with pandas.
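For comparison, a minimal pandas sketch on the same two lines. header=None is there because the sample has no header row, and the io.StringIO wrapper just stands in for a real file:

```python
import io

import pandas as pd

text = """M,0.475,0.37,0.125,0.5095,0.2165,0.1125,0.165,9
F,0.55,0.44,0.15,0.8945,0.3145,0.151,0.32,19
"""

# Without a header row the columns are labelled 0..8.
df = pd.read_csv(io.StringIO(text), header=None)

print(df.dtypes)     # per-column types inferred by pandas

# The numeric columns as a plain 2d NumPy array.
values = df.iloc[:, 1:].to_numpy()
print(values.shape)  # (2, 8)
```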