I have the following data in my CSV:

"sepal_length,sepal_width,petal_length,petal_width,species"
"5.1,3.5,1.4,0.2,setosa"
"4.9,3,1.4,0.2,setosa"
"4.7,3.2,1.3,0.2,setosa"

I load the file into a NumPy array:

import numpy as np
loaded_csv = np.genfromtxt('iris.csv', delimiter=',')

The output:

[[nan nan nan nan nan]
[nan 3.5 1.4 0.2 nan]
[nan 3.  1.4 0.2 nan]
[nan 3.2 1.3 0.2 nan]
[nan 3.1 1.5 0.2 nan]]

How can I keep the strings as text and have the first element of each row parsed as a float?

Costa.Gustavo
  • Is [this](https://stackoverflow.com/questions/17933282/using-numpy-genfromtxt-to-read-a-csv-file-with-strings-containing-commas/37060475) of any help to you? – Sheldore Apr 29 '19 at 16:28
  • In your np.genfromtxt parameters, add header=0 – Parkofadown Apr 29 '19 at 17:26
  • The default `dtype` is `float`. The `nan` replace strings that could not be converted to numbers. – hpaulj Apr 29 '19 at 17:36
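
The `nan` behaviour described in the last comment can be checked with a small sketch. It assumes unquoted rows passed as a list of strings (not the original file), and the variable names `as_float` and `inferred` are only illustrative:

```python
import numpy as np

# Illustrative rows without surrounding quotes (an assumption; see the question's file).
lines = [
    "sepal_length,sepal_width,petal_length,petal_width,species",
    "5.1,3.5,1.4,0.2,setosa",
    "4.9,3,1.4,0.2,setosa",
]

# Default dtype=float: anything that cannot be parsed as a number becomes nan,
# so the whole header row and the species column turn into nan.
as_float = np.genfromtxt(lines, delimiter=",")

# dtype=None asks genfromtxt to infer a type per column;
# skip_header=1 drops the header row instead of parsing it.
inferred = np.genfromtxt(lines, delimiter=",", dtype=None,
                         skip_header=1, encoding=None)

print(as_float)
print(inferred.dtype)
```

Without `names`, the inferred fields get default names such as `f0` and `f4`.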

1 Answer

With `dtype=None` (infer a type per column) and `names=True` (take the field names from the header row), you can get a structured array:

In [148]: alist=["sepal_length,sepal_width,petal_length,petal_width,species", 
     ...: "5.1,3.5,1.4,0.2,setosa", 
     ...: "4.9,3,1.4,0.2,setosa", 
     ...: "4.7,3.2,1.3,0.2,setosa"]                                                  

In [150]: data = np.genfromtxt(alist, delimiter=',', dtype=None, names=True, encoding=None)                                                                     
In [151]: data                                                                       
Out[151]: 
array([(5.1, 3.5, 1.4, 0.2, 'setosa'), (4.9, 3. , 1.4, 0.2, 'setosa'),
       (4.7, 3.2, 1.3, 0.2, 'setosa')],
      dtype=[('sepal_length', '<f8'), ('sepal_width', '<f8'), ('petal_length', '<f8'), ('petal_width', '<f8'), ('species', '<U6')])
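
The session above parses a list of unquoted lines. If each line in `iris.csv` really is wrapped in double quotes, as displayed in the question, the quotes would likely need to be stripped first. A minimal sketch under that assumption, with `io.StringIO` standing in for the opened file and `cleaned` and `data_from_file` as illustrative names:

```python
import io
import numpy as np

# Stand-in for open('iris.csv'); assumes every line is wrapped in double quotes.
raw = io.StringIO(
    '"sepal_length,sepal_width,petal_length,petal_width,species"\n'
    '"5.1,3.5,1.4,0.2,setosa"\n'
    '"4.9,3,1.4,0.2,setosa"\n'
    '"4.7,3.2,1.3,0.2,setosa"\n'
)

# Remove the trailing newline and the surrounding quotes from each line.
cleaned = [line.strip().strip('"') for line in raw]

# genfromtxt accepts a list of strings in place of a filename.
data_from_file = np.genfromtxt(cleaned, delimiter=",", dtype=None,
                               names=True, encoding=None)

print(data_from_file["sepal_length"])
```

With the quotes gone, the first column parses as a float instead of `nan`.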

The structured array `data` is 1-D with named fields, so each column can be selected by name:

In [152]: data['sepal_length']                                                       
Out[152]: array([5.1, 4.9, 4.7])
In [153]: data['species']                                                            
Out[153]: array(['setosa', 'setosa', 'setosa'], dtype='<U6')
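
If a plain 2-D float array of the numeric columns is needed afterwards, one option is `structured_to_unstructured` from `numpy.lib.recfunctions` (available in NumPy 1.16 and later). This is a sketch, not part of the session above; `numeric_names`, `features` and `labels` are illustrative names:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

lines = [
    "sepal_length,sepal_width,petal_length,petal_width,species",
    "5.1,3.5,1.4,0.2,setosa",
    "4.9,3,1.4,0.2,setosa",
    "4.7,3.2,1.3,0.2,setosa",
]
data = np.genfromtxt(lines, delimiter=",", dtype=None, names=True, encoding=None)

# Select every field except the string column, then flatten to a 2-D float array.
numeric_names = [name for name in data.dtype.names if name != "species"]
features = rfn.structured_to_unstructured(data[numeric_names])

# The string column stays a separate 1-D array.
labels = data["species"]

print(features.shape, features.dtype)
```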
hpaulj