
I am working with numpy and pandas in Python to learn how to work with DataFrames.

I'm coding in Colaboratory and have loaded the Iris dataset, but for some reason there is no "Species" column in my DataFrame. Maybe I've loaded it incorrectly? I'd appreciate help on the matter.

Here is the code I have:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

df = pd.DataFrame(load_iris().data, columns=load_iris().feature_names)
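The DataFrame above has no species column because load_iris().data holds only the four measurement columns; the class labels are stored separately in load_iris().target. As a minimal sketch, assuming scikit-learn 0.23 or later (where load_iris accepts as_frame=True), the loader can return a DataFrame that already includes the labels as an integer target column:

```python
from sklearn.datasets import load_iris

# as_frame=True (scikit-learn >= 0.23) returns pandas objects instead of NumPy arrays
iris = load_iris(as_frame=True)

# .frame combines the four feature columns with the integer "target" column
df = iris.frame
print(df.columns.tolist())
print(df.shape)
```

The target column still holds the codes 0, 1 and 2 rather than species names.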


marc_s
Chefi
  • See more information about the dataset here: https://scikit-learn.org/stable/datasets/toy_dataset.html#iris-dataset. It seems that the species is the y (target) column of the dataset, which you can access with load_iris().target – Brian Barbieri Nov 03 '21 at 08:45
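Following that comment, a minimal sketch that keeps the original DataFrame and adds the labels from load_iris().target. Using pd.Categorical.from_codes is a swapped-in alternative, not what the answer uses; it turns the integer codes directly into a categorical column whose categories are load_iris().target_names:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()  # load once instead of calling load_iris() repeatedly

df = pd.DataFrame(iris.data, columns=iris.feature_names)
# target holds integer class codes 0, 1, 2; target_names holds the matching names
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df["species"].value_counts())
```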

1 Answer


Try:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# np.c_ stacks features and target column-wise; astype restores integer labels;
# assign adds 'species' by mapping each code to its name
df = (pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                   columns=iris['feature_names'] + ['target'])
        .astype({'target': int})
        .assign(species=lambda x: x['target'].map(dict(enumerate(iris['target_names'])))))

Output:

>>> df
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target    species
0                  5.1               3.5                1.4               0.2       0     setosa
1                  4.9               3.0                1.4               0.2       0     setosa
2                  4.7               3.2                1.3               0.2       0     setosa
3                  4.6               3.1                1.5               0.2       0     setosa
4                  5.0               3.6                1.4               0.2       0     setosa
..                 ...               ...                ...               ...     ...        ...
145                6.7               3.0                5.2               2.3       2  virginica
146                6.3               2.5                5.0               1.9       2  virginica
147                6.5               3.0                5.2               2.0       2  virginica
148                6.2               3.4                5.4               2.3       2  virginica
149                5.9               3.0                5.1               1.8       2  virginica

[150 rows x 6 columns]
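On the parameters: np.c_ concatenates the (150, 4) feature matrix and the (150,) target vector column-wise into one (150, 5) array. A NumPy array has a single dtype, so the integer labels are upcast to floats in the process, which is why the answer converts target back with .astype({'target': int}). A small check of that behaviour:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# np.c_ stacks arrays as columns: (150, 4) features + (150,) target -> (150, 5)
stacked = np.c_[iris["data"], iris["target"]]
print(stacked.shape)   # (150, 5)
# One dtype for the whole array, so the integer target becomes float64
print(stacked.dtype)   # float64
print(stacked[0, -1])  # 0.0
```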

How is the species column created from target and target_names?

>>> iris['target_names']
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
# index 0: setosa
# index 1: versicolor
# index 2: virginica

>>> iris['target']
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

You just need a dict that maps 0 to 'setosa', 1 to 'versicolor' and 2 to 'virginica'. Use enumerate to pair each index with its name, giving the tuples (0, 'setosa'), (1, 'versicolor') and (2, 'virginica'), then dict to turn those pairs into a dictionary:

>>> dict(enumerate(iris['target_names']))
{0: 'setosa', 1: 'versicolor', 2: 'virginica'}

Now Series.map replaces each target value with the corresponding species name:

>>> df['target'].map(dict(enumerate(iris['target_names'])))
0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
         ...    
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: target, Length: 150, dtype: object
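An equivalent alternative, not part of the original answer, skips the dictionary: NumPy integer-array indexing on target_names returns the name for each code directly. A sketch that also checks it matches the Series.map result:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris["data"], columns=iris["feature_names"])

# Indexing the names array with the integer codes picks one name per row:
# target_names[[0, 0, ..., 2]] -> ['setosa', 'setosa', ..., 'virginica']
df["species"] = iris["target_names"][iris["target"]]

# Same values as the dict-based Series.map approach
mapping = dict(enumerate(iris["target_names"]))
assert (df["species"] == pd.Series(iris["target"]).map(mapping)).all()
print(df["species"].head())
```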
Corralien
  • Awesome! Could you maybe explain the parameters you added, please? If it's too much of a bother then I'll try to find out myself. Thank you so much! – Chefi Nov 03 '21 at 11:24
  • @Chefi. I updated my answer. Is it clearer now? – Corralien Nov 03 '21 at 15:17
  • Absolutely! Thank you again and sorry for the late response. (I don't visit Stack Overflow very often.) – Chefi Nov 09 '21 at 18:25