0

I have a dataframe "iris" which have three different species, how to create a dataframe with all different specieas as column names and sepal length as the values?

Setup:

import numpy as np
import pandas as pd

df = sns.load_dataset('iris')

print(df.head())
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

I can do manually

This is a long and difficult way, I am looking for some groupby method that does this in a single operation possibly using chain.

seto = df.loc[df['species']=='setosa','sepal_length']
ver = df.loc[df['species']=='versicolor','sepal_length']
vir = df.loc[df['species']=='virginica','sepal_length']

ans = pd.DataFrame({'setosa': seto.values,'versicolor':ver.values,'virginica':vir.values})

print(ans.head())

Required output

df.groupby('species')['spepal_length'].SOMETHING.SOMETHING
   setosa  versicolor  virginica
0     5.1         7.0        6.3
1     4.9         6.4        5.8
2     4.7         6.9        7.1
3     4.6         5.5        6.3
4     5.0         6.5        6.5
BhishanPoudel
  • 15,974
  • 21
  • 108
  • 169

1 Answers1

1

This is pivot_table:

df.pivot_table(index=df.groupby('species').cumcount(),
               columns='species',
               values='sepal_length',
               aggfunc='first'
              )

Output (head):

species  setosa  versicolor  virginica
0           5.1         7.0        6.3
1           4.9         6.4        5.8
2           4.7         6.9        7.1
3           4.6         5.5        6.3
4           5.0         6.5        6.5
5           5.4         5.7        7.6
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74