1

I have a multiple column dataframe and want to separate data in a particular column by grouping them based on another column.

Here is an example:

ID     Name   Score
1      John   100
2      Lisa   80
3      David  75 
4      Lisa   92
5      John   89
6      Lisa   72

I would like my output to be like:

index  John  Lisa  David
0      100   80    75
1      89    92    NaN
2      NaN   72    NaN

I understand I can easily use the code:

df[df['Name'] == 'John]] and separate all the scores that John has in the dataframe, but since my dataframe is very large I would like to have a clean output like what I showed here.

Habib
  • 11
  • 2

1 Answers1

1

You can use pivot_table. You can generate the index needed to pivot grouping the dataframe by Name and taking the cumcount:

ix = df.groupby('Name').cumcount()
pd.pivot_table(df,'Score', ix, 'Name')

Name  David   John  Lisa
1      75.0  100.0  80.0
2       NaN   89.0  92.0
3       NaN    NaN  72.0
yatu
  • 86,083
  • 12
  • 84
  • 139
  • Surprisingly for smaller DataFrames (a few thousand rows) `df.groupby('Name').Score.apply(list).apply(pd.Series).T` is faster. – ALollz Jan 08 '19 at 19:27
  • Wouldn't have said that piece of code works faster. Interesting to know :-) – yatu Jan 08 '19 at 19:29