2

While I think I could do this naively and poorly, I'm interested to learn a more elegant and efficient approach.

Given the following dataframe:

In [42]: df = pd.DataFrame({'flavor':['goober','tronic','goober','tronic'], 'points':[42,55,31,101]}, index=['foo','foo','bar','bar'])

In [43]: df
Out[43]: 
     flavor  points
foo  goober      42
foo  tronic      55
bar  goober      31
bar  tronic     101

I would like to groupby the index, and convert values from flavor column into column headers themselves, completely throwing away the flavor and points. So the final result would look like:

In [44]: pd.DataFrame({'goober':[42,31], 'tronic':[55,101]}, index=['foo','bar'])
Out[44]: 
     goober  tronic
foo      42      55
bar      31     101

Thanks for any suggestions.

ghukill
  • 1,136
  • 17
  • 42

2 Answers2

2

Use DataFrame.pivot with convert index to column first and then remove index and columns names by DataFrame.rename_axis:

df = df.reset_index().pivot('index', 'flavor','points').rename_axis(index=None,columns=None)
print (df)
     goober  tronic
bar      31     101
foo      42      55

Or use DataFrame.set_index with Series.unstack:

df = (df.set_index('flavor', append=True)['points']
        .unstack()
        .rename_axis(index=None, columns=None))
print (df)
     goober  tronic
bar      31     101
foo      42      55
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This is obviously a toy example, at scale, would these differ in performance and/or memory usage? Appreciate the answer. – ghukill Oct 21 '20 at 08:44
  • 1
    @ghukill - hmmm, generally pivoting are really complicated operation, so not easy says what solution is better. The best test with data. – jezrael Oct 21 '20 at 08:45
  • Any advice on when `.unstack()` is particularly helpful? maybe an SO question/answer? No worries if not, but I'd like to build some intuition around `pivot` and `unstack`, and where they shine. – ghukill Oct 21 '20 at 08:58
  • 1
    @ghukill - Sure, it is [here](https://stackoverflow.com/q/47152691/) – jezrael Oct 21 '20 at 09:00
  • 1
    Thanks @jezrael! All much appreciated. – ghukill Oct 21 '20 at 12:04
1

You can use pandas.pivot_table:

print( pd.pivot_table(df, index=df.index, columns='flavor', values='points') )

Prints:

flavor  goober  tronic
bar         31     101
foo         42      55

Note: If you don't need aggfunc, use .pivot() instead. As @jezrael pointed out in the comments.

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91