30

I'm having difficulty using transpose with pandas.

I have the following df:

date         name    quantity
1/1/2018     A       5
1/1/2018     B       6
1/1/2018     C       7
1/2/2018     A       9
1/2/2018     B       8
1/2/2018     C       6

I eventually want to create a pairwise correlation for all the names and their quantities on each date. To to that end, I'm trying to create the following output from this df first:

 date       A    B    C
 1/1/2018   5    6    7
 1/2/2018   9    8    6

The transpose is difficult to me since I can get duplicate column headers, but I also don't want to lose any data by dropping them first. I have a feeling the answer may be with a panda utility that I don't really use and I may be tunneling on transpose...

JesusMonroe
  • 1,421
  • 3
  • 13
  • 20

3 Answers3

45

Since you aren't performing an aggregation, pd.DataFrame.pivot should be preferred to groupby / pivot_table:

res = df.pivot(index='date', columns='name', values='quantity')

print(res)

name      A  B  C
date             
1/1/2018  5  6  7
1/2/2018  9  8  6

If you wish you can use reset_index to elevate date to a column.

jpp
  • 159,742
  • 34
  • 281
  • 339
  • dude you rock! Thank you for the reset_index tip. It would be super helpful to show a example for those in the future that come by this example. – boardkeystown Sep 25 '22 at 07:34
3

By no means is my proposed solution better than jpp's. I just happened to run into the same problem and solved it differently.

df.set_index(['date', 'name']).unstack()

The result looks a little messier too but it worked in my case:

enter image description here

Bowen Liu
  • 1,065
  • 1
  • 11
  • 24
0

Here's a groupby solution, although it's highly impractical when compared to the pivot method. I would only recommend this as an exercise to become familiar with pandas' indices.

# Get values of 'quantity' for each date
x = df.groupby('date')['quantity'].agg(list)
# Insert these values into a new data frame
df2 = pd.DataFrame(index=x.index, data=x.to_list(), columns=df['name'].unique())

This returns:

            A   B   C
date            
1/1/2018    5   6   7
1/2/2018    8   9   6
Arturo Sbr
  • 5,567
  • 4
  • 38
  • 76