Pandas: transform column's values in independent columns

Question

I have Pandas DataFrame which looks like following (df_olymic). I would like the values of column Type to be transformed in independent columns (df_olympic_table)

Original dataframe

In [3]: df_olympic
Out[3]: 
   Country    Type Num
0      USA    Gold  46
1      USA  Silver  37
2      USA  Bronze  38
3       GB    Gold  27
4       GB  Silver  23
5       GB  Bronze  17
6    China    Gold  26
7    China  Silver  18
8    China  Bronze  26
9   Russia    Gold  19
10  Russia  Silver  18
11  Russia  Bronze  19

Transformed dataframe

In [5]: df_olympic_table
Out[5]: 
  Country N_Gold N_Silver N_Bronze
0     USA     46       37       38
1      GB     27       23       17
2   China     26       18       26
3  Russia     19       18       19

What would be the most convenient way to achieve this?

Possible duplicate of [Python Pandas: Convert Rows as Column headers](http://stackoverflow.com/questions/17298313/python-pandas-convert-rows-as-column-headers) — Aprillion, Jan 08 '17 at 10:34
Sorry, I might abuse pandas common terminology. Still keep learning :-) — TruLa, Jan 08 '17 at 10:47
@Aprillion - `pivot_table` is not necessary, if not duplicates better is `pivot`. — jezrael, Jan 08 '17 at 10:47

score 12 · Accepted Answer · edited Sep 27 '17 at 16:54

You can use DataFrame.pivot:

df = df.pivot(index='Country', columns='Type', values='Num')
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       19    19      18
USA          38    46      37

Another solution with DataFrame.set_index and Series.unstack:

df = df.set_index(['Country','Type'])['Num'].unstack()
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       19    19      18
USA          38    46      37

but if get:

ValueError: Index contains duplicate entries, cannot reshape

need pivot_table with some aggreagte function, by default it is np.mean, but you can use sum, first...

#add new row with duplicates value in 'Country' and 'Type'
print (df)
   Country    Type  Num
0      USA    Gold   46
1      USA  Silver   37
2      USA  Bronze   38
3       GB    Gold   27
4       GB  Silver   23
5       GB  Bronze   17
6    China    Gold   26
7    China  Silver   18
8    China  Bronze   26
9   Russia    Gold   19
10  Russia  Silver   18
11  Russia  Bronze   20 < - changed value to 20
11  Russia  Bronze  100 < - add new row with duplicates


df = df.pivot_table(index='Country', columns='Type', values='Num', aggfunc=np.mean)
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       60    19      18 < - Russia get ((100 + 20)/ 2 = 60
USA          38    46      37

Or groupby with aggreagting mean and reshape by unstack:

df = df.groupby(['Country','Type'])['Num'].mean().unstack()
print (df)
Type     Bronze  Gold  Silver
Country                      
China        26    26      18
GB           17    27      23
Russia       60    19      18 < - Russia get ((100 + 20)/ 2 = 60
USA          38    46      37

Perfect! You really made me learn something new. Thanks a lot. — TruLa, Jan 08 '17 at 10:44
Yes, it is more complicated if duplicates. Thanks for accepting! — jezrael, Jan 08 '17 at 10:44
@ade1e - yes, I changed data for add duplicates, so I add notice to changed rows. — jezrael, Jan 08 '17 at 10:51

Pandas: transform column's values in independent columns

1 Answers1