1

I have a dataframe as below:

In [14]: grouped_data
Out[14]: 
  monthyear Facility       Date       Yield 
0    Dec 15     CCM1 2015-12-01  2550.000000
1    Feb 16     CCM1 2016-02-01  4250.000000
2    Jan 16     CCM2 2016-01-01  1540.000000
3    Jan 16     CCM3 2016-01-01  6800.000000
4    Nov 15     CCM1 2015-11-01   921.458157
5    Nov 15     CCM2 2015-11-01  1750.310038
6    Sep 15     CCM3 2015-09-01  5191.197065

Now I need the dataframe to look like this:

  monthyear    CCM1      CCM2       CCM3         Date       
0  Dec 15  2550.000000    0          0        2015-12-01  
1  Feb 16  4250.000000    0          0        2016-02-01  
2  Jan 16     0       1540.000000 6800.000000 2016-01-01  
3  Nov 15  921.458157 1750.310038    0        2015-11-01  
4  Sep 15     0       5191.197065    0        2015-09-01  

How will i do this with Pandas. Please help. Thanks in advance.

Gil Baggio
  • 13,019
  • 3
  • 48
  • 37

1 Answers1

1

Use pivot_table:

print (df.pivot_table(index=['monthyear','Date'], 
                      columns='Facility', 
                      values='Yield',
                      fill_value=0))

Facility                     CCM1         CCM2         CCM3
monthyear Date                                             
Dec 15    2015-12-01  2550.000000     0.000000     0.000000
Feb 16    2016-02-01  4250.000000     0.000000     0.000000
Jan 16    2016-01-01     0.000000  1540.000000  6800.000000
Nov 15    2015-11-01   921.458157  1750.310038     0.000000
Sep 15    2015-09-01     0.000000     0.000000  5191.197065

If you want reset_index and remove columns name use rename_axis (new in pandas 0.18.0):

print (df.pivot_table(index=['monthyear','Date'], 
                      columns='Facility', 
                      values='Yield',
                      fill_value=0).reset_index().rename_axis(None, axis=1))

  monthyear        Date         CCM1         CCM2         CCM3
0    Dec 15  2015-12-01  2550.000000     0.000000     0.000000
1    Feb 16  2016-02-01  4250.000000     0.000000     0.000000
2    Jan 16  2016-01-01     0.000000  1540.000000  6800.000000
3    Nov 15  2015-11-01   921.458157  1750.310038     0.000000
4    Sep 15  2015-09-01     0.000000     0.000000  5191.197065

But pivot_table uses aggfunc, default is aggfunc=np.mean if duplicates in columns montyear with Date. Better explanation with sample is here and in docs.

Community
  • 1
  • 1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252