0

I have a dataframe that looks like this:

df = pd.DataFrame( [
  ['BILING',2017,7,1406 ],
  ['BILWPL',2017,7,199],
  ['BKCLUB',2017,7,9417],
  ['LEAVEN',2017,7,4773 ],
  ['MAILORDER',2017,7,10487]
], columns=['Branch','Year','Month','count']

df
Out[1]:
            Branch  Year      Month  count
    0       BILING  2017      7   1406 
    1       BILWPL  2017      7   199
    2       BKCLUB  2017      7   9417
    10      LEAVEN  2017      7   4773 
    18   MAILORDER  2017      7  10487

It contains the same month but different years so that one can compare the time of year across time.

The desired output would look something like:

    Branch     Month  2017   2019  Mean(ave) percent_diff  
    BILING     7      1406   1501  1480      5%
    BILWPL     7      199    87    102       -40%
    BKCLUB     7      9417   8002  7503      -3%
    LEAVEN     7      4773   5009  4509      -15%
    MAILORDER  7      10487  11032 9004      8%

My question is how to aggregate based on branch to display across and add 2 columns: mean and percent difference between mean and newest year.

**** UPDATE **** This is close but is missing some columns [ Thanks G. Anderson ]:

df.pivot_table(
    values='count', index='Branch', columns='Year',
    fill_value=0, aggfunc='mean')

Produces:

Year    2017    2018    2019
Branch          
BILING  1406    1280    4
BILWPL  199     117     239
BKCLUB  94      161     238

This is very close but I'm hoping to tack on columns corresponding to the mean, and percent difference.

* UPDATE 2 *

circ_pivot = df.pivot_table(
    values='count', index='Branch', columns='Year',
    fill_value=0)

circ_pivot['Mean'] = circ_pivot[[2017,2018,2019]].mean(axis=1)
circ_pivot['Change'] = ((circ_pivot[2019] - circ_pivot[2018]) / circ_pivot[2018]) * 100
circ_pivot['Change_mean'] = ((circ_pivot[2019] - circ_pivot['Mean']) / circ_pivot['Mean']) * 100

Output:

Year         2017   2018   2019          Mean      Change  Change_mean
Branch                                                                
BILING       1406   1280      4    896.666667  -99.687500   -99.553903
BILWPL        199    117    239    185.000000  104.273504    29.189189
BKCLUB         94    161    238    164.333333   47.826087    44.827586
Bubnoff
  • 3,917
  • 3
  • 30
  • 33
  • 1
    Possible duplicate of [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) – G. Anderson Aug 06 '19 at 22:22
  • Percent difference average between all years, or between each subsequent year? – user3483203 Aug 06 '19 at 22:23
  • percent difference between the newest year and the mean of all years. Also - the previous year. – Bubnoff Aug 06 '19 at 22:32
  • @G.Anderson -- The display is right but I haven't run into an example that adds a column ...mean, percent difference. Thanks! – Bubnoff Aug 06 '19 at 22:39
  • 1
    You can save the pivot as a dataframe and add columns to it normally, like in [this answer](https://stackoverflow.com/questions/18504967/pandas-dataframe-create-new-columns-and-fill-with-calculated-values-from-same-df/18505101) – G. Anderson Aug 07 '19 at 14:52
  • @G.Anderson -- Thanks! I did that and updated the question and added my solution below. – Bubnoff Aug 07 '19 at 23:24

1 Answers1

0

This is the solution I ended up with.

circ_pivot = df.pivot_table(
    values='count', index='Branch', columns='Year',
    fill_value=0, aggfunc=np.sum, margins=True)

circ_pivot['Mean'] = round(circ_pivot[[2017,2018,2019]].mean(axis=1))
circ_pivot['Change'] = round(((circ_pivot[2019] - circ_pivot[2018]) / circ_pivot[2018]) * 100)
circ_pivot['Change_mean'] = round(((circ_pivot[2019] - circ_pivot['Mean']) / circ_pivot['Mean']) * 100)

print(circ_pivot)

Output:

Year          2017    2018    2019       All      Mean  Change  Change_mean
Branch                                                                     
BILING        1406    1280       4    2690.0     897.0  -100.0       -100.0
BILWPL         199     117     239     555.0     185.0   104.0         29.0
BKCLUB          94     161     238     493.0     164.0    48.0         45.0

Improvements would be:

  • Relative dates instead of hard coded date fields.
Bubnoff
  • 3,917
  • 3
  • 30
  • 33