My data is in this format and at the moment I do not wish to change the source data file. I am having trouble using transform, apply
to calculate the percentage of the total. Referencing from both of these link1 and link2, I came to realize the first link may not work as intended for my problem.
df_list = [['USA MARKET', 'APPLE', 'QUARTER', '2020-06-01', 100], ['USA MARKET', 'APPLE', 'YEARLY', '2020-06-01', 1000],
['USA MARKET', 'PEAR', 'QUARTER', '2020-06-01', 200], ['USA MARKET', 'PEAR', 'YEARLY', '2020-06-01', 5000],
['USA MARKET', 'APPLE', 'QUARTER', '2020-03-01', 300], ['USA MARKET', 'APPLE', 'YEARLY', '2020-03-01', 2000],
['USA MARKET', 'PEAR', 'QUARTER', '2020-03-01', 700], ['USA MARKET', 'PEAR', 'YEARLY', '2020-03-01', 8000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 100], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 1000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 200], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 5000],
['UK MARKET', 'GRAPE', 'QUARTER', '2020-06-01', 200], ['UK MARKET', 'GRAPE', 'YEARLY', '2020-06-01', 5000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2020-03-01', 300], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-03-01', 2000],
['UK MARKET', 'WATERMELON', 'QUARTER', '2020-03-01', 700], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-03-01', 8000],
['UK MARKET', 'GRAPE', 'QUARTER', '2020-03-01', 200], ['UK MARKET', 'GRAPE', 'YEARLY', '2020-03-01', 5000]]
column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'DATE', 'VALUES']
df = pd.DataFrame(df_list, columns = column_names)
def market_share(x):
try:
ms = x / float(x.sum())
except ZeroDivisionError:
ms = 0
return ms
df['MS'] = (df.groupby(['MARKET', 'TIMEPERIOD', 'DATE'])['VALUES']
.transform(market_share))
Output from above is:
MARKET PRODUCT TIMEPERIOD DATE VALUES MS
0 USA MARKET APPLE QUARTER 2020-06-01 100 0.333333
1 USA MARKET APPLE YEARLY 2020-06-01 1000 0.166667
2 USA MARKET PEAR QUARTER 2020-06-01 200 0.666667
3 USA MARKET PEAR YEARLY 2020-06-01 5000 0.833333
4 USA MARKET APPLE QUARTER 2020-03-01 300 0.300000
5 USA MARKET APPLE YEARLY 2020-03-01 2000 0.200000
6 USA MARKET PEAR QUARTER 2020-03-01 700 0.700000
7 USA MARKET PEAR YEARLY 2020-03-01 8000 0.800000
8 UK MARKET WATERMELON QUARTER 2020-06-01 100 0.200000
9 UK MARKET WATERMELON YEARLY 2020-06-01 1000 0.090909
10 UK MARKET WATERMELON QUARTER 2020-06-01 200 0.400000
11 UK MARKET WATERMELON YEARLY 2020-06-01 5000 0.454545
12 UK MARKET GRAPE QUARTER 2020-06-01 200 0.400000
13 UK MARKET GRAPE YEARLY 2020-06-01 5000 0.454545
14 UK MARKET WATERMELON QUARTER 2020-03-01 300 0.250000
15 UK MARKET WATERMELON YEARLY 2020-03-01 2000 0.133333
16 UK MARKET WATERMELON QUARTER 2020-03-01 700 0.583333
17 UK MARKET WATERMELON YEARLY 2020-03-01 8000 0.533333
18 UK MARKET GRAPE QUARTER 2020-03-01 200 0.166667
19 UK MARKET GRAPE YEARLY 2020-03-01 5000 0.333333
Now, this is not wrong per se, for if you want to calculate the percentage of total for Watermelon in the UK Market in 2020-06-01 Quarter, you can just use 0.2+0.4=0.6, but I wish that I can have the Market Share (Percentage of Total) to be 0.6 directly, instead of just 0.2 and 0.4 respectively for me to add up.
UK MARKET WATERMELON QUARTER 2020-06-01 300 0.6
This problem can be resolved using the second link I posted, but I would want to know what I did wrong here to not get the aggregation correctly.