1

My data is in this format and at the moment I do not wish to change the source data file. I am having trouble using transform, apply to calculate the percentage of the total. Referencing from both of these link1 and link2, I came to realize the first link may not work as intended for my problem.

df_list = [['USA MARKET', 'APPLE', 'QUARTER', '2020-06-01', 100], ['USA MARKET', 'APPLE', 'YEARLY', '2020-06-01', 1000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2020-06-01', 200],  ['USA MARKET', 'PEAR', 'YEARLY', '2020-06-01', 5000], 
           ['USA MARKET', 'APPLE', 'QUARTER', '2020-03-01', 300],  ['USA MARKET', 'APPLE', 'YEARLY', '2020-03-01', 2000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2020-03-01', 700],  ['USA MARKET', 'PEAR', 'YEARLY', '2020-03-01', 8000], 
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 100], ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 1000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 200],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 5000], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2020-06-01', 200],    ['UK MARKET', 'GRAPE', 'YEARLY', '2020-06-01', 5000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2020-03-01', 300],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-03-01', 2000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2020-03-01', 700],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-03-01', 8000],
           ['UK MARKET', 'GRAPE', 'QUARTER', '2020-03-01', 200],  ['UK MARKET', 'GRAPE', 'YEARLY', '2020-03-01', 5000]]

column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'DATE', 'VALUES']
df = pd.DataFrame(df_list, columns = column_names)

def market_share(x):
    try:
        ms = x / float(x.sum())
    except ZeroDivisionError:
        ms = 0
    
    return ms
    

df['MS'] = (df.groupby(['MARKET', 'TIMEPERIOD', 'DATE'])['VALUES']
                                        .transform(market_share))

Output from above is:

    MARKET     PRODUCT       TIMEPERIOD    DATE         VALUES   MS
0   USA MARKET  APPLE        QUARTER       2020-06-01   100     0.333333
1   USA MARKET  APPLE        YEARLY        2020-06-01   1000    0.166667
2   USA MARKET  PEAR         QUARTER       2020-06-01   200     0.666667
3   USA MARKET  PEAR         YEARLY        2020-06-01   5000    0.833333
4   USA MARKET  APPLE        QUARTER       2020-03-01   300     0.300000
5   USA MARKET  APPLE        YEARLY        2020-03-01   2000    0.200000
6   USA MARKET  PEAR         QUARTER       2020-03-01   700     0.700000
7   USA MARKET  PEAR         YEARLY        2020-03-01   8000    0.800000
8   UK MARKET   WATERMELON   QUARTER       2020-06-01   100     0.200000
9   UK MARKET   WATERMELON   YEARLY        2020-06-01   1000    0.090909
10  UK MARKET   WATERMELON   QUARTER       2020-06-01   200     0.400000
11  UK MARKET   WATERMELON   YEARLY        2020-06-01   5000    0.454545
12  UK MARKET   GRAPE        QUARTER       2020-06-01   200     0.400000
13  UK MARKET   GRAPE        YEARLY        2020-06-01   5000    0.454545
14  UK MARKET   WATERMELON   QUARTER       2020-03-01   300     0.250000
15  UK MARKET   WATERMELON   YEARLY        2020-03-01   2000    0.133333
16  UK MARKET   WATERMELON   QUARTER       2020-03-01   700     0.583333
17  UK MARKET   WATERMELON   YEARLY        2020-03-01   8000    0.533333
18  UK MARKET   GRAPE        QUARTER       2020-03-01   200     0.166667
19  UK MARKET   GRAPE        YEARLY        2020-03-01   5000    0.333333

Now, this is not wrong per se, for if you want to calculate the percentage of total for Watermelon in the UK Market in 2020-06-01 Quarter, you can just use 0.2+0.4=0.6, but I wish that I can have the Market Share (Percentage of Total) to be 0.6 directly, instead of just 0.2 and 0.4 respectively for me to add up.

UK MARKET   WATERMELON   QUARTER       2020-06-01   300  0.6

This problem can be resolved using the second link I posted, but I would want to know what I did wrong here to not get the aggregation correctly.

ilovewt
  • 911
  • 2
  • 10
  • 18
  • 1
    have you debugged this? – ansev Nov 12 '20 at 14:57
  • No, I can’t debug this using the code I pasted. However it can be done using the way in the second link. But I am more inclined to know how to resolve it using the method I shown above. – ilovewt Nov 12 '20 at 15:02

1 Answers1

0

The data looks right to me.

You are grouping by on 3 indices: ['MARKET', 'TIMEPERIOD', 'DATE'].

Remember that you are not grouping by on PRODUCT

UK MARKET   WATERMELON   QUARTER       2020-06-01   300  0.6

The corresponding grouped enteries should be:

8   UK MARKET   WATERMELON   QUARTER       2020-06-01   100     0.200000
10  UK MARKET   WATERMELON   QUARTER       2020-06-01   200     0.400000
12  UK MARKET   GRAPE        QUARTER       2020-06-01   200     0.400000

So the values look right to me.

Serial Lazer
  • 1,667
  • 1
  • 7
  • 15
  • Yes the values are indeed right. But the thing is that watermelon appeared twice in UK market. I want them to appear as one in the end result as just 0.6 of the market share. The result I’m showing separately shows the market share of the watermelon - when I want them to be shown as one entity. – ilovewt Nov 12 '20 at 15:48
  • For aggregated result for Watermelon, you can also groupby on `PRODUCT` too. What am I missing? – Serial Lazer Nov 12 '20 at 17:54
  • If you do that, you will ignore the GRAPE in the UK Market, and now Watermelon will have 0.33 and 0.66 respectively, which is wrong – ilovewt Nov 13 '20 at 01:48