I'm new to Stack Overflow and can't comment yet, so I can't ask directly in the thread, but I'd like some clarification on the solution in this question:
    # From Paul H
    import numpy as np
    import pandas as pd

    np.random.seed(0)
    df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                       'office_id': list(range(1, 7)) * 2,
                       'sales': [np.random.randint(100000, 999999) for _ in range(12)]})
    state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

    # Change: groupby state_office and divide by sum
    state_pcts = state_office.groupby(level=0).apply(lambda x: 100 * x / float(x.sum()))
I understand multi-index selection (level 0 vs. level 1), but I'm not clear on what each `x` in the lambda function refers to. To me, the `x` in `x.sum()` seems to refer to the `level=0` grouping (summing all results within each group at `level=0`), but the `x` in `100 * x` appears to refer to each individual result within the groupby object (not the `level=0` grouping).
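To make my confusion concrete, here is a minimal sketch I put together to inspect `x`. It assumes fixed, made-up sales figures instead of the random ones above so the output is reproducible, and `group_totals` is just a hypothetical name of mine. It loops over the same groupby and prints what each group looks like. My working assumption, which I'd like confirmed, is that `apply` hands the lambda the same per-state sub-DataFrame that the loop yields, so both occurrences of `x` are the same object and the division simply broadcasts one total over that group's rows.

```python
import pandas as pd

# Fixed sales figures (made up for illustration) so the output is reproducible.
df = pd.DataFrame({
    'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
    'office_id': list(range(1, 7)) * 2,
    'sales': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200],
})
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})

group_totals = {}
for state, x in state_office.groupby(level=0):
    # x is a sub-DataFrame holding every (state, office_id) row for one state.
    total = x['sales'].sum()           # a single number per state
    group_totals[state] = total
    print(state, type(x).__name__, x.shape, total)
    print(100 * x['sales'] / total)    # the single total is broadcast over the rows of x

# The same percentages without a lambda: transform('sum') repeats each
# state's total on every row of that state, so the division lines up row by row.
state_totals = state_office.groupby(level=0)['sales'].transform('sum')
state_pcts = 100 * state_office['sales'] / state_totals
print(state_pcts)
```

If that reading is right, `x.sum()` collapses the group to one total while `100 * x` keeps one row per office, which would explain why the two looked like they referred to different things.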
Sorry for such a basic question but an explanation would be very useful!