1

I'm trying to do some Maths, using values from the results of two different groupbys. I have one series which includes ID, Project and Hours (Hours). The other series contains ID, Project and Cost (Costs).

What I am trying to do:

For each ID and Project pair in Hours, divide the Hours value by the total Hours for that ID. Then multiply that value by that IDs Cost in the Costs table.

For example, the forumula for the first ID-Project pair in the tables below would be:

39 / (39 + 220) * 35.286 = 5.31

Then:

220 / (39 + 220) * 35.286 = 29.97

Then:

187 / 187 * 91.24 = 91.24

And so on.

Hours ~ Hours by Project & ID (result of groupby with ID and Project being indexes):

                        Hours
ID      Project 
100001  413040-00140    39.0
        415012-00006    220.0
100002  705012-99999    187.5
100003  405012-15342    2.0
        418005-00023    144.0

Costs ~ Cost by ID (result of groupby with ID being index):

        Cost
ID  
100001  35.285835
100002  91.241904
100003  617.051535

What I want:

ID      Project         Allocated Costs
100001  413040-00140    5.31
        415012-00006    29.97
100002  705012-99999    91.24
100003  405012-15342    8.45
        418005-00023    608.60

If I filter these groupbys down to one ID, I can get it working with:

Hours / Hours.sum() * float(Costs.values[0][0])

But I need to do this for all IDs

  • that's just a simple merge between two dataframes. – Steven Aug 23 '21 at 11:43
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Steven Aug 23 '21 at 11:43
  • @Steven Thanks. I will take a look. Although looks like the below answer answers my question. –  Aug 23 '21 at 11:53

1 Answers1

0

Use Series.div with sum per first level of MultiIndex and last multiple by Series.mul:

df1['Allocated Costs']=df1['Hours'].div(df1['Hours'].sum(level=0), level=0).mul(df2['Cost'])
print (df1)
                     Hours  Allocated Costs
ID     Project                             
100001 413040-00140   39.0         5.313311
       415012-00006  220.0        29.972524
100002 705012-99999  187.5        91.241904
100003 405012-15342    2.0         8.452761
       418005-00023  144.0       608.598774

EDIT:

For last pandas version use:

df1.groupby(level=0)['Hours'].sum()

instead:

df1['Hours'].sum(level=0)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I think that works! Thank you so much. Does being able to come up with something like that so quickly just come with experience? –  Aug 23 '21 at 11:52