I have te following pandas dataframe:
data_df = pd.DataFrame({'ind':['la','p','la','la','p','g','g','la'],
'dist':[10.,5.,7.,8.,7.,2.,5.,3.],
'diff':[0.54,3.2,8.6,7.2,2.1,1.,3.5,4.5],
'cas':[1.,2.,3.,4.,5.,6.,7.,8.]})
that is
cas diff dist ind
0 1 0.54 10 la
1 2 3.20 5 p
2 3 8.60 7 la
3 4 7.20 8 la
4 5 2.10 7 p
5 6 1.00 2 g
6 7 3.50 5 g
7 8 4.50 3 la
I need to compute the weighted average of all the columns where the weights are in the 'dist' column and group the values by 'ind'.
For example for 'ind'='la' and the 'diff' column:
((10*0.54)+(8.60*7)+(7.20*8)+(4.50*3))/(10+7+8+3) = 4.882143
The result I want to obtain is the following
cas diff
ind
g 6.714286 2.785714
la 3.107143 4.882143
p 3.750000 2.558333
which is obtained by multiplying each value of each colums by the corrisponding value in the 'dist' column, sum the results with the same 'ind' and then dividing the result by the sum of all the 'dist' values corrisponding to the same ind.
I thought this would have been an easy task done by the dataframe 'groupby' method, but actually it's kind of tricky.
Can someone please help me?