0

Let's say I have the following dataframe:

     Individual      stop        x           y           z         time
0    23              1           20          27          4         21
1    23              2           23          24          13        63
2    1756            2           5           41          73        12
3    1756            3           7           42          72        6
4    1756            4           4.5         39          72        45
5    1756            4           3           50          73        98
6    2153            2           121         12          6         33
7    2153            3           122.5       2           6         0
8    3276            1           54          33          -12       0
9    5609            1           -2          44          -32       56
10   5609            2           8           44          -32       23
11   5609            5           102         -23         16        76

I would like to calculate the average of the position x, y, z weighted by the time for each Individual. I would like to then put the results in a new dataframe like this:

     Individual          bar_x       bar_y       bar_z  
0    23                  22.5        24.75       10.75
2    1756                3.72        45.96       72.68 
6    2153                121         12          6  
9    5609                50.48       11.15       24.16

I have done this with a loop going through every Individual and calculate the weighted average. It works well, but the running time is VERY long when the dataframe gets bigger. I am pretty sure there is a much faster solution using pandas but I haven't find the way yet, any idea please?

Thanks in advance!

Aleksander
  • 45
  • 5

1 Answers1

0

Is this what you are looking for?

(df[['x','y','z']].mul(df.groupby('Individual')['time']
.transform(lambda x: x.div(x.sum())).fillna(0),axis=0)
.groupby(df['Individual'])
.sum()
.round(2))

Output:

                 x      y      z
Individual                      
23           22.25  24.75  10.75
1756          3.72  45.96  72.68
2153        121.00  12.00   6.00
3276          0.00   0.00   0.00
5609         50.48  11.15  -8.46
rhug123
  • 7,893
  • 1
  • 9
  • 24
  • This works well as long as there is no individual that have a sum of time = 0. If so, I get : `ZeroDivisionError: float division by zero` – Aleksander Sep 08 '22 at 16:13
  • The transform function is basically doing this for individual 3276, and for me it returns `NaN` instead of an error. `pd.Series([0]).div(0)` – rhug123 Sep 08 '22 at 16:19
  • Would it be possible to do this only where a condition like (`time.sum() != 0` for each individual) is met? – Aleksander Sep 08 '22 at 16:30