1

My intended goal was to perform a simple groupby, then store the group averages as a new column using .transform('mean'). Then things got complicated. The catch is that what I really want is an average of n-1 values, where 'n' is the number of rows belonging to each group. Example data, where the column RESULT is my desired output:

import pandas as pd

list_of_tuples = [('A', 3, 4.5),
                  ('A', 2, 4.75),
                  ('A', 5, 4),
                  ('A', 4, 4.25),
                  ('A', 7, 3.5),
                  ('B', 6, 6.75),
                  ('B', 9, 6),
                  ('B', 8, 6.25),
                  ('B', 4, 7.25),
                  ('B', 6, 6.75)]

df = pd.DataFrame.from_records(data=list_of_tuples, columns=['ID', 'VALUE', 'RESULT'])

>>> df
  ID  VALUE  RESULT
0  A      3    4.50
1  A      2    4.75
2  A      5    4.00
3  A      4    4.25
4  A      7    3.50
5  B      6    6.75
6  B      9    6.00
7  B      8    6.25
8  B      4    7.25
9  B      6    6.75

You can see that in the first row the value of RESULT is the average of [2, 5, 4, 7], which is 4.5. Likewise, the value of RESULT for the last row is the average of [6, 9, 8, 4], which is 6.75.

So for each row the value of RESULT should be the group average (grouping on ID) of VALUE excluding the number in VALUE for that particular row.

Phil
  • 325
  • 1
  • 3
  • 6
  • @IanThompson - No. The value of RESULT where index = 3 should exclude VALUE for that row (i.e., 4). The value of results where index = 7 should exclude 8. – Phil Apr 08 '18 at 01:05

1 Answers1

1

Got the answer from my comment above.

list_of_tuples = [('A', 3, 4.5),
                  ('A', 2, 4.75),
                  ('A', 5, 4),
                  ('A', 4, 4.25),
                  ('A', 7, 3.5),
                  ('B', 6, 6.75),
                  ('B', 9, 6),
                  ('B', 8, 6.25),
                  ('B', 4, 7.25),
                  ('B', 6, 6.75)]

df = pd.DataFrame(list_of_tuples)

df.drop(2, axis = 1, inplace = True)

n = df.groupby(0)[1].transform('count')
m = df.groupby(0)[1].transform('mean')
df['result'] = (m*n - df[1])/(n-1)

df

final

Ian Thompson
  • 2,914
  • 2
  • 18
  • 31