How do I exclude a single row from groupby().transform() in pandas?

Question

My intended goal was to perform a simple groupby, then store the group averages as a new column using .transform('mean'). Then things got complicated. The catch is that what I really want is an average of n-1 values, where 'n' is the number of rows belonging to each group. Example data, where the column RESULT is my desired output:

import pandas as pd

list_of_tuples = [('A', 3, 4.5),
                  ('A', 2, 4.75),
                  ('A', 5, 4),
                  ('A', 4, 4.25),
                  ('A', 7, 3.5),
                  ('B', 6, 6.75),
                  ('B', 9, 6),
                  ('B', 8, 6.25),
                  ('B', 4, 7.25),
                  ('B', 6, 6.75)]

df = pd.DataFrame.from_records(data=list_of_tuples, columns=['ID', 'VALUE', 'RESULT'])

>>> df
  ID  VALUE  RESULT
0  A      3    4.50
1  A      2    4.75
2  A      5    4.00
3  A      4    4.25
4  A      7    3.50
5  B      6    6.75
6  B      9    6.00
7  B      8    6.25
8  B      4    7.25
9  B      6    6.75

You can see that in the first row the value of RESULT is the average of [2, 5, 4, 7], which is 4.5. Likewise, the value of RESULT for the last row is the average of [6, 9, 8, 4], which is 6.75.

So for each row the value of RESULT should be the group average (grouping on ID) of VALUE excluding the number in VALUE for that particular row.

@IanThompson - No. The value of RESULT where index = 3 should exclude VALUE for that row (i.e., 4). The value of results where index = 7 should exclude 8. — Phil, Apr 08 '18 at 01:05

score 1 · Accepted Answer · answered Apr 08 '18 at 02:01

Got the answer from my comment above.

list_of_tuples = [('A', 3, 4.5),
                  ('A', 2, 4.75),
                  ('A', 5, 4),
                  ('A', 4, 4.25),
                  ('A', 7, 3.5),
                  ('B', 6, 6.75),
                  ('B', 9, 6),
                  ('B', 8, 6.25),
                  ('B', 4, 7.25),
                  ('B', 6, 6.75)]

df = pd.DataFrame(list_of_tuples)

df.drop(2, axis = 1, inplace = True)

n = df.groupby(0)[1].transform('count')
m = df.groupby(0)[1].transform('mean')
df['result'] = (m*n - df[1])/(n-1)

df

How do I exclude a single row from groupby().transform() in pandas?

1 Answers1