1

I have a problem in python. This is my sample data

     col1  col2  desired
0    a     1     2.50
1    a     2     2.00
2    a     3     1.50
3    b     2     3.00
4    b     3     2.00
5    c     3     1.67
6    c     1     2.33
7    c     2     2.00
8    c     2     2.00

the input are df['col1'] and df['col2']. I want to use these two columns to produce the desired outcome in df['desired'].

The idea is, that I want to group by col1 and calculate the average value of col2. The only tweak here, though, is that I want to exclude the current row from the average value calculation.

So for row 0, I am grouping by df['col1'] == 'a', but only use row 1 and 2 to calculate the average. For row 1, I also group by df['col1'] == 'a', but I only use row 1 and 3. And so forth.

The only thing I can think of, is to create a custom function for .transform() that will take as input the series coming in from the grouped object, but I am not sure how to approach it. Ideally, I am looking for a simpler (pandas?) method to achieve this.

KenHBS
  • 6,756
  • 6
  • 37
  • 52
Varun Rajan
  • 276
  • 2
  • 11

2 Answers2

3

Solution working with definition of mean - sum/count.

So first get count by transform and subtract 1 for remove actual row, same with sum for remove actual row value. Last divide and assign to new column:

a = df.groupby('col1')['col2'].transform('size').sub(1)
b = df.groupby('col1')['col2'].transform('sum').sub(df['col2'])

df['des'] = b / a
print (df)
  col1  col2  desired       des
0    a     1     2.50  2.500000
1    a     2     2.00  2.000000
2    a     3     1.50  1.500000
3    b     2     3.00  3.000000
4    b     3     2.00  2.000000
5    c     3     1.67  1.666667
6    c     1     2.33  2.333333
7    c     2     2.00  2.000000
8    c     2     2.00  2.000000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

Another option is filtering the selected row:

df['desired'] = df.apply(lambda x: df[~df.index.isin([x.name])].groupby('col1')['col2'].mean().loc[x['col1']], axis=1)

output:

>>0    2.5
1    2.0
2    1.5
3    5.5
4    5.0
5    4.5
Tarifazo
  • 4,118
  • 1
  • 9
  • 22