1

I would like to calculate the maximum value in a group, but not using the row's own value.

So if we have a data frame like this:

d = {'col1': ["a", "a", "b", "a", "b", "a"], 'col2': [0, 4, 3, -5, -1, 2]}
df = pd.DataFrame(data=d)

print(df)

  col1  col2
0    a     0
1    a     4
2    b     3
3    a    -5
4    b    -1
5    a     2

Then I would like to add a column max_other like this:

  col1  col2 max_other
0    a     0    4
1    a     4    2
2    b     3    -1
3    a    -5    4
4    b    -1    3
5    a     2    4

Source: This is a follow-up question from this question where I asked about calculating means in groups leaving out the row's own value.

Edit: My max_other had a mistake in row 1 (it said 3 when it should be 2).

ulima2_
  • 1,276
  • 1
  • 13
  • 23
  • How does `max_other` come out? – knh190 Apr 17 '19 at 09:11
  • In row 0, it leaves out 0 and compares 4, 3 and -5 and takes the highest value (4). In row 1, it leaves out 4, compares 0, -5 and 2 and keeps 2. In row 2, it leaves out 3 and uses -1. And so on ... – ulima2_ Apr 17 '19 at 09:20
  • 1
    Your updated edit is confusing. What's `max_own`? Did you mean `max_other`? And your comparison should be "for row 0 it leaves out 0 and compares 4, -5, 2" instead of "4,3,-5" because it's grouped by "col1"? – knh190 Apr 17 '19 at 09:24

1 Answers1

1

you can try with:

m=df.groupby('col1')['col2'].transform(lambda x: x.eq(x.max()))
d1=df[~m].groupby('col1')['col2'].max().to_dict()
d2=dict(zip(df.loc[m,'col1'],df.loc[m,'col2']))

df['max_other']=np.where(m,df.col1.map(d1),df.col1.map(d2))
print(df)

  col1  col2  max_other
0    a     0          4
1    a     4          2
2    b     3         -1
3    a    -5          4
4    b    -1          3
5    a     2          4

Details: We create a boolean mask to check where the row is equal to the max of the group:

m=df.groupby('col1')['col2'].transform(lambda x: x.eq(x.max()))
print(m)

0    False
1     True
2     True
3    False
4    False
5    False

We create 2 dicts:

print(d1)
{'a': 2, 'b': -1}

print(d2)
{'a': 4, 'b': 3}

Then we use, np.where() to see where the condition matches and where not, and map accordingly.

anky
  • 74,114
  • 11
  • 41
  • 70