How to get the mean of columns subject to a specified condition?

Question

I have a data frame with columns (Name, Sum, a, b), and I want to create a column named "mean" that holds the mean of columns a and b. However, if any two rows have the same mean, the row with the larger Sum value should have its mean decreased by 0.1.

   data frame 1

  Name  Sum  a   b      mean
0 hamm   34  2   2       2
1 jam    54  1   1  -->  1
2 tan    36  3   1       2
3 pan    39  4   4       4

As we can see, rows 0 and 2 now have the same mean value, so the one with the larger Sum value should be decreased by 0.1.

In this case that is row 2, which should get the value 2 - 0.1 = 1.9.

Final Result

  Name  Sum  a   b   mean
0 hamm   34  2   2    2
1 jam    54  1   1    1
2 tan    36  3   1    1.9
3 pan    39  4   4    4

What problem are you trying to solve by doing this? I can't think of a reason why it would make any mathematical sense. — Karl Knechtel, Apr 30 '20 at 07:32
It definitely makes sense. Here the "a" and "b" columns are feature ranks that I got from different ML models, and I want to take their mean to find out which features rank well overall. If the ranks tie, I want to apply the specified condition so that the one with the greater sum shows up. @KarlKnechtel — Amit, Apr 30 '20 at 07:40
That's why I want to write generic code that can handle situations like this. @AlexandreB. — Amit, Apr 30 '20 at 09:16
Then the one with the greatest sum will be (mean - 0.2), the second one will be (mean - 0.1), and the last one remains unchanged. @AlexandreB. — Amit, Apr 30 '20 at 12:08

Alexandre B. · Accepted Answer · 2020-04-30T13:15:06.853

You can try mean and cumcount:

# The lambda receives the frame returned by the first assign, which already has "mean"
df.assign(mean=df[["a", "b"]].mean(axis=1))\
  .assign(mean=lambda d: d["mean"].subtract(d.groupby("mean").cumcount().divide(10)))

output

#    Name  Sum  a  b  mean
# 0  hamm   34  2  2   2.0
# 1   jam   54  1  1   1.0
# 2   tan   36  3  1   1.9
# 3   pan   39  4  4   4.0
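
To reproduce the output above from scratch, here is a minimal self-contained sketch. The frame is rebuilt from the values in the question; the names `df` and `result` are only illustrative. Note that `cumcount` numbers tied rows in their existing row order, so this matches the expected result here only because row 2 already comes after row 0.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["hamm", "jam", "tan", "pan"],
    "Sum": [34, 54, 36, 39],
    "a": [2, 1, 3, 4],
    "b": [2, 1, 1, 4],
})

result = (
    # Row-wise mean of columns a and b
    df.assign(mean=df[["a", "b"]].mean(axis=1))
      # Within each group of equal means, subtract 0.0, 0.1, 0.2, ... in row order
      .assign(mean=lambda d: d["mean"] - d.groupby("mean").cumcount() / 10)
)
print(result)
```

`assign` returns a new frame, so `df` itself is left without a "mean" column.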

Explanations:

1. Compute the mean using mean. We specify axis=1 to compute it across the columns of each row.
2. For each group of identical means, we want to subtract n*0.1 from the n-th row:
   2.1. We use groupby to group all rows with the same mean.
   2.2. We number the rows within each group using cumcount (0 for the first row, 1 for the second, and so on, in their existing row order).
   2.3. We divide by 10 using divide in order to convert the counter to 0.0, 0.1, 0.2, ...
3. Subtract the output of step 2 from the mean column using subtract.

Full code + illustration


# Step 1
df["mean"] = df[["a", "b"]].mean(axis=1)
print(df)
#    Name  Sum  a  b  mean
# 0  hamm   34  2  2   2.0
# 1   jam   54  1  1   1.0
# 2   tan   36  3  1   2.0
# 3   pan   39  4  4   4.0

# Step 2.1 + 2.2
print(df.groupby("mean").cumcount())
# 0    0
# 1    0
# 2    1
# 3    0
# dtype: int64

# Step 2.3
print(df.groupby("mean").cumcount().divide(10))
# 0    0.0
# 1    0.0
# 2    0.1
# 3    0.0
# dtype: float64

# Step 3
df["mean"] = df["mean"].subtract(df.groupby("mean").cumcount().divide(10))
print(df)
#    Name  Sum  a  b  mean
# 0  hamm   34  2  2   2.0
# 1   jam   54  1  1   1.0
# 2   tan   36  3  1   1.9
# 3   pan   39  4  4   4.0
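
The steps above number tied rows by their position in the frame, not by Sum. If the penalty should follow Sum, as the comments on the question ask (greatest Sum gets mean - 0.2, the next gets mean - 0.1, the smallest is unchanged), one possible variation is to sort by Sum before counting. This is only a sketch under that assumption: the extra row "sam" is hypothetical, added to create a three-way tie, and `penalty_rank` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["hamm", "jam", "tan", "pan", "sam"],  # "sam" is a made-up extra row
    "Sum": [34, 54, 36, 39, 30],
    "a": [2, 1, 3, 4, 1],
    "b": [2, 1, 1, 4, 3],
})

df["mean"] = df[["a", "b"]].mean(axis=1)

# Count rows within each tied group in ascending Sum order, so the
# smallest Sum gets 0 and the largest Sum gets the biggest counter.
# A stable sort keeps the original order for rows with equal Sum.
penalty_rank = (
    df.sort_values("Sum", kind="mergesort")
      .groupby("mean")
      .cumcount()
      .sort_index()  # back to the original row order
)

df["mean"] = df["mean"] - penalty_rank / 10
print(df)
```

With this data, the three rows whose mean is 2.0 end up with sam (Sum 30) unchanged, hamm (Sum 34) reduced by 0.1, and tan (Sum 36) reduced by 0.2. Grouping on a float column relies on exact equality, which holds for means of two integers but may need rounding for other inputs.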

It isn't applicable to the problem statement. For example, in this particular problem it doesn't apply to the mean of rows 0 and 2. — Amit, Apr 30 '20 at 08:08
