
For example, here is a DataFrame:

import pandas as pd

df = pd.DataFrame({'year': ['2019', '2019', '2019', '2019', '2020', '2020', '2020'],
                   'key': ['a', 'a', 'b', 'c', 'd', 'e', 'f'],
                   'val': [3, 4, 3, 5, 6, 1, 2]})

It looks like

   year key  val
0  2019   a    3
1  2019   a    4
2  2019   b    3
3  2019   c    5
4  2020   d    6
5  2020   e    1
6  2020   f    2

What I want to obtain is

year  key    mean_except_current_key
2019  a      4
      b      4
      c      3.33
2020  d      1.5
      e      4
      f      3.5

That is, I want to group df by year and key, where mean_except_current_key is the mean of val over all rows in the same year, excluding every row that has the same key as the current row.

I hope I have made the problem clear, but I can't figure out how to do it. I have found this question; however, it is different from mine.

Thanks for any help.

North
  • Can you please elaborate on how to calculate `mean_except_current_key` with one simple example. Just show how you got `4` for year 2019 and key `a` – Ch3steR May 30 '20 at 07:40
  • @Ch3steR It is the mean of `val` in year 2019 and key `b` (i.e., 3) and `val` in year 2019 and key `c` (i.e., 5). – North May 30 '20 at 07:45
  • Yea, got it. Understood. – Ch3steR May 30 '20 at 07:47

1 Answer

Not sure if there is a better way, but here is some simple math using multiple groupby and transform:

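# Sum of val over the year, minus the sum over the current (year, key)
x = df.groupby("year")["val"].transform("sum") - df.groupby(["year", "key"])["val"].transform("sum")
# Number of rows in the year, minus the number of rows with the current (year, key)
y = df.groupby("year")["val"].transform("count").sub(df.groupby(["year", "key"])["val"].transform("count"))
df["new"] = x / y
print(df)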

   year key  val       new
0  2019   a    3  4.000000
1  2019   a    4  4.000000
2  2019   b    3  4.000000
3  2019   c    5  3.333333
4  2020   d    6  1.500000
5  2020   e    1  4.000000
6  2020   f    2  3.500000
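
If one row per (year, key) is wanted, as in the layout shown in the question, the same sum-and-count subtraction can be done on aggregated data. The following is only a sketch under that assumption: the function name `mean_excluding_own_key` and its parameters are made up for illustration, and the result column name is taken from the question.

```python
import pandas as pd


def mean_excluding_own_key(df, group_col="year", key_col="key", val_col="val"):
    """Hypothetical helper: per (group, key), mean of val over the
    group's rows whose key differs from the current key."""
    # One row per (group, key), holding that key's sum and row count.
    per_key = df.groupby([group_col, key_col])[val_col].agg(["sum", "count"])
    # Group-wide totals, broadcast back onto the (group, key) index.
    totals = per_key.groupby(level=group_col).transform("sum")
    rest_sum = totals["sum"] - per_key["sum"]
    rest_count = totals["count"] - per_key["count"]
    # A group with only one key has rest_count == 0, so the result is NaN there.
    return (rest_sum / rest_count).rename("mean_except_current_key")


df = pd.DataFrame({
    "year": ["2019", "2019", "2019", "2019", "2020", "2020", "2020"],
    "key": ["a", "a", "b", "c", "d", "e", "f"],
    "val": [3, 4, 3, 5, 6, 1, 2],
})
result = mean_excluding_own_key(df)
print(result.round(2))
```

The result is a Series indexed by (year, key), so `result.reset_index()` turns it into a flat table if that is more convenient.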
Henry Yik