3

I have the following table:

    A   B   C   D
75987   1   0   0
75987   1   1   1
75987   2   1   1
75987   2   2   1
75987   2   6   4
75987   1   6   2
75987   1   6   1
59221   2   18  4
59221   1   18  0
59221   2   18  1

I am trying to create column E based upon the following criteria:

  1. Groupby columns A and B;
  2. When column C is less than or equal to 6, sum the corresponding values in column D;
  3. Assign these summed values to column E

I would expect the values in column E to be as below:

   A    B   C   D   E
75987   1   0   0   4
75987   1   1   1   4
75987   2   1   1   6
75987   2   2   1   6
75987   1   6   2   4
75987   1   6   1   4
75987   2   6   4   6
59221   2   18  4   0
59221   1   18  0   0
59221   2   18  1   0

There are suggestions on how to do implement groupby, boolean indexing and .query from this forum to solve similar issues but I can't seem to adapt them to get it to work as expected. Here for example, but there are plenty others I have tried too.

My attempt below seems logical but I'm still having difficulties.

df['E'] = np.where(df.groupby(['A','B'])(['C'] <= 6))['D'].transform('sum')

WillB
  • 35
  • 4

1 Answers1

4

Here is a way that works, similar to your attempt. The idea is to replace the values in D by 0 where the column C is over 6. then groupby.transform with the sum.

df['E'] = (
    df['D'].where(df['C'].le(6), other=0)
      .groupby([df['A'], df['B']])
      .transform(sum)
)
print(df)
#        A  B   C  D  E
# 0  75987  1   0  0  4
# 1  75987  1   1  1  4
# 2  75987  2   1  1  6
# 3  75987  2   2  1  6
# 4  75987  2   6  4  6
# 5  75987  1   6  2  4
# 6  75987  1   6  1  4
# 7  59221  2  18  4  0
# 8  59221  1  18  0  0
# 9  59221  2  18  1  0
Mustafa Aydın
  • 17,645
  • 4
  • 15
  • 38
Ben.T
  • 29,160
  • 6
  • 32
  • 54