-1

I have a DataFrame that looks like this:

Image of DataFrame

What I would like to do is to compare the values in all four columns (A, B, C, and D) for every row and count the number of times in which D has the smaller value than A, B, or C for each row and add it into the 'Count' column. So for instance, 'Count' should be 1 for the second row, the third row, and 2 for the last row.

Thank you in advance!

palutuna
  • 9
  • 1
  • 5
  • 3
    Please don’t post images of code, data or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`). [Why should I not upload images of ... when asking a question?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors-when-asking-a-question). [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – wwii Jan 11 '23 at 17:43

2 Answers2

2

You can use vectorize the operation using gt and sum methods along an axis:

df['Count'] = df[['A', 'B', 'C']].gt(df['D'], axis=0).sum(axis=1)
print(df)

# Output
   A  B  C  D  Count
0  1  2  3  4      0
1  4  3  2  1      3
2  2  1  4  3      1
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • I think your solution is better than mine due to vectorization. Thank you for posting it. I am a bit confused though - why is it that we need to specify `axis=0` with `.gt()` here? I tried it without and got some odd results that didn't make much sense. I'm still trying to wrap my head around how it's comparing in the background. – Stu Sztukowski Jan 11 '23 at 18:31
  • 1
    Because `df['A', 'B', 'C']]` return a DataFrame while `df['D']` returns a Series, so you have to indicate you want to compare along index axis. Replace `gt` by `>` and you will see you can't do any comparison. This is the same thing for `sum` in fact :-) – Corralien Jan 11 '23 at 18:37
  • 1
    I think it could be more understandable if you use numpy: `df[['A', 'B', 'C']].to_numpy() > np.vstack(df['D'].to_numpy())`. Try to remove `vstack` and now it doesn't make sense to compare. Is it clear now? (+1 for your solution) – Corralien Jan 11 '23 at 18:41
1

In the future, please do not post data as an image.

Use a lambda function and compare across all columns, then sum across the columns.

data = {'A': [1,47,4316,8511],
        'B': [4,1,3,4],
        'C': [2,7,9,1],
        'D': [32,17,1,0]
       }

df = pd.DataFrame(data)

df['Count'] = df.apply(lambda x: x['D'] < x, axis=1).sum(axis=1)

Output:

      A  B  C   D  Count
0     1  4  2  32      0
1    47  1  7  17      1
2  4316  3  9   1      3
3  8511  4  1   0      3
Stu Sztukowski
  • 10,597
  • 1
  • 12
  • 21