I have this sample data set:

import pandas as pd
df_samp = pd.DataFrame({'Name': ['Bob', 'John', 'Ross'], 'Counts': [5, 4, 3]})

I want to check, row by row, whether the Counts column is less than 5, and then add a new column showing by how MUCH each row falls short of 5. E.g.,

Name   Counts   Difference
Bob    5        0
John   4        1
Ross   3        2

The comparison below is simple, but it only returns the standard (and expected) True or False:

df_samp['Counts'] = df_samp['Counts'] < 5

Name   Counts
Bob    False
John   True
Ross   True

How do I take this a step further?

papelr

1 Answer

Use np.where and Series.abs:

import numpy as np

df_samp['Difference'] = np.where(df_samp['Counts'].le(5), (df_samp['Counts'] - 5).abs(), df_samp['Counts'])

Output:

   Name  Counts  Difference
0   Bob       5           0
1  John       4           1
2  Ross       3           2
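
For readers who want to see what each piece of that expression evaluates to, here is a minimal sketch that rebuilds it step by step on the sample data. The intermediate names `mask` and `shortfall` are illustrative only and do not appear in the one-liner above.

```python
import numpy as np
import pandas as pd

df_samp = pd.DataFrame({'Name': ['Bob', 'John', 'Ross'], 'Counts': [5, 4, 3]})

# Boolean Series: True where Counts <= 5 (every row in this sample)
mask = df_samp['Counts'].le(5)

# Absolute distance of each Counts value from 5
shortfall = (df_samp['Counts'] - 5).abs()

# Element-wise if/else: shortfall where mask is True, otherwise the raw Counts value
df_samp['Difference'] = np.where(mask, shortfall, df_samp['Counts'])

print(df_samp)
```

Because every sample row satisfies the condition, the third argument is never selected here; it only matters once a value above 5 shows up.
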
Mayank Porwal
  • For learning purposes, why is this better than using a conditional? Will mark as correct answer when it allows me – papelr Jun 08 '20 at 15:31
  • You ideally need conditions when you have multiple things to evaluate. Here, you just need the absolute difference of the column's value from 5. No need to put extra checks in the code to make it slow. – Mayank Porwal Jun 08 '20 at 15:33
  • Works, in scope of what I asked. Last thing, say Bob had 6 in the Counts col - would there need to be a new line to deal with greater than 5? Would want the output to be zero, for greater than 5 – papelr Jun 08 '20 at 16:03
  • No new line is needed. Think of `np.where` as an `if-else` statement: the code checks `if Counts <= 5`, then returns `abs(Counts - 5)`, else `Counts`. As written, a value greater than `5` comes back unchanged, so to get zero there, change the else value from `df_samp['Counts']` to `0`. – Mayank Porwal Jun 08 '20 at 16:06
  • Perfect! Got it – papelr Jun 08 '20 at 16:07
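
To make the greater-than-5 case from the comments concrete, the sketch below gives Bob a count of 6 (a hypothetical value taken from the follow-up question, not from the original data). With the answer's expression, the else branch of `np.where` returns Counts itself, so Bob gets 6. One possible way to get 0 instead, assuming that is the desired behaviour, is to compute `5 - Counts` and clip negative results to zero with `Series.clip`. The column names `AsWritten` and `Clipped` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data: Bob has 6 instead of 5
df_samp = pd.DataFrame({'Name': ['Bob', 'John', 'Ross'], 'Counts': [6, 4, 3]})

# The answer's expression: rows with Counts > 5 fall through to the else value (Counts)
df_samp['AsWritten'] = np.where(
    df_samp['Counts'].le(5),
    (df_samp['Counts'] - 5).abs(),
    df_samp['Counts'],
)

# Alternative sketch: shortfall below 5, with anything negative clipped to 0
df_samp['Clipped'] = (5 - df_samp['Counts']).clip(lower=0)

print(df_samp)
```

The clipped version avoids the conditional entirely, since any count at or above 5 yields a non-positive difference that is floored at zero.
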