
I am trying to add a Delta column to my DataFrame. If the value in the 'Name' column is in list1, Delta should be df['A'] - df['B']; if it is not, Delta should be df['B'] - df['A']. In both cases negative results should be floored at 0.

Here is what I have:

for i in list1:
    df['Delta'] = np.where(df['Name'] == i, np.maximum(0, df['A'] - df['B']), np.maximum(0, df['B'] - df['A']))

The problem is that the loop handles each i separately, and every pass overwrites the whole Delta column, so only the comparison with the last name in list1 survives.
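To see the overwrite concretely, here is a small sketch with made-up data (the names, values, and list1 below are illustrative assumptions, not the asker's real data). After the loop finishes, Delta is identical to what a single comparison against the last element of list1 would give, so the row named 'a' gets 0 instead of 3.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; any names and values would do.
list1 = ['a', 'c']
df = pd.DataFrame({'Name': ['a', 'b', 'c'], 'A': [5, 1, 7], 'B': [2, 4, 3]})

# The loop from the question: each pass replaces the entire 'Delta' column.
for i in list1:
    df['Delta'] = np.where(df['Name'] == i,
                           np.maximum(0, df['A'] - df['B']),
                           np.maximum(0, df['B'] - df['A']))

# A single comparison against only the last element ('c') of list1.
last_only = np.where(df['Name'] == list1[-1],
                     np.maximum(0, df['A'] - df['B']),
                     np.maximum(0, df['B'] - df['A']))

print(df['Delta'].tolist())  # [0, 3, 4]
print(last_only.tolist())    # [0, 3, 4]
```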

How can I rewrite this code so that, instead of looping over each i, it checks whether df['Name'] equals any of the names in list1?

Something like:

df['Delta'] = np.where(df['Name'] == any(list1), np.maximum(0, df['A'] - df['B']), np.maximum(0, df['B'] - df['A']))

If there is an overall better way to do this, please let me know.

Mari


Use Series.isin to build a boolean mask that is True wherever df['Name'] is in list1, then pass that mask to np.where to choose, row by row, between the two floored differences:

diff = df['A'].sub(df['B'])
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))

Example:

import numpy as np
import pandas as pd

np.random.seed(10)

list1 = ['a', 'c']
df = pd.DataFrame({'Name': np.random.choice(['a', 'b', 'c'], 5), 'A': np.random.randint(1, 10, 5), 'B': np.random.randint(1, 10, 5)})

diff = df['A'].sub(df['B'])
df['Delta'] = np.where(df['Name'].isin(list1), np.maximum(0, diff), np.maximum(0, -diff))

Result:

# print(df)
  Name  A  B  Delta
0    b  1  7      6
1    b  2  5      3
2    a  9  4      5
3    a  1  1      0
4    b  9  5      0
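A pandas-only variant (a suggested alternative, not part of the original answer) uses Series.where to keep A - B where the mask is True and substitute B - A (that is, -diff) elsewhere, then Series.clip(lower=0) to floor negatives at zero. This sketch reuses the values from the result table above instead of the random seed, and checks that it matches the np.where version.

```python
import numpy as np
import pandas as pd

# Data copied from the printed result above.
list1 = ['a', 'c']
df = pd.DataFrame({'Name': ['b', 'b', 'a', 'a', 'b'],
                   'A': [1, 2, 9, 1, 9],
                   'B': [7, 5, 4, 1, 5]})

mask = df['Name'].isin(list1)   # True where Name is in list1
diff = df['A'].sub(df['B'])     # A - B for every row

# Keep A - B where mask is True, use B - A (= -diff) elsewhere, then floor at 0.
df['Delta'] = diff.where(mask, -diff).clip(lower=0)

# The np.where formulation from the answer, for comparison.
expected = np.where(mask, np.maximum(0, diff), np.maximum(0, -diff))

print(df['Delta'].tolist())  # [6, 3, 5, 0, 0]
```

Both forms are vectorized; the pandas version simply avoids computing np.maximum twice and keeps the result as a Series.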
Shubham Sharma