I have a very long and wide dataframe. I'd like to create a new column in that dataframe, where the value depends on many other columns in the df. The calculation needed for the values in this new column also changes, depending on the value in some other column.
The answers to this question and this question come close, but don't quite work out for me.
I'll eventually have about 30 different calculations that could be applied, so I'm not too keen on the np.where function, which is not very readable with that many conditions.
I've also been strongly advised against doing a for-loop over all rows in a dataframe, because it's supposed to be awful for performance (please correct me if I'm wrong there).
What I've tried to do instead:
import pandas as pd
import numpy as np
df = pd.DataFrame()
# The information in my columns looks something like this:
df['text'] = ['dab', 'def', 'bla', 'zdag', 'etc']
df['values1'] = [3 , 4, 2, 5, 2]
df['values2'] = [6, 3, 21, 44, 22]
df['values3'] = [103, 444, 33, 425, 200]
# lists to check against to decide upon which calculation is required
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']
conditions = [
(df['text'] is None),
(df['text'] in someList),
(df['text'] in someOtherList),
(df['text'] in someThirdList)]
choices = [0,
round(df['values2'] * 0.5 * df['values3'], 2),
df['values1'] + df['values2'] - df['values3'],
df['values1'] + 249]
df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
I expect that, based on the row values in df['text'], the right calculation is applied to the same row's value of df['mynewvalue'].
Instead, I get the error The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I program this instead, so that I can use this kind of condition to define the right calculation for the df['mynewvalue'] column?
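For reference, here is a minimal sketch of one possible way to write these conditions. Python's `in` and `is` operators work on the Series object as a whole rather than row by row, which is where the ambiguous truth value error comes from. The sketch assumes that the element-wise methods `Series.isna()` and `Series.isin()` express the intended checks, and it builds the dataframe in one call instead of column by column. The data, list names, and calculations are taken from the snippet above.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, built in a single constructor call.
df = pd.DataFrame({
    'text': ['dab', 'def', 'bla', 'zdag', 'etc'],
    'values1': [3, 4, 2, 5, 2],
    'values2': [6, 3, 21, 44, 22],
    'values3': [103, 444, 33, 425, 200],
})

# Lists that decide which calculation applies to a row.
someList = ['dab', 'bla']
someOtherList = ['def', 'zdag']
someThirdList = ['etc']

# Each condition is a boolean Series with one True/False per row.
conditions = [
    df['text'].isna(),               # element-wise replacement for "is None"
    df['text'].isin(someList),       # element-wise replacement for "in someList"
    df['text'].isin(someOtherList),
    df['text'].isin(someThirdList),
]

# One choice per condition, in the same order; each is computed for all rows
# and np.select picks the value from the first condition that is True.
choices = [
    0,
    (df['values2'] * 0.5 * df['values3']).round(2),
    df['values1'] + df['values2'] - df['values3'],
    df['values1'] + 249,
]

df['mynewvalue'] = np.select(conditions, choices, default=0)
print(df)
```

With around 30 calculations, keeping the conditions and choices as two parallel lists like this may still be readable, since each pair sits at the same position; whether it performs well enough on a very long dataframe would need to be measured, because every choice is evaluated for every row.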