0

I'm trying to create a new column based on conditions from others columns. I have this dataframe:

number, flag_new, flag_math
1, TRUE, TRUE
2, FALSE, TRUE
3, TRUE, FALSE

If the flag_new is True and the flag_math is also TRUE I want to have 1 on the new column. If the flag_new is FALSE and the flag_math is TRUE I want to add 0 else -1.

The expected result is:

number, flag_new, flag_math, new_Column
    1, TRUE, TRUE, 1
    2, FALSE, TRUE, 0
    3, TRUE, FALSE, -1

For that I have this code:

df['new_col'] = np.where(df['flag_new'] == 'TRUE' and df['flag_math'] == 'TRUE',1,
    np.where(df['flag_new'] == 'FALSE' and df['flag_math'] == 'TRUE',0, -1))

But I am getting the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What I am doing wrong?

Hamza
  • 5,373
  • 3
  • 28
  • 43
Pedro Alves
  • 1,004
  • 1
  • 21
  • 47
  • `pandas` and `numpy` use the bitwise operators (&,|,~) to do boolean ops instead of `and` and `or` and `not` – Asish M. Dec 24 '20 at 17:04
  • alternatively you can use `np.where(df['flag_math'].eq('TRUE'), df['flag_new'].eq('TRUE').astype(int), -1)` as well – Asish M. Dec 24 '20 at 17:20

1 Answers1

0

and and or are intended for scalar operations. Use numpy functions i.e. np.logical_and() and np.logical_or() for vector comparisons like your case. e.g.:

df['new_col'] = np.where(np.logical_and(df['flag_new'] == 'TRUE', df['flag_math'] == 'TRUE'),1,
    np.where(np.logical_and(df['flag_new'] == 'FALSE', df['flag_math'] == 'TRUE'),0, -1))

Honestly though this logic in itself is a complex way to achieve this. You can simplify this in two steps and utilize these functions more efficiently.

A good way might be to parse columns as boolean values themselves (if they are not already) It will be helpful in any future analyses based on boolean values and if(value) is always more desirable than if value=='TRUE'.

def get_bool(key): #function derived from: https://stackoverflow.com/a/64380962/7212929 
    return {'True':True, 'False':False}.get(key) if  key.title() in {'True':True, 'False':False}.keys() else bool(key)

df[['flag_new', 'flag_math']] = df[['flag_new', 'flag_math']].applymap(get_bool) # convert boolean representations into actual bools

df['new_col'] = df.flag_math -1 +np.logical_and(df.flag_new, df.flag_math)  # add new column with simplified new logic

result:

    number  flag_new    flag_math   new_col
0   1   True    True    1
1   2   False   True    0
2   3   True    False   -1
Hamza
  • 5,373
  • 3
  • 28
  • 43