I want to check, for each dataframe row, whether any of several columns holds one of a set of allowed values (a different set for each column), and assign a boolean accordingly. I think I need a combination of apply() and any(), but I can't quite get it right:

So, for dataframe:

import pandas as pd

bank_dict = {'Name': ['A', 'B', 'C', 'D', 'E'],
             'Type': ['Retail', 'Corporate', 'Corporate', 'Wholesale', 'Retail'],
             'Overdraft': ['Y', 'Y', 'Y', 'N', 'N'],
             'Forex': ['USD', 'GBP', 'EUR', 'JPY', 'GBP']}
bank_df = pd.DataFrame(bank_dict)

With truth list:

truth_list = [bank_df['Type'].isin(['Retail']), bank_df['Overdraft'].isin(['Y']), bank_df['Forex'].isin(['USD', 'GBP'])]

The resultant df should look like:

  Name       Type Overdraft Forex  TruthCol
0    A     Retail         Y   USD         1
1    B  Corporate         Y   GBP         1
2    C  Corporate         Y   EUR         1
3    D  Wholesale         N   JPY         0
4    E     Retail         N   GBP         1

Thanks,

rawwar
shanlodh

2 Answers

I think you need np.logical_or.reduce, which combines the masks in truth_list with an element-wise OR:

import numpy as np

bank_df['TruthCol'] = np.logical_or.reduce(truth_list).astype(int)
print(bank_df)
  Name       Type Overdraft Forex  TruthCol
0    A     Retail         Y   USD         1
1    B  Corporate         Y   GBP         1
2    C  Corporate         Y   EUR         1
3    D  Wholesale         N   JPY         0
4    E     Retail         N   GBP         1
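
For reference, here is a self-contained sketch of the same approach that can be run as-is. The `allowed` dictionary and the `masks` name are hypothetical, introduced only to build the masks in one place, and the Overdraft match value is assumed to be 'Y' so that it agrees with the sample data.

```python
import numpy as np
import pandas as pd

bank_df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Type': ['Retail', 'Corporate', 'Corporate', 'Wholesale', 'Retail'],
    'Overdraft': ['Y', 'Y', 'Y', 'N', 'N'],
    'Forex': ['USD', 'GBP', 'EUR', 'JPY', 'GBP'],
})

# Hypothetical mapping: column name -> values that count as a match.
# 'Y' is assumed for Overdraft because that is what the data contains.
allowed = {
    'Type': ['Retail'],
    'Overdraft': ['Y'],
    'Forex': ['USD', 'GBP'],
}

# Build one boolean mask per column.
masks = [bank_df[col].isin(values) for col, values in allowed.items()]

# OR the masks element-wise: a row is flagged if any column matches.
bank_df['TruthCol'] = np.logical_or.reduce(masks).astype(int)
print(bank_df)
```

Keeping the allowed values in a dictionary means a new column condition is one extra entry rather than another hand-written mask.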
jezrael

Another way is to put the conditions inside numpy.where:

bank_df['TruthCol'] = np.where(((bank_df['Type'] == 'Retail') | (bank_df['Overdraft'] == 'Y') | ((bank_df['Forex'] == 'USD') | (bank_df['Forex'] == 'GBP'))), 1, 0)

Output:

  Forex Name Overdraft       Type  TruthCol
0   USD    A         Y     Retail         1
1   GBP    B         Y  Corporate         1
2   EUR    C         Y  Corporate         1
3   JPY    D         N  Wholesale         0
4   GBP    E         N     Retail         1
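
The any() idea from the question can also be made to work without apply(). The following is a minimal sketch rather than a definitive solution: it rebuilds the sample dataframe, assumes 'Y' as the Overdraft match value, and places the masks side by side with pd.concat before checking each row.

```python
import pandas as pd

bank_df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Type': ['Retail', 'Corporate', 'Corporate', 'Wholesale', 'Retail'],
    'Overdraft': ['Y', 'Y', 'Y', 'N', 'N'],
    'Forex': ['USD', 'GBP', 'EUR', 'JPY', 'GBP'],
})

# One boolean Series per column condition ('Y' assumed for Overdraft).
truth_list = [
    bank_df['Type'].isin(['Retail']),
    bank_df['Overdraft'].isin(['Y']),
    bank_df['Forex'].isin(['USD', 'GBP']),
]

# Put the masks side by side as columns, then test each row for any True.
bank_df['TruthCol'] = pd.concat(truth_list, axis=1).any(axis=1).astype(int)
print(bank_df)
```

Swapping any(axis=1) for all(axis=1) would instead flag only the rows where every condition holds.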
Ankur Sinha