
I want to create a new column that stores boolean values: True when two columns (One and Two) hold the same value and another column (Three) is True.

If column Three == True AND column Two == column One ---> column Four = True

If column Three == False ---> column Four = NA

If column Three == True AND column Two != column One ---> column Four = False

Example dataframe:

import pandas as pd

data = [[True, 0, 0], [True, 0, 1], [False, 0, 1]]
df = pd.DataFrame(data, columns=['One', 'Two', 'Three'])

One   Two Three
True  0   0
True  0   1
False 0   1

Desired output:

One   Two Three Four
True  0   0   True
True  0   1   False
False 0   1   NA

2 Answers


Use np.select:

Input data:

>>> df
   One  Two  Three
0    0    0   True
1    0    1   True
2    0    1  False
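
import numpy as np
import pandas as pd
# True where Three and One == Two, False where Three and One != Two, else pd.NA
df['Four'] = np.select([df['Three'] & df['One'].eq(df['Two']),
                        df['Three'] & df['One'].ne(df['Two'])],
                       choicelist=[True, False],
                       default=pd.NA)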

Output result:

>>> df
   One  Two  Three   Four
0    0    0   True   True
1    0    1   True  False
2    0    1  False   <NA>
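For a copy-paste check, here is a self-contained sketch of the same np.select approach. The DataFrame construction is an assumption that simply rebuilds the input printed above (One and Two as ints, Three as bool). The last step optionally converts Four from object dtype to pandas' nullable 'boolean' dtype.

```python
import numpy as np
import pandas as pd

# Assumed input: rebuilds the frame printed above.
df = pd.DataFrame({'One': [0, 0, 0],
                   'Two': [0, 1, 1],
                   'Three': [True, True, False]})

# np.select picks the choice of the first matching condition per row;
# rows matching no condition (here: Three is False) get the default.
df['Four'] = np.select([df['Three'] & df['One'].eq(df['Two']),
                        df['Three'] & df['One'].ne(df['Two'])],
                       choicelist=[True, False],
                       default=pd.NA)

# Optional: object column -> nullable boolean dtype (missing shown as <NA>).
df['Four'] = df['Four'].astype('boolean')
print(df)
```

Because the default is pd.NA, np.select returns an object array; the astype call is what gives the column a proper boolean dtype.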

You can cast column Four to pandas' nullable boolean dtype, which keeps the missing value as <NA>:

>>> df.astype({'Four': 'boolean'}).info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   One     3 non-null      int64
 1   Two     3 non-null      int64
 2   Three   3 non-null      bool
 3   Four    2 non-null      boolean  # <- HERE
dtypes: bool(1), boolean(1), int64(2)
memory usage: 185.0 bytes
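A shorter variant of the same idea (not part of the answer above), assuming pandas 1.0 or later for the nullable 'boolean' dtype: compute One == Two once as a nullable boolean Series, then use Series.where to blank out the rows where Three is False. The input frame is the same assumed reconstruction of the printed data.

```python
import pandas as pd

# Assumed input: same layout as the printed df in this answer.
df = pd.DataFrame({'One': [0, 0, 0],
                   'Two': [0, 1, 1],
                   'Three': [True, True, False]})

# Row-wise equality, stored as nullable boolean so it can hold pd.NA.
same = df['One'].eq(df['Two']).astype('boolean')

# Keep the equality result where Three is True; put pd.NA everywhere else.
df['Four'] = same.where(df['Three'], pd.NA)
print(df)
```

This avoids building two conditions, since "Three and One != Two" is just the False branch of the same equality.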

You can try a custom function. Adjust it to whatever your real conditions are; this is just a walk-through approach.

Function:

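import numpy as np

def check_df(row):
    # row is a single row (a Series) when called via df.apply(..., axis=1)
    if row['Three'] and row['One'] == row['Two']:
        return True
    elif row['Three'] and row['One'] != row['Two']:
        return False
    else:
        return np.nan  # Three is falsy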

DataFrame Sample:

print(df)
     One  Two  Three
0   True    0      0
1   True    0      1
2  False    0      1

Now use df.apply with axis=1 to apply the function to each row.

df['newcolumn'] = df.apply(check_df, axis=1)
print(df)
     One  Two  Three newcolumn
0   True    0      0       NaN
1   True    0      1     False
2  False    0      1      True
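A self-contained sketch of this approach is below. The frame construction is an assumption reconstructed from the printed sample (One as bool, Two and Three as ints). It also shows why rows 1 and 2 come out False and True: Python bools compare as 1 and 0, so True == 0 is False and False == 0 is True.

```python
import numpy as np
import pandas as pd

# Assumed input: reconstructed from the printed sample above.
df = pd.DataFrame({'One': [True, True, False],
                   'Two': [0, 0, 0],
                   'Three': [0, 1, 1]})

def check_df(row):
    # row is one row of df, passed as a Series by apply(..., axis=1)
    if row['Three'] and row['One'] == row['Two']:
        return True
    elif row['Three'] and row['One'] != row['Two']:
        return False
    else:
        return np.nan  # Three is 0 (falsy)

df['newcolumn'] = df.apply(check_df, axis=1)
print(df)
```

Note that apply calls the Python function once per row, so on large frames a vectorized approach such as np.select is usually faster.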