-1

I have a dataframe

import pandas as pd
import numpy as np

data = pd.DataFrame({"col1": [0, 1, 1, 1,1, 0],
                     "col2": [False, True, False, False, True, False]
                     })

data

I'm trying to create a column col3 where col1=1 and col2==True its 1 else 0

Using np.where:

data.assign(col3=np.where(data["col1"]==1 & data["col2"], 1, 0))
col1    col2    col3
0   0   False   1
1   1   True    1
2   1   False   0
3   1   False   0
4   1   True    1
5   0   False   1

For row 1: col1==0 & col2=False, but I'm getting col3 as 1.

What am I missing??

The desired output:


col1    col2    col3
0   0   False   0
1   1   True    1
2   1   False   0
3   1   False   0
4   1   True    1
5   0   False   0
Ailurophile
  • 2,552
  • 7
  • 21
  • 46

1 Answers1

1

You are missing parentheses (& has higher precedence than ==):

data.assign(col3=np.where((data["col1"]==1) & data["col2"], 1, 0))

A way to avoid this is to use eq:

data.assign(col3=np.where(data["col1"].eq(1) & data["col2"], 1, 0))

You can also replace the numpy.where by astype:

data.assign(col3=((data["col1"]==1) & data["col2"]).astype(int))

Output:

   col1   col2  col3
0     0  False     0
1     1   True     1
2     1  False     0
3     1  False     0
4     1   True     1
5     0  False     0
mozway
  • 194,879
  • 13
  • 39
  • 75