4

I have a df as:

df:
        col1       col2       col3        col4        col5
0       1.36       4.31       7.66           2           2
1       2.62       3.30       2.48           2           1
2       5.19       3.58       1.62           0           2
3       2.06       3.16       3.50           1           1
4       2.19       2.98       3.38           1           1

I want

col6 to return 1 when (col4 > 1 and col5 > 1) else 0

and

col7 to return 1 when (col4 > 1 and col5 > 1 and col 4 + col5 > 2) else 0

I am trying

df.loc[df['col4'] > 0, df['col5'] > 0, 'col6'] = '1'

however I am getting the error:

File "pandas\_libs\index.pyx", line 269, in pandas._libs.index.IndexEngine.get_indexer
  File "pandas\_libs\hashtable_class_helper.pxi", line 5247, in pandas._libs.hashtable.PyObjectHashTable.lookup
TypeError: unhashable type: 'Series'

How can I perform this operation?

  • Please see this post https://stackoverflow.com/questions/22591174/pandas-multiple-conditions-while-indexing-data-frame-unexpected-behavior – le_camerone Jul 24 '21 at 02:40

5 Answers5

2

When you do a bitwise operation on Series objects or arrays, you get an array of booleans, each of whose elements is True or False. Those are basically 0 or 1, and in fact more convenient in most cases:

df['col6'] = (df['col4'] > 1) & (df['col5'] > 1)
df['col7'] = df['col6']

That last one is not a clever trick. If two numbers are both >1, then of course their sum must be >2. If you absolutely want the integers 0 and 1 instead of booleans, use Series.astype:

df['col6'] = ((df['col4'] > 1) & (df['col5'] > 1)).astype(int)
Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
2

We can do something like eval

df['col6'] = df.eval('col4 > 1 and col5 > 1').astype(int)
df['col7'] = df.eval('col4 > 1 and col5 > 1 and col4 + col5 > 2').astype(int)
BENY
  • 317,841
  • 20
  • 164
  • 234
1

try:

c1=df['col4'].gt(1) & df['col5'].gt(1)
#your 1st condition
c2=c1 & df['col4'].add(df['col5']).gt(2)
#your 2nd condition

Finally:

df['col6']=c1.astype(int)
df['col7']=c2.astype(int)

OR

via numpy's where() method:

c1=df['col4'].gt(1) & df['col5'].gt(1)
c2=c1 & df['col4'].add(df['col5']).gt(2)


df['col6']=np.where(c1,1,0)
df['col7']=np.where(c2,1,0)
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
1

You can simple use bitwise operators:

df['col6'] = ((df["col4"]>1) & (df["col5"]>1))*1
df['col7'] = ((df["col4"]>1) & (df["col5"]>1) & (df['col4']+df['col5']>2))*1

>>> df
   col1  col2  col3  col4  col5  col6  col7
0  1.36  4.31  7.66     2     2     1     1
1  2.62  3.30  2.48     2     1     0     0
2  5.19  3.58  1.62     0     2     0     0
3  2.06  3.16  3.50     1     1     0     0
4  2.19  2.98  3.38     1     1     0     0
not_speshal
  • 22,093
  • 2
  • 15
  • 30
1

Create a variable to hold the columns :

columns = ['col4', 'col5']

Condition 1 :

cond1 = df.filter(columns).gt(1).all(1)

Condition 2:

cond2 = df.filter(columns).sum(1).gt(2)

Create new columns via assign:

df.assign(col6 = cond1.astype(int), 
          col7 = (cond1 & cond2).astype(int)
          )


   col1  col2  col3  col4  col5  col6  col7
0  1.36  4.31  7.66     2     2     1     1
1  2.62  3.30  2.48     2     1     0     0
2  5.19  3.58  1.62     0     2     0     0
3  2.06  3.16  3.50     1     1     0     0
4  2.19  2.98  3.38     1     1     0     0
sammywemmy
  • 27,093
  • 4
  • 17
  • 31