
I have the following problem:

Given a dataframe, say for example,

import pandas as pd
df = pd.DataFrame({'col1':[1,0,0,1],'col2':['B','B','A','A'],'col3':[1,2,3,4]})

In some other tool I can easily create a new column based on a condition, say

Create new column 'col3' with 'col2' if df['col1'] == '0' & ~df['col2'].isnull() else 'col1'

That other tool works this out pretty fast, but I have not found a corresponding expression in Python so far.

1.) I tried np.where, which works on whole columns, but I could not get it to put a row-specific value (like the value of col2 in that row) into the result.

2.) I've tried .apply(lambda ...), which appears to be quite slow; roughly what I tried is sketched below.
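The apply version looked roughly like this (a sketch using the example columns from above; the real lambda may differ slightly):

import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 0, 1], 'col2': ['B', 'B', 'A', 'A'], 'col3': [1, 2, 3, 4]})

# row-wise: take col2 where col1 == 0 and col2 is not missing, otherwise keep col1
df['col3'] = df.apply(
    lambda row: row['col2'] if row['col1'] == 0 and pd.notnull(row['col2']) else row['col1'],
    axis=1
)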

I would be happy if you could suggest an elegant way to solve this problem. Thanks.

StephanH

3 Answers


I think you need numpy.where with notnull instead of the inverted isnull (thanks @jpp):

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':[1,0,0,1],'col2':['B','B','A','A'],'col3':[1,2,3,4]})

# take col2 where col1 == 0 and col2 is not missing, otherwise keep col1
df['col3'] = np.where((df['col1'] == 0) & (df['col2'].notnull()), df['col2'], df['col1'])
print (df)
   col1 col2 col3
0     1    B    1
1     0    B    B
2     0    A    A
3     1    A    1
jezrael

Try this:

import numpy as np

# the comparison needs its own parentheses: & binds tighter than ==
df['new_col'] = np.where((df['col1'] == 0) & (~df['col2'].isnull()), df['col2'], df['col1'])

np.where is faster than pd.apply: Why is np.where faster than pd.apply
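If you want to check this on your own data, a rough comparison could look like the sketch below (the frame size is arbitrary and absolute timings depend on your machine):

import numpy as np
import pandas as pd
from timeit import timeit

# larger random frame so the difference is visible
big = pd.DataFrame({'col1': np.random.randint(0, 2, 100000),
                    'col2': np.random.choice(['A', 'B', None], 100000)})

t_where = timeit(lambda: np.where((big['col1'] == 0) & big['col2'].notnull(),
                                  big['col2'], big['col1']), number=10)
t_apply = timeit(lambda: big.apply(lambda r: r['col2'] if r['col1'] == 0 and pd.notnull(r['col2'])
                                   else r['col1'], axis=1), number=10)
print(t_where, t_apply)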

Matteo M

You can use df.loc:

# start from col1, then overwrite the rows where col1 == 0 and col2 is not null
df['col3'] = df['col1']
df.loc[(df['col1'] == 0) & (~df['col2'].isnull()), 'col3'] = df['col2']
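For the example frame above, this should give the same result as the np.where answer:

print(df)
   col1 col2 col3
0     1    B    1
1     0    B    B
2     0    A    A
3     1    A    1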
Andy