I think you can use loc
with boolean indexing
:
print (df)
A B
0 3 10
1 2 30
2 1 20
3 2 15
4 2 100
print (df['B'] > 18)
0 False
1 True
2 True
3 False
4 True
Name: B, dtype: bool
df.loc[df['B'] > 18, 'C'] = df['B']
print (df)
A B C
0 3 10 NaN
1 2 30 30.0
2 1 20 20.0
3 2 15 NaN
4 2 100 100.0
If you need select by condition use boolean indexing
:
print (df[df['B'] > 18])
A B
1 2 30
2 1 20
4 2 100
If need something more faster, use where
:
df['C'] = df.B.where(df['B'] > 18)
Timings (len(df)=50k
):
In [1367]: %timeit (a(df))
The slowest run took 8.34 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.14 ms per loop
In [1368]: %timeit (b(df1))
100 loops, best of 3: 15.5 ms per loop
In [1369]: %timeit (c(df2))
100 loops, best of 3: 2.93 ms per loop
Code for timings:
import pandas as pd
df = pd.DataFrame([{'A':3,'B':10},
{'A':2, 'B':30},
{'A':1,'B':20},
{'A':2,'B':15},
{'A':2,'B':100}])
print (df)
df = pd.concat([df]*10000).reset_index(drop=True)
df1 = df.copy()
df2 = df.copy()
def a(df):
df['C'] = df.B.where(df['B'] > 18)
return df
def b(df1):
df['C'] = ([x if x > 18 else None for x in df['B']])
return df
def c(df2):
df.loc[df['B'] > 18, 'C'] = df['B']
return df
print (a(df))
print (b(df1))
print (c(df2))