One potential solution is to map regular functions or lambda functions to the columns of the dataframe, which is much faster and more efficient than a row-by-row loop (e.g. df.iterrows()).
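As a rough illustration of that claim, the sketch below squares one column twice, once with a df.iterrows() loop and once with map, checks that both give the same values, and times each with timeit. The helper names (square_with_loop, square_with_map), the 1000-row size and the repeat count are assumptions made for this example. Actual timings depend on the machine and the pandas version, so no numbers are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 1000 rows, two random columns.
df = pd.DataFrame(np.random.randn(1000, 2), columns=list('AB'))

def square_with_loop(frame):
    # Row-by-row loop: iterrows() builds a new Series object for every row.
    values = [row['A'] ** 2 for _, row in frame.iterrows()]
    return pd.Series(values, index=frame.index)

def square_with_map(frame):
    # map() calls the function once per element of the column.
    return frame['A'].map(lambda x: x ** 2)

loop_result = square_with_loop(df)
map_result = square_with_map(df)

# Both approaches should produce the same numbers.
same = bool(np.allclose(loop_result, map_result))
print('Same result:', same)

# Time a few repetitions of each approach.
loop_time = timeit.timeit(lambda: square_with_loop(df), number=5)
map_time = timeit.timeit(lambda: square_with_map(df), number=5)
print(f'iterrows loop: {loop_time:.4f}s, map: {map_time:.4f}s')
```

For a plain arithmetic operation like this one, a vectorized expression such as df['A'] ** 2 avoids the per-element Python call altogether.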
Here is a summary of efficient DataFrame/Series manipulation methods, based on an answer here:
map: works for Series ONLY
applymap: works for DataFrames ONLY
apply: works for BOTH
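The difference between the three methods is easier to see by checking what each one passes to the function. The following is a minimal sketch under a few assumptions: the helper describe and the small integer frame are made up for illustration, and because DataFrame.applymap was renamed to DataFrame.map in pandas 2.1, the code picks whichever of the two is available.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

def describe(x):
    # Report whether the function received a whole Series or a single value.
    return 'Series' if isinstance(x, pd.Series) else 'scalar'

# DataFrame.applymap was renamed to DataFrame.map in pandas 2.1,
# so use whichever element-wise method this pandas version provides.
elementwise = df.map if hasattr(df, 'map') else df.applymap

series_map = df['A'].map(describe)         # called once per element
series_apply = df['A'].apply(describe)     # called once per element
frame_elementwise = elementwise(describe)  # called once per cell
frame_apply = df.apply(describe)           # called once per column

print(series_map.tolist())
print(series_apply.tolist())
print(frame_elementwise.values.tolist())
print(frame_apply.tolist())
```

On a DataFrame, apply hands each column to the function as a Series, so a function like x**2 still works because the arithmetic is applied to the whole column at once. The element-wise methods call the function once per value instead.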
Here is a toy example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(4, 2), columns=list('AB'))
print(df)
def square(x):
    return x**2
#mapping a lambda function
print('Result of mapping with a lambda function')
df['A'] = df['A'].map(lambda x : x**2)
print(df)
#mapping a regular function
print('Result of mapping with a regular function')
df['C'] = df['A'].map(square)
print(df)
#applymap
print('Result of applymap with a regular function')
df1 = df.applymap(square)
print(df1)
#apply
print('Result of applying with a regular function')
df2 = df.apply(square)
print(df2)
Output:
          A         B
0 -0.030899 -2.206942
1  0.080991  0.049431
2  1.190754 -0.101161
3  0.794870 -0.969503
Result of mapping with a lambda function
          A         B
0  0.000955 -2.206942
1  0.006560  0.049431
2  1.417894 -0.101161
3  0.631818 -0.969503
Result of mapping with a regular function
          A         B             C
0  0.000955 -2.206942  9.115775e-07
1  0.006560  0.049431  4.302793e-05
2  1.417894 -0.101161  2.010425e+00
3  0.631818 -0.969503  3.991945e-01
Result of applymap with a regular function
              A         B             C
0  9.115775e-07  4.870592  8.309735e-13
1  4.302793e-05  0.002443  1.851403e-09
2  2.010425e+00  0.010234  4.041807e+00
3  3.991945e-01  0.939936  1.593563e-01
Result of applying with a regular function
              A         B             C
0  9.115775e-07  4.870592  8.309735e-13
1  4.302793e-05  0.002443  1.851403e-09
2  2.010425e+00  0.010234  4.041807e+00
3  3.991945e-01  0.939936  1.593563e-01