I have a large dataframe that I need to loop through, but the loop takes a long time when the dataframe is very large. I know iterrows is quite slow and that vectorization is much faster. However, I don't know how to rewrite an iterrows loop in vectorized form.
My dataframe is given as follows:
print(df_toe.head(10))
   z_toe  dn50_toe  Nod  ht/h  output_ok
0   -3.5  0.067171  NaN   NaN        1.0
1   -3.5  0.082472  NaN   NaN        1.0
2   -3.5  0.095543  NaN   NaN        1.0
3   -3.5  0.196341  NaN   NaN        1.0
4   -3.5  0.232024  NaN   NaN        1.0
5   -3.5  0.347270  NaN   NaN        1.0
6   -3.5  0.353661  NaN   NaN        1.0
7   -3.5  0.404841  NaN   NaN        1.0
8   -3.5  0.632502  NaN   NaN        1.0
9   -3.5  0.922923  NaN   NaN        1.0
With some extra parameters:
z_bed = -4.5
swl = 1.8
The iterrows loop through the dataframe df_toe is written as follows:
def dftoe_det_2nd(df_toe):
    for i in df_toe.index:
        # Define input variables
        z_toe = df_toe.get_value(i, 'z_toe')
        dn50_toe = df_toe.get_value(i, 'dn50_toe')

        # Define restrictions between which it can operate for z_toe/h
        h = swl - z_bed
        ht = swl - z_toe
        df_toe.set_value(i, 'ht/h', abs(ht / h))

        if z_toe < z_bed:
            df_toe.set_value(i, 'output_ok', 0)

        # Show all waterheights
        df_toe.set_value(i, 'Nod', Nodtoe())

        if 0.90 < abs(ht / h) or 0.4 > abs(ht / h):
            df_toe.set_value(i, 'output_ok', 0)
        if h > 25:
            df_toe.set_value(i, 'output_ok', 0)

    df_toe = df_toe[df_toe['output_ok'] == 1]
    del df_toe['output_ok']
    return df_toe
Does anyone know how this can be optimized for speed, i.e. to reduce the computation time?
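For reference, here is a rough sketch of what a vectorized version of the loop above might look like. It is only a sketch under a few assumptions: the function name dftoe_det_2nd_vectorized and the third sample row (z_toe = -5.0) are made up for illustration, and since Nodtoe() is not shown above, the Nod column is simply left untouched. The idea is that h is the same scalar for every row, so ht/h can be computed for the whole column at once and the three checks combined into a single boolean mask.

```python
import numpy as np
import pandas as pd

z_bed = -4.5
swl = 1.8


def dftoe_det_2nd_vectorized(df_toe, z_bed=z_bed, swl=swl):
    """Column-wise sketch of the row loop (hypothetical name; Nod is not filled in)."""
    out = df_toe.copy()

    h = swl - z_bed                            # scalar, identical for every row
    ratio = ((swl - out['z_toe']) / h).abs()   # abs(ht / h) for all rows at once
    out['ht/h'] = ratio

    # Same rejection rules as the loop, written as one boolean mask.
    rejected = (
        (out['z_toe'] < z_bed)
        | (ratio > 0.90)
        | (ratio < 0.4)
        | (h > 25)
    )
    keep = (out['output_ok'] == 1) & ~rejected
    return out.loc[keep].drop(columns='output_ok')


# Small sample: the first two rows come from the data above,
# the third row is an invented value that should be filtered out.
df_toe = pd.DataFrame({
    'z_toe': [-3.5, -3.5, -5.0],
    'dn50_toe': [0.067171, 0.082472, 0.095543],
    'Nod': np.nan,
    'ht/h': np.nan,
    'output_ok': 1.0,
})

result = dftoe_det_2nd_vectorized(df_toe)
print(result)
```

If Nodtoe() depends on per-row values, it would need to be rewritten to accept whole columns (or applied separately) before it could be included in this approach.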