I am having a performance issue when using iterrows().
My code is something like this:

for _, row in df.iterrows():
    row["new_col"] = df.apply(lambda x: some_func(row["col1"], ...), axis=1)
some_func() is a fairly complicated function that cannot take a Series or DataFrame as input; it requires specific scalar values from the same row.
However, as the number of rows increases, the processing time grows exponentially rather than linearly.
Does anyone have suggestions on how to speed this up? Perhaps splitting the DataFrame into smaller groups would help, or using something other than iterrows().
Any comment is appreciated.
EDIT 1.
for count, row in df.iterrows():
    df.loc[count, "new_col"] = some_func(row["col1"], ...)
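For reference, a minimal sketch of the single-apply pattern I am considering, where some_func and the column names are just placeholders standing in for my real function and data:

```python
import pandas as pd

# Placeholder for the real some_func: any scalar function of row values.
def some_func(a, b):
    return a * 2 + b

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

# Call some_func once per row, instead of running a full df.apply
# inside an iterrows() loop (which is what makes runtime quadratic).
df["new_col"] = df.apply(lambda r: some_func(r["col1"], r["col2"]), axis=1)

# If some_func can work on whole columns, passing them directly
# (vectorized) would likely be faster still:
# df["new_col"] = some_func(df["col1"], df["col2"])
```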