We have some Python code that uses Pandas. We iterate through each row, take a value from a column, and pass it as a parameter to a function call. Currently we use the iterrows() method, but we are looking to optimize it.
Here is my existing code:
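import pandas as pd

df_input = pd.read_csv("input.csv")
df = pd.DataFrame()
for index, row in df_input.iterrows():
    # Call the function once per row, using that row's "body" value
    variable1 = do_something_1(parameter1, parameter2, row["body"])
    listOfSeries = [pd.Series([row["id"], row["body"], variable1], index=['id', 'body', 'column1'])]
    # Append the new row to the result; this copies the whole frame on every iteration
    df = df.append(listOfSeries, ignore_index=True)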
I am trying to improve the performance of the code. I did read the thread here: Does pandas iterrows have performance issues?
I think I can use the apply method to call the do_something_1 function on every row of the dataframe, but how do I save the results from do_something_1 to a new column in the same dataframe?
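For reference, here is a minimal sketch of what I have in mind, assuming the result of apply on the "body" column can be assigned directly to a new column. The do_something_1 body, the parameter values, and the sample data below are hypothetical stand-ins so the snippet runs on its own; they are not my real function or input.

```python
import pandas as pd

# Hypothetical stand-ins: the real do_something_1, parameter1 and parameter2
# are not shown here, so these exist only to make the sketch runnable.
parameter1 = "prefix"
parameter2 = 3


def do_something_1(p1, p2, body):
    # Placeholder logic: join p1 with the first p2 characters of body.
    return f"{p1}:{body[:p2]}"


# Small in-memory frame standing in for pd.read_csv("input.csv").
df_input = pd.DataFrame({"id": [1, 2, 3], "body": ["alpha", "beta", "gamma"]})

# Keep the two source columns, then call the function once per "body" value
# and store the returned Series as a new column of the same frame.
df = df_input[["id", "body"]].copy()
df["column1"] = df["body"].apply(
    lambda body: do_something_1(parameter1, parameter2, body)
)

print(df)
```

As far as I understand, apply still calls the function once per row, so most of the gain here would come from no longer appending to the dataframe inside the loop.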