I have a pandas DataFrame with 3,373,612 rows. I would like to run some code on two of the columns to produce two new columns. My code throws an exception, so to diagnose the cause I have cut it back to the simplest function I can think of that takes a row and returns a Series of two values:
import pandas as pd

def split_ids(row):
    return pd.Series([None, None])  # two placeholder values, one per new column

analytic_events.apply(split_ids, axis=1)
I am running this in a Jupyter Notebook, but even so I am shocked that after five minutes the code is still running.
I must be misunderstanding something about the pandas apply function. Why does such simple code take an inordinate amount of time to run through 3 million rows in a DataFrame?
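In case it helps anyone reproduce this without my data, here is a rough sketch that times the same row-wise apply on a small synthetic frame and extrapolates linearly to the full row count. The frame `sample`, its columns `a` and `b`, and the 5,000-row size are made up for illustration and are not my real `analytic_events` data, so the estimate is only indicative.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical stand-in for analytic_events: two made-up columns, 5,000 rows.
sample = pd.DataFrame({
    "a": np.arange(5_000),
    "b": np.arange(5_000),
})


def split_ids(row):
    # Same shape as the function in the question: two placeholder values.
    return pd.Series([None, None])


# Time the row-wise apply on the small sample.
start = time.perf_counter()
result = sample.apply(split_ids, axis=1)
elapsed = time.perf_counter() - start

# Linear extrapolation to the full row count; a rough estimate only.
full_rows = 3_373_612
estimate = elapsed * full_rows / len(sample)
print(f"{len(sample)} rows took {elapsed:.2f} s; "
      f"estimated {estimate / 60:.1f} min for {full_rows} rows")
```

The idea is only to get a per-row cost from a sample that finishes quickly, rather than waiting on the full frame to find out how long it will take.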