I have a very large pandas DataFrame (a few million rows) that I am manipulating. The last column I calculate uses the following code:
df['diff'] = df.apply(lambda row: row.col_a - row.col_b, axis=1)
It is fifty-fifty whether the code runs at all, and when it does, it takes the better part of an hour. Is there a way to make this run faster in pandas? I've started to do some research: I looked at this Stack Overflow page (Why is pandas apply lambda slower than loop here?), but it deals with categorical data. I've also read about vectorized operations, but haven't found anything that I think will work. Any help is appreciated.
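For reference, here is a minimal, self-contained sketch of the setup on synthetic data. The column values and the row count are made up for illustration (the real frame has a few million rows), and it assumes both columns are numeric. It also shows plain column subtraction, which is the kind of vectorized operation I have been reading about; I have not confirmed that it behaves the same on my real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (values and size are assumptions).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "col_a": rng.integers(0, 100, size=1_000),
    "col_b": rng.integers(0, 100, size=1_000),
})

# Row-wise version from the question: the lambda is called once per row.
df["diff_apply"] = df.apply(lambda row: row.col_a - row.col_b, axis=1)

# Vectorized candidate: subtract the two columns in a single operation.
df["diff"] = df["col_a"] - df["col_b"]

# Check that both approaches give the same values on this sample.
same = bool((df["diff_apply"] == df["diff"]).all())
print(same)
```

On this small sample the two columns should match, so the open question is whether the same holds for the full dataset and how much faster it is there.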