I have a large pandas DataFrame with thousands of columns and over a million rows. For each row, I want to compute the difference between its maximum and minimum values. There are many NaN values, and some rows are entirely NaN; I still want to keep those rows in the result.
I wrote the following code. It works, but it is slow:
totTime = []
for index, row in date.iterrows():
    myRow = row.dropna()
    if len(myRow):
        tt = max(myRow) - min(myRow)
    else:
        tt = None
    totTime.append(tt)
Is there a way to speed this up? I tried the following code, but it raises an error when it reaches a row that is all NaN:
tt = lambda x: max(x.dropna()) - min(x.dropna())
totTime = date.apply(tt, axis=1)
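To make the failure reproducible, here is a small self-contained sketch. The toy data and column names are made up for illustration and are not my real dataset. The error is presumably a `ValueError` raised by the built-in `max()`, because `dropna()` leaves an empty Series for the all-NaN row:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the second row is entirely NaN.
date = pd.DataFrame({
    "a": [1.0, np.nan, 5.0],
    "b": [4.0, np.nan, np.nan],
    "c": [2.0, np.nan, 9.0],
})

tt = lambda x: max(x.dropna()) - min(x.dropna())

try:
    date.apply(tt, axis=1)
    error_message = None
except ValueError as exc:
    # Built-in max() rejects an empty sequence, which is what
    # dropna() returns for the all-NaN row.
    error_message = str(exc)

print(error_message)
```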
Any suggestions will be appreciated!
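One possible direction, given as an unbenchmarked sketch: `DataFrame.max` and `DataFrame.min` accept `axis=1` and skip NaN by default, so the row-wise range can be computed without a Python-level loop. The helper name `row_range` and the toy data are hypothetical, and the sketch assumes all columns are numeric:

```python
import numpy as np
import pandas as pd


def row_range(df: pd.DataFrame) -> pd.Series:
    """Row-wise max minus min, ignoring NaN values."""
    # skipna=True is the default; it is written out here for clarity.
    # A row that is entirely NaN yields NaN for both max and min,
    # so the difference is NaN and the row is kept in the result.
    return df.max(axis=1, skipna=True) - df.min(axis=1, skipna=True)


# Hypothetical toy data; the second row is entirely NaN.
toy = pd.DataFrame({
    "a": [1.0, np.nan, 5.0],
    "b": [4.0, np.nan, np.nan],
    "c": [2.0, np.nan, 9.0],
})

result = row_range(toy)
print(result)
```

Unlike the loop version, all-NaN rows come back as NaN rather than `None`, and the result is a Series aligned to the original index rather than a list.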