I didn't found answer elsewhere, so I need to ask. Probably because I don't know how to correctly name it. (English is not my origin language)
I have large datetime data frame. Time is important here. One column in df has values [Nan, 1, -1]. I need to perform quick calculation to have cumulative sum reseting when value is changing.
Example.
Time sign desire_value
2014-01-24 05:00:00 Nan Nan
2014-01-24 06:00:00 Nan Nan
2014-01-24 07:00:00 Nan Nan
2014-01-24 08:00:00 1 1
2014-01-24 09:00:00 1 2
2014-01-24 10:00:00 1 3
2014-01-24 11:00:00 -1 1
2014-01-24 12:00:00 -1 2
2014-01-24 13:00:00 -1 3
2014-01-24 14:00:00 -1 4
2014-01-24 15:00:00 -1 5
2014-01-24 16:00:00 1 1
2014-01-24 17:00:00 1 2
2014-01-24 18:00:00 1 3
2014-01-24 19:00:00 -1 1
2014-01-24 20:00:00 -1 2
2014-01-24 21:00:00 1 1
2014-01-24 22:00:00 1 2
I have working solution using function, but it is not very efficient.
df['sign_1'] = df['sign'].shift(1)
for index, row in df.iterrows():
if row.sign is None:
df.loc[line, 'desire_value'] = None
elif row.sign == row.sign_1:
acc += 1
df.loc[index, 'desire_value'] = acc
else:
acc = 1
df.loc[index, 'desire_value'] = acc
I cannot find any array based approach. I found that the best way to iterate efficiently in Python is using Cython, but is there more "Python" way to solve this?