First post: I apologize in advance for sloppy wording (and possibly poor searching if this question has been answered ad nauseum elsewhere - maybe I don't know the right search terms yet).
I have data in 10-minute chunks and I want to perform calculations on a column ('input') grouped by minute (i.e. 10 separate 60-second blocks - not a rolling 60 second period) and then store all ten calculations in a single list called output.
The 'seconds' column records the second from 1 to 600 in the 10-minute period. If no data was entered for a given second, there is no row for that number of seconds. So, some minutes have 60 rows of data, some have as few as one or two.
Note: the calculation (my_function) is not basic so I can't use groupby and np.sum(), np.mean(), etc. - or at least I can't figure out how to use groupby.
I have code that gets the job done but it looks ugly to me so I am sure there is a better way (probably several).
output=[]
seconds_slicer = 0
for i in np.linspace(1,10,10):
seconds_slicer += 60
minute_slice = df[(df['seconds'] > (seconds_slicer - 60)) &
(df['seconds'] <= seconds_slicer)]
calc = my_function(minute_slice['input'])
output.append(calc)
If there is a cleaner way to do this, please let me know. Thanks!
Edit: Adding sample data and function details:
seconds input
1 1 0.000054
2 2 -0.000012
3 3 0.000000
4 4 0.000000
5 5 0.000045
def realized_volatility(series_log_return):
return np.sqrt(np.sum(series_log_return**2))