I have the following data
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
date_today = pd.Timestamp(1513393355.5, unit='s')
days = pd.date_range(date_today, date_today + timedelta(1), freq='s')
np.random.seed(seed=1111)
data_price = np.random.randint(2, high=10, size=len(days))
data_quantity = np.random.randint(2, high=100, size=len(days))
df = pd.DataFrame({'ts': days, 'price': data_price, 'quantity': data_quantity})
df = df.set_index('ts')
print(df.head())
price quantity
ts
2017-12-16 03:02:35.500 6 30
2017-12-16 03:02:36.500 9 18
2017-12-16 03:02:37.500 7 85
2017-12-16 03:02:38.500 3 51
2017-12-16 03:02:39.500 6 19
I would like to resample the data into 10 minute intervals, rank the observations within each 10 minute window by increasing order of price, take the top 20% of the data after ranking and then calculate the weighted average price (i.e. price weighted by the quantity), and the sum of the quantity of the top 20% of the data.
There is a solution here which uses the groupby function to calculate the weighted average price. But i would like to apply the weighted average and sum only to the top 20% of the data.
I would like to do that on a static basis (i.e. apply the pandas resample function) and on a rolling basis every 1 minute with a 10 minute lookback period.
How can I do this elegantly with pandas? I am confused on how to do the ranking within the resample window.
Thank you!