I'm trying to count the frequency of the number of occurrences a value has been higher than today's value for all dates in the past.
I've looked into using pandas rolling and cumcount functions, but still cannot figure out how to do this efficiently.
Below is a working code snippet of what I'm trying to achieve with the output displayed in the image following. My gut says that this is an extremely ugly and inefficient approach, so I was hoping to find something better.
def freq_greater_than(r):
smaller_date = df[df.date < r.date]
larger_num = smaller_date[(smaller_date.num > r.num)]
return round(len(larger_num) / len(smaller_date) * 100, 2)
index = pd.date_range('2020-01-01', '2020-01-10')
df = pd.DataFrame(dict(num=np.random.random_integers(0, 100, len(index))), index=pd.Series(index, name='date'))
df['date'] = df.index
df['freq_greater_than'] = df.iloc[1:].apply(freq_greater_than, axis=1)
df.style.format({'freq_greater_than': '{:.2f}%'})