I have a dataframe with multiple scores and multiple dates. My goal is to bin each day into equal-sized buckets (say, 5 buckets) based on whichever score I choose. The problem is that some scores have an abundance of ties, so I first need to compute a rank to introduce a tie-breaking criterion, and only then can qcut be applied.
The simple solution is to create a field for the rank and then do groupby('date')['rank'].transform(pd.qcut). However, since efficiency is key and this implies doing two expensive groupbys, I was wondering whether it is possible to "chain" the two operations into one sweep.
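For concreteness, here is a minimal sketch of the two-step approach described above. The toy data, the `bucket` column name, and the choice of `labels=False` (integer bucket codes instead of named labels) are assumptions made for illustration, not part of my real setup.

```python
import pandas as pd

# Hypothetical toy data: two dates, ten rows each, scores with many ties.
df_main = pd.DataFrame({
    "date": ["2020-01-01"] * 10 + ["2020-01-02"] * 10,
    "score": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3] * 2,
})

# Groupby 1: rank within each date; method="first" breaks ties by
# order of appearance, so ranks within a date are unique.
df_main["rank"] = df_main.groupby("date")["score"].rank(method="first")

# Groupby 2: cut the unique ranks of each date into 5 quantile buckets.
# labels=False (an assumption here) returns integer codes 0..4.
df_main["bucket"] = df_main.groupby("date")["rank"].transform(
    lambda r: pd.qcut(r, 5, labels=False)
)

# Count of rows per (date, bucket) pair, to check the buckets are even.
bucket_sizes = df_main.groupby(["date", "bucket"]).size()
```

With ten rows per date, each of the five buckets ends up with two rows per date, which is the equal-sized split I am after even though the raw scores are heavily tied.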
This is the closest I have got. My goal is to create 5 buckets, but the qcut call seems to be wrong, since it asks me to provide hundreds of labels:
df_main.groupby('date')['score'].apply(
    lambda x: pd.qcut(x.rank(method='first'),
                      5,
                      duplicates='drop',
                      labels=lbls)
)
Thanks