Plotting CDF for ranking distribution

Question

I have a pandas DataFrame that looks like the one below. It is generated with a groupby and then sorted by number of users, which gives me the user count for the top X feature combinations.

count_28day    Feature1    Feature2    Feature3
5000           a1          b1          c1
1000           a2          b2          c2
50             a3          b3          c3

I'm trying to plot the CDF of the user distribution. I don't need to know the features; I just want to show how many of the top X feature combinations account for 90% of total users.

I'm doing this in a very hacky way.

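import pandas as pd
# Cumulative share of users, from the largest feature combination down
topx = table.count_28day.sort_values(ascending=False).cumsum() / table.count_28day.sum()
# Re-index the top 100 combinations by 1-based rank and plot as a step curve
ser_cdf = pd.Series(topx.tolist()[:100], index=range(1, 101))
ser_cdf.plot(drawstyle='steps')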

Is there a more elegant way to do this, using a histogram, an ECDF, or something similar?

Does this answer your question? [Plot CDF + cumulative histogram using Seaborn Python](https://stackoverflow.com/questions/39297523/plot-cdf-cumulative-histogram-using-seaborn-python) — null, Dec 26 '19 at 13:48
Thanks for the link. That post shows how to plot a CDF from raw data. Here I have aggregated, ranked data. I do not need to create any bins; I just need to plot the cumulative percentage for each rank. — vagavince, Dec 27 '19 at 05:21
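
For the aggregated case described in the comment above, one possible approach is to skip binning entirely and build the curve straight from the ranked counts. The following is only a sketch under assumptions: the `table` here is a hypothetical stand-in built from the three sample rows in the question, and the helper names `rank_cdf`, `top_x_for_share`, and `plot_rank_cdf` are made up for illustration, not part of pandas. Plotting assumes matplotlib is installed.

```python
import pandas as pd

# Hypothetical stand-in for the grouped table shown in the question.
table = pd.DataFrame({
    "count_28day": [5000, 1000, 50],
    "Feature1": ["a1", "a2", "a3"],
    "Feature2": ["b1", "b2", "b3"],
    "Feature3": ["c1", "c2", "c3"],
})


def rank_cdf(counts):
    """Cumulative share of the total, indexed by rank (1 = largest count)."""
    ranked = counts.sort_values(ascending=False).reset_index(drop=True)
    cdf = ranked.cumsum() / ranked.sum()
    cdf.index = cdf.index + 1  # ranks start at 1 instead of 0
    cdf.index.name = "rank"
    return cdf


def top_x_for_share(cdf, share=0.9):
    """Smallest rank whose cumulative share reaches `share`."""
    # idxmax() on a boolean Series returns the label of the first True.
    return int((cdf >= share).idxmax())


def plot_rank_cdf(cdf, share=0.9):
    """Step plot of the cumulative share, with a guide line at `share`."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    cdf.plot(ax=ax, drawstyle="steps-post")
    ax.axhline(share, linestyle="--", color="grey")
    ax.set_xlabel("rank of feature combination")
    ax.set_ylabel("cumulative share of users")
    return ax


cdf = rank_cdf(table["count_28day"])
top_x = top_x_for_share(cdf, 0.9)  # 2 for the three sample rows
```

Calling `plot_rank_cdf(cdf.head(100))` would draw the first 100 ranks, much like the original snippet, without building a list and a separate index by hand. Whether `steps-post` or the original `steps` drawstyle reads better is a matter of taste.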

0 Answers