I have a pandas DataFrame that looks like the one below. It was generated with a groupby and then sorted by number of users, so it gives the user count for the top X feature combinations.

count_28day  Feature1  Feature2  Feature3
5000         a1        b1        c1
1000         a2        b2        c2
50           a3        b3        c3

I'm trying to plot the CDF of the user distribution. I don't need to know the features. I just want to show how many of the top X feature combinations account for 90% of total users.

I'm doing this in a very hacky way.

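import pandas as pd
# Cumulative share of users by rank; slice from 0 so the top combination is kept
topx = table.count_28day.sort_values(ascending=False).cumsum() / table.count_28day.sum()
n = min(len(topx), 100)  # plot at most the first 100 ranks
ser_cdf = pd.Series(topx.to_numpy()[:n], index=range(1, n + 1))
ser_cdf.plot(drawstyle='steps')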

Is there a more elegant way to do this, using a histogram, an ECDF, or something similar?

vagavince
  • Does this answer your question? [Plot CDF + cumulative histogram using Seaborn Python](https://stackoverflow.com/questions/39297523/plot-cdf-cumulative-histogram-using-seaborn-python) – null Dec 26 '19 at 13:48
  • Thanks for the link. That post shows how to plot a CDF from raw data. Here I have aggregated, ranked data. I do not need to create any bins; I just need to plot the cumulative percentage for each rank. – vagavince Dec 27 '19 at 05:21
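
A minimal sketch of plotting the cumulative percentage per rank and reading off the smallest top X that covers 90% of users. Only the `count_28day` column comes from the question; the sample `table` and the helper names `rank_cdf`, `top_x_for_share`, and `plot_rank_cdf` are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the aggregated table shown in the question
table = pd.DataFrame({
    "count_28day": [5000, 1000, 50],
    "Feature1": ["a1", "a2", "a3"],
    "Feature2": ["b1", "b2", "b3"],
    "Feature3": ["c1", "c2", "c3"],
})

def rank_cdf(counts: pd.Series) -> pd.Series:
    """Cumulative share of the total, indexed by rank (1 = largest count)."""
    ranked = counts.sort_values(ascending=False).reset_index(drop=True)
    cdf = ranked.cumsum() / ranked.sum()
    cdf.index = pd.RangeIndex(1, len(cdf) + 1, name="rank")
    return cdf

def top_x_for_share(cdf: pd.Series, share: float = 0.9) -> int:
    """Smallest rank X whose cumulative share reaches `share`."""
    pos = int(np.searchsorted(cdf.to_numpy(), share, side="left"))
    # Clamp in case floating-point rounding leaves the last value just below `share`
    return min(pos + 1, len(cdf))

def plot_rank_cdf(cdf: pd.Series, share: float = 0.9):
    """Step plot of the rank CDF with a horizontal line at `share` (needs matplotlib)."""
    ax = cdf.plot(drawstyle="steps-post")
    ax.axhline(share, linestyle="--")
    ax.set_xlabel("rank of feature combination")
    ax.set_ylabel("cumulative share of users")
    return ax

cdf = rank_cdf(table["count_28day"])
x90 = top_x_for_share(cdf, 0.9)
```

Calling `plot_rank_cdf(cdf)` would draw the curve. Because the data are already aggregated and ranked, no binning is involved: the cumulative sum is the whole computation, and `np.searchsorted` works here only because the cumulative shares are non-decreasing.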

0 Answers