I have a panda dataframe that looks like this, this is generated with the groupby command and then sorted by # of users to give me user count for top X feature combination.
count_28day, Feature1, Feature2, Feature3
5000 a1 b1 c1
1000 a2 b2 c2
50 a3 b3 c3
I'm trying to plot cdf of user distribution. I don't need to know the features. I just want to show the top X feature combinations that will give me 90% of total user.
I'm doing this in a very hacky way.
topx = table.count_28day.sort_values(ascending=False).cumsum()/sum(table.count_28day)
ser_cdf = pd.Series(topx.tolist()[1:100], index=pd.Series(range(1,100)))
ser_cdf.plot(drawstyle='steps')
Is there a more elegant way to do this using histogram or ecdf or something?