This is an extension of this question:
I have a Pandas dataframe such as:
import pandas as pd

dfq = pd.DataFrame({
    'COL1': ['USER1', 'USER1', 'USER2', 'USER2', 'USER2', 'USER3'],
    'COL2': ['MONTH1', 'MONTH2', 'MONTH1', 'MONTH1', 'MONTH2', 'MONTH1'],
})
In general, this means that every time a customer uses the service, a record is added to the table with the user ID and the month. I need to know, on average, how many times customers use the service per month.
I can count the month occurrences like:
dfq.groupby('COL2').count()
But how do I get the averages from there? Or is there a better way to do this?
My desired output would be something like this:
If I count the records in each group (month) and then divide by the total number of records, I get a crude figure (each month's share of all records, as a percentage):
testcount = dfq.groupby('COL2').count()
total = len(dfq)
testcount / total * 100
That sort of gives me an answer, but it is a very crude process. An average on its own can be misleading, so I'd like to get some more statistical information, for instance medians and standard deviations.
In other words, I would like to do what they did here, but in their case they are calculating over numerical values, while my values are strings. I need to get insights such as: what is the median customer usage of the service per month?
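To make the goal concrete, here is a rough sketch of the kind of calculation I have in mind, not necessarily the best way to do it. It first turns the string records into a numeric count per (month, user) pair and then summarises those counts within each month. The names `uses` and `stats` are just placeholders, and it assumes a user with no record in a month is left out of that month's statistics rather than counted as zero.

```python
import pandas as pd

dfq = pd.DataFrame({
    'COL1': ['USER1', 'USER1', 'USER2', 'USER2', 'USER2', 'USER3'],
    'COL2': ['MONTH1', 'MONTH2', 'MONTH1', 'MONTH1', 'MONTH2', 'MONTH1'],
})

# Step 1: number of records for each (month, user) pair.
# This converts the string data into numeric usage counts.
uses = dfq.groupby(['COL2', 'COL1']).size()

# Step 2: summarise the per-user counts within each month.
# Assumption: only users with at least one record in a month are included.
stats = uses.groupby(level='COL2').agg(['mean', 'median', 'std'])
print(stats)
```

If inactive users should count as zero uses, one option might be `uses.unstack(fill_value=0)` before computing the statistics, so that every user appears in every month.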
I hope that is clear.
Thank you!