I have been searching the web for a simple way, using python/pandas, to get a dataframe consisting only of the unique rows of an original dataframe together with their basic stats (occurrences, mean, and so on).
So far my efforts have only gotten me halfway: I found how to get all the unique rows using
data.drop_duplicates()
But then I'm not quite sure how to retrieve all the stats I want easily. I could write a for loop over a groupby object, but that would be rather slow.
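To make the "unique rows plus occurrences" part concrete, here is a minimal sketch that avoids an explicit loop. The toy frame and its column names (`key1`, `key2`) are hypothetical and only serve as an illustration; grouping by every column and calling `size()` is one possible way to count how often each unique row occurs.

```python
import pandas as pd

# Hypothetical toy frame; the column names are illustrative only.
data = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "b"],
    "key2": [1, 1, 2, 2, 3],
})

# Unique rows only, as in the question.
unique_rows = data.drop_duplicates()

# Unique rows together with the number of times each one occurs:
# group by all columns, count the group sizes, and flatten the index.
counts = (
    data.groupby(list(data.columns))
        .size()
        .reset_index(name="count")
)
```

Here `counts` has one row per unique combination plus a `count` column, so it carries the same rows as `unique_rows` with the occurrence count attached.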
Another approach I thought of was to use groupby followed by describe, e.g.,
data.groupby(allColumns)[columnImInterestedInForStats].describe()
But it turns out that, with 19 columns in allColumns, this returns only one row with no stats at all. Surprisingly, if I use only a small subset of the columns for allColumns, I do get each unique combination of that subset along with all their stats. Shouldn't passing all 19 columns to groupby() give me all the unique groups?
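One possible explanation, which is only an assumption because the real 19-column data is not shown, is missing values in the grouping columns: by default `groupby` drops every row that has NaN in any key column, so the more key columns are used, the more rows can disappear. The sketch below uses a hypothetical frame (`k1`, `k2`, `val`) and `agg` instead of `describe` to keep the output small; `dropna=False` requires pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the key column "k2" has a missing value in one row.
data = pd.DataFrame({
    "k1": ["a", "a", "b"],
    "k2": [1.0, np.nan, 2.0],
    "val": [10.0, 20.0, 30.0],
})

# Default behaviour: rows with NaN in any key column are silently dropped.
default_stats = data.groupby(["k1", "k2"])["val"].agg(["count", "mean"])

# dropna=False (pandas >= 1.1) keeps NaN as a key value of its own.
all_stats = data.groupby(["k1", "k2"], dropna=False)["val"].agg(["count", "mean"])
```

If the number of groups changes between the two calls on the real data, missing values in the key columns are a likely cause; if it does not, the cause lies elsewhere.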
Data example:
import pandas as pd
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3], list('AAABBBBABCBDDD'), ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1']]).T
df.columns = ['col1', 'col2', 'col3']
Desired result:
col2  col3  mean  count  and so on
A     1     1.1   1
      3     4.8   3
B     2     6.0   2
      4     2.5   1
      5     5.2   2
      6     3.4   1
C     3     3.4   1
D     1     5.5   3
The result should itself be a dataframe.
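For reference, a minimal sketch of how a table of this shape could be produced from the example data with `groupby(...).agg(...)`. Two things here are assumptions rather than part of the question: `col1` is converted with `pd.to_numeric` because the transpose leaves every column with dtype object, and `sum` is included alongside `mean` and `count` because the values listed under "mean" in the desired result appear to match the per-group sums.

```python
import pandas as pd

df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
]).T
df.columns = ['col1', 'col2', 'col3']

# After .T every column has dtype object, so make col1 numeric first
# (an assumption about the intended dtype, not stated in the question).
df['col1'] = pd.to_numeric(df['col1'])

# One row per unique (col2, col3) combination with several stats of col1.
stats = (
    df.groupby(['col2', 'col3'])['col1']
      .agg(['sum', 'mean', 'count'])
      .reset_index()
)
```

`stats` is a regular dataframe with the columns `col2`, `col3`, `sum`, `mean` and `count`; further aggregations such as `'std'` or `'min'` can be added to the list passed to `agg`.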
I'm sure it must be something very trivial that I'm missing, but I can't find the proper answer. Thanks in advance.