I have a dataframe which I am grouping as follows and applying certain operations to particular columns:
df = df.groupby(['A', 'B', 'C']).agg({'ID': 'count', 'AMT': 'sum'})
For each groupby combination (~15), I want to randomly sample one of the rows belonging to that combination and report its ID in a third output column. Alternatively, any one of the IDs from the rows in that combination would do; I don't care whether it is 'random' or not.
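To make the target concrete, here is a minimal sketch of the input and the shape of output described above. The data values and the output column names (`ID_count`, `AMT_sum`, `sample_ID`) are made up for illustration; only the columns A, B, C, ID and AMT come from the real frame.

```python
import pandas as pd

# Made-up toy input: two (A, B, C) combinations, with 3 rows and 1 row.
toy = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "B": ["p", "p", "p", "q"],
    "C": ["m", "m", "m", "n"],
    "ID": [101, 102, 103, 104],
    "AMT": [10.0, 20.0, 30.0, 5.0],
})

# Hand-written target table (not computed): one row per combination,
# with the row count, the AMT total, and any one ID from that group.
# Output column names are hypothetical.
expected = pd.DataFrame({
    "A": ["x", "y"],
    "B": ["p", "q"],
    "C": ["m", "n"],
    "ID_count": [3, 1],
    "AMT_sum": [60.0, 5.0],
    "sample_ID": [102, 104],  # 101 or 103 would be equally acceptable
})
print(expected)
```

Any ID belonging to the group is acceptable in `sample_ID`; 102 is just one valid choice for the first combination.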
I have tried the following:
df = df.groupby(['A', 'B', 'C']).agg({'ID': 'count', 'AMT': 'sum', 'ID': 'sample'})
and received the error:
AttributeError: Cannot access callable attribute 'sample' of 'SeriesGroupBy' objects, try using the 'apply' method
So I then tried:
func = lambda x: x.sample
df = df.groupby(['A', 'B', 'C']).agg({'ID': 'count', 'AMT': 'sum', 'ID': apply(func)})
which didn't work, so I tried:
df = df.groupby(['A', 'B', 'C']).agg({'ID': 'count', 'AMT': 'sum', 'ID': lambda x: x.sample})
which also didn't work. I have reviewed the following related questions and pages, but they did not seem to help me either.
Select multiple groups from pandas groupby object
http://pandas.pydata.org/pandas-docs/stable/groupby.html
Get specific element from Groups after applying groupby - PANDAS
How to access pandas groupby dataframe by key
https://chrisalbon.com/python/pandas_apply_operations_to_dataframes.html
Any thoughts on how to handle this?
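One possible direction, offered as a sketch rather than a confirmed fix: two things stand out in the attempts above. A Python dict cannot hold the key 'ID' twice, so only the last 'ID' entry survives, and `lambda x: x.sample` returns the method itself instead of calling it. Assuming pandas 0.25 or later (for named aggregation), something along these lines may work. The toy data, the helper name `pick_one`, and the output column names are hypothetical.

```python
import pandas as pd

# Made-up toy data; only the column names come from the question.
toy = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "B": ["p", "p", "p", "q"],
    "C": ["m", "m", "m", "n"],
    "ID": [101, 102, 103, 104],
    "AMT": [10.0, 20.0, 30.0, 5.0],
})


def pick_one(ids):
    """Return a single randomly chosen ID from one group's ID Series."""
    # sample must be called; .iloc[0] turns the one-row result into a scalar.
    # random_state is fixed here only to make the sketch reproducible.
    return ids.sample(n=1, random_state=0).iloc[0]


# Named aggregation lets the ID column feed two different output columns,
# which a plain dict cannot express because its keys must be unique.
summary = toy.groupby(["A", "B", "C"]).agg(
    ID_count=("ID", "count"),
    AMT_sum=("AMT", "sum"),
    sample_ID=("ID", pick_one),
)
print(summary)
```

If randomness really does not matter, replacing `pick_one` with the string `"first"` should return the first ID of each group without a custom function.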