I have a data frame and I would like to group it by a particular column (or, in other words, by values from a particular column). I can do it in the following way: grouped = df.groupby(['ColumnName']).
I imagine the result of this operation as a table in which some cells contain sets of values instead of single values. To get an ordinary table (i.e. a table in which every cell contains a single value), I need to indicate which function should be used to reduce each set of values to a single value.
For example, I can replace the sets of values by their sum, or by their minimal or maximal value. I can do it in the following way: grouped.sum() or grouped.min() and so on.
Now I want to use different functions for different columns. I figured out that I can do it in the following way: grouped.agg({'ColumnName1':sum, 'ColumnName2':min}).
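To make the setup concrete, here is a minimal sketch of what I am doing. The data is made up for illustration (the values and the group labels 'a' and 'b' are assumptions, not my real data), and only the column names follow the placeholders used above.

```python
import pandas as pd

# Hypothetical data; the values are invented for illustration only.
df = pd.DataFrame({
    'ColumnName': ['a', 'a', 'b', 'b'],
    'ColumnName1': [1, 2, 3, 4],
    'ColumnName2': [10, 20, 30, 40],
})

grouped = df.groupby(['ColumnName'])

# One function applied to every column of each group.
sums = grouped.sum()
mins = grouped.min()

# A different function per column. Here sum and min are Python built-ins,
# so the bare names are defined without any import.
mixed = grouped.agg({'ColumnName1': sum, 'ColumnName2': min})
print(mixed)
```

Depending on the pandas version, passing the built-in functions like this may emit a warning, but it still gives one row per group and one value per cell.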
However, for some reason I cannot use first. In more detail, grouped.first() works, but grouped.agg({'ColumnName1':first, 'ColumnName2':first}) does not work. As a result I get a NameError: NameError: name 'first' is not defined. So, my question is: why does this happen, and how can I resolve it?
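One thing I noticed while experimenting: sum and min are built-in Python names, while first exists only as a method on the grouped object, which would explain the NameError. A possible workaround, sketched below on made-up data (the values are assumptions for illustration), is to pass the method name as a string so that pandas looks it up itself.

```python
import pandas as pd

# Hypothetical data; the values are invented for illustration only.
df = pd.DataFrame({
    'ColumnName': ['a', 'a', 'b', 'b'],
    'ColumnName1': [1, 2, 3, 4],
    'ColumnName2': [10, 20, 30, 40],
})

grouped = df.groupby(['ColumnName'])

# 'first' is given as a string, so no Python name called first is needed;
# pandas resolves it to the groupby method of the same name.
result = grouped.agg({'ColumnName1': 'first', 'ColumnName2': 'first'})
print(result)
```

The same string form also seems to work for the other aggregations, e.g. 'sum' and 'min', so different columns can mix them freely.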
ADDED
Here I found the following example:
grouped['D'].agg({'result1' : np.sum, 'result2' : np.mean})
Maybe I also need to use np? But in my case Python does not recognize "np". Should I import it?
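As far as I can tell, np in that example is just the conventional alias for NumPy, so it has to be imported before use. Below is a rough sketch on made-up data (the columns 'A' and 'D' and their values are assumptions). Note that the dict form from the example that renames the results may not be accepted by newer pandas versions, so this sketch swaps in keyword arguments to name the output columns instead.

```python
import numpy as np   # defines the name np used below
import pandas as pd

# Hypothetical data; column names and values are invented for illustration.
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'D': [1.0, 2.0, 3.0],
})

grouped = df.groupby('A')

# Aggregate the single column D in two ways and name the output columns
# result1 and result2 via keyword arguments.
stats = grouped['D'].agg(result1=np.sum, result2=np.mean)
print(stats)
```

Without the import numpy as np line, this fails with the same kind of NameError as with first, because np is simply an undefined name.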