How to perform aggregate options on one groupby column, giving two column outputs

Question

I am performing a bunch of aggregate stats on a groupby data frame. For one column in particular, ios_id, I would like both a count and a distinct count. I'm not sure how to output these to two separate columns with different names. As of right now, the distinct count just overwrites the count.

How do I output both the distinct count and the count for the ios_id column to two separate columns?

df_new = df.groupby('video_id').agg({"ios_id": np.count_nonzero,
                                     "ios_id": pd.Series.nunique,
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum}).sort('ios_id', ascending=False)

`ios_id` refers to the column on which to compute the statistic. If I change the names, there is nothing to reference. — metersk, May 30 '15 at 16:12

score 1 · Accepted Answer · edited May 23 '17 at 12:14

Something like this should work. Note the nested dictionary structure for ios_id.

df_new = df.groupby('video_id').agg({"ios_id": {"count": "count",
                                                "distinct": "nunique"},
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum})
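The nested-dictionary renaming shown above was deprecated in pandas 0.20 and removed in pandas 1.0, so it may raise an error on a current install. A minimal sketch of the same idea using named aggregation, assuming pandas 0.25 or later, could look like the following. The sample data and the output names `ios_id_count` and `ios_id_distinct` are made up for illustration; only the input column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df = pd.DataFrame({
    "video_id": [1, 1, 1, 2, 2],
    "ios_id": ["a", "a", "b", "c", "c"],
    "feed_position": [1, 2, 3, 4, 6],
    "time_watched": [10, 20, 30, 40, 50],
    "video_length": [60, 60, 60, 90, 90],
})

# Named aggregation: output_name=(input_column, function).
# The same input column can feed several differently named outputs.
df_new = (
    df.groupby("video_id")
      .agg(
          ios_id_count=("ios_id", "count"),        # non-null rows per group
          ios_id_distinct=("ios_id", "nunique"),   # distinct values per group
          feed_position=("feed_position", "mean"),
          time_watched=("time_watched", "sum"),
          video_length=("video_length", "sum"),
      )
      .sort_values("ios_id_count", ascending=False)
)

print(df_new)
```

Because each output column gets its own keyword, the count and the distinct count no longer collide on a single dictionary key, which is what caused the overwrite in the question.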

For more details, please refer to "Naming returned columns in Pandas aggregate function".
