
I am computing several aggregate statistics on a grouped data frame. For one column in particular, ios_id, I would like both a count and a distinct count. I'm not sure how to output these to two separate columns with different names. As of right now, the distinct count just overwrites the count.

How do I output both the distinct count and the count for the ios_id column to two separate columns?

df_new = df.groupby('video_id').agg({"ios_id": np.count_nonzero,
                                     "ios_id": pd.Series.nunique,
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum}).sort('ios_id', ascending=False)
metersk
  • `ios_id` is a reference to the column on which to perform the statistic. If I change the names, then there is nothing to reference. – metersk May 30 '15 at 16:12

1 Answer


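Something like this should work. Note the nested dictionary structure for ios_id: the inner keys become the output column names, and the inner values are the functions applied. Be aware that this nested-dictionary renaming was deprecated in pandas 0.20 and removed in pandas 1.0, so it only runs on older versions.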

df_new = df.groupby('video_id').agg({"ios_id": {"count": "count",
                                                "distinct": "nunique"},
                                     "feed_position": np.average,
                                     "time_watched": np.sum,
                                     "video_length": np.sum})

For more details, please refer to "Naming returned columns in Pandas aggregate function".
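The count gets overwritten in the question because a Python dict literal keeps only the last value for a repeated key, so `agg` never sees the first `"ios_id"` entry. On pandas 0.25 or later, one way to get both columns is named aggregation, where each keyword is the output column name and each value is a `(column, function)` pair. The following is only a sketch: the sample data and the output names `ios_id_count` and `ios_id_distinct` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "video_id": [1, 1, 1, 2, 2],
    "ios_id": ["a", "a", "b", "c", "c"],
    "feed_position": [1, 2, 3, 4, 6],
    "time_watched": [10, 20, 30, 40, 50],
    "video_length": [60, 60, 60, 90, 90],
})

df_new = (
    df.groupby("video_id")
      .agg(
          # Two different statistics on the same source column,
          # written to two differently named output columns.
          ios_id_count=("ios_id", "count"),
          ios_id_distinct=("ios_id", "nunique"),
          feed_position=("feed_position", "mean"),
          time_watched=("time_watched", "sum"),
          video_length=("video_length", "sum"),
      )
      # sort_values replaces the older DataFrame.sort used in the question.
      .sort_values("ios_id_count", ascending=False)
)

print(df_new)
```

Because the output names are keyword arguments, they must be valid Python identifiers; the result has flat column names rather than the two-level columns produced by the nested-dictionary form.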

Alexander