Pandas: Adding summary information to new columns in a groupby frame

Question

Working on a class assignment.

Our current dataset has information that looks like:

    Item ID      Item Name                                  Price
0   108          Extraction, Quickblade Of Trembling Hands  3.53
1   143          Frenzied Scimitar                          1.56
2   92           Final Critic                               4.88
3   100          Blindscythe                                3.27
4   131          Fury                                       1.44

We were asked to group by two values, which I've done.

item_df = popcolumns_df.groupby(["Item ID","Item Name"])

I'm having issues though, trying to append the groupby functions to this dataframe. For instance, when I run count, the count replaces the price. Attempt one just replaced all the data in the price column with the counts.

item_counts = item_df.count().reset_index()

Output:

    Item ID     Item Name           Price
0   0           Splinter             4
1   1           Crucifer             3
2   2           Verdict              6
3   3           Phantomlight         6
4   4           Bloodlord's Fetish   5

Attempt 2 did the same:

item_counts = item_df.size().reset_index(name="Counts")

My desired output is:

     Item ID    Item Name                Price    Count   Revenue
0    108        Extraction, Quickblade   3.53     12      42.36
1    143        Frenzied Scimitar        1.56     3        4.68
2    92         Final Critic             4.88     2        9.76
3    100        Blindscythe              3.27     1        3.27
4    131        Fury                     1.44     5        7.20

I would likely just use a sum on the groups to get the revenue. I've been stumped on this for a couple of hours, so any help would be greatly appreciated!

In your desired output, where should the Count column content come from ? — LoneWanderer, Dec 17 '18 at 01:00

Edgar Ramírez Mondragón · Accepted Answer · 2018-12-17T03:36:03.380

0

If the prices for any two equivalent items is the same, then you could include "Price" in your grouping, and then compute the group sizes:

summary = popcolumns_df \
    .groupby(["Item ID", "Item Name", "Price"]) \
    .size() \
    .rename("Count") \
    .reset_index()

summary['Revenue'] = summary['Count'] * summary['Price']

The call to pd.Series.rename makes the column in the final dataframe be named "Count".

edited Dec 17 '18 at 03:36

answered Dec 17 '18 at 03:29

Edgar Ramírez Mondragón

2,912
3
19
34

Thank you! I chose this method since I'm more familiar with the .size command and it worked like a charm. I didn't think to include price, but there's no reason not to! – Betsy Shane Dec 17 '18 at 04:41

score 0 · Answer 2 · answered Dec 30 '18 at 19:52

I think you’re looking for the transform method of the groupby. That returns aggregate metrics at the original level of your data.

For example, to create a new column in your original data for the count of some grouping:

df['group_level_count'] = df.groupby(['foo', 'bar']).transform('count')  # or 'size' I think, depending whether you want to count NaNs

Related: * How to count number of rows per group (and other statistics) in pandas group by? * https://pandas.pydata.org/pandas-docs/stable/groupby.html#transformation

Pandas: Adding summary information to new columns in a groupby frame

2 Answers2