
I am having a difficult time trying to use pandas' `agg` function, which, as I have learned, is similar to R's `summarise` function.

I have the following dataset:

ID   Date     Qtr    Price    Fee_Rate
 1   1/1/10    1      10        1.002 
 1   1/2/10    1      10.3      1.002 
 1   1/3/10    1      10.4      1.002
 2   1/1/10    1      25        .987
 2   1/2/10    1      23.4      .987
...    ...    ...     ...       ...
 1   4/1/10    2       12.4      1.09
 1   4/2/10    2       12.5      1.09

and so on..

Essentially, I want to group by quarter, filter to the first date of the quarter, and sum Price*Fee_Rate.

In R, the code is as below:

 df %>% group_by(Qtr) %>% filter(Date == min(Date)) %>% summarise(L_Value = sum(Price*Fee_Rate))

How would I go about replicating this in Python?

This is what I have tried so far:

df.groupby('Qtr').head(1).agg({'L_Value': ('Price' * 'Fee_Rate').sum()})

but it doesn't work, giving the following error:

can't multiply sequence by non-int of type 'str'

which I assume is because of 'Price' * 'Fee_Rate'.

Thanks!

  • Possible duplicate of [Calculations within pandas aggregate](https://stackoverflow.com/questions/30513632/calculations-within-pandas-aggregate) – Bill the Lizard Feb 19 '19 at 18:22

3 Answers


My test dataset:

s = """ID   Date     Qtr    Price    Fee_Rate
 1   1/1/10    1      10        1.002 
 1   1/2/10    1      10.3      1.002 
 1   1/3/10    1      10.4      1.002
 2   1/1/10    1      25        .987
 2   1/2/10    1      23.4      .987
 1   4/1/10    2       12.4      1.09
 1   4/2/10    2       12.5      1.09"""

# Read df from string (pd.compat.StringIO was removed in recent pandas; use io.StringIO)
import io
import pandas as pd

df = pd.read_csv(io.StringIO(s), sep=r"\s+")

First convert Date column into datetime format:

df["Date"] = pd.to_datetime(df["Date"])

Requested calculations:

# Min date per Qtr
min_dt = df.groupby("Qtr")["Date"].transform(min)

# Compare each date with min dates
only_first_dates = df[df["Date"] == min_dt].copy()

# Calculate new column
only_first_dates["new_col"] = only_first_dates.eval("Price * Fee_Rate")

# Groupby and sum
only_first_dates.groupby("Qtr")["new_col"].sum()
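Putting the steps above together as one self-contained script, with the expected sums worked out in the comments:

```python
import io
import pandas as pd

s = """ID   Date     Qtr    Price    Fee_Rate
 1   1/1/10    1      10        1.002
 1   1/2/10    1      10.3      1.002
 1   1/3/10    1      10.4      1.002
 2   1/1/10    1      25        .987
 2   1/2/10    1      23.4      .987
 1   4/1/10    2       12.4      1.09
 1   4/2/10    2       12.5      1.09"""

df = pd.read_csv(io.StringIO(s), sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Keep only the rows on each quarter's first date
min_dt = df.groupby("Qtr")["Date"].transform("min")
only_first = df[df["Date"] == min_dt].copy()

# Price * Fee_Rate, summed per quarter
only_first["new_col"] = only_first["Price"] * only_first["Fee_Rate"]
result = only_first.groupby("Qtr")["new_col"].sum()
print(result)
# Qtr 1: 10*1.002 + 25*0.987 = 34.695
# Qtr 2: 12.4*1.09           = 13.516
```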

It's not a one-liner, but assuming I understand your task, it gets the result:

df_grouped = df.groupby('Qtr').head(1).copy()  # .copy() avoids a SettingWithCopyWarning
df_grouped['L_value'] = df_grouped['Price'].mul(df_grouped['Fee_Rate'])

Check out the documentation for `mul` too.

  • Wouldn't df_grouped = df.groupby('Qtr').head(1) only store 1 ID for the first date.. what if I wanted all of the ID's on the first date? @Alex L – yungpadewon Feb 19 '19 at 18:29
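To illustrate the commenter's point: `head(1)` keeps exactly one row per group, so the second ID on the quarter's first date is dropped, while filtering on the per-group minimum date keeps all of them. A quick sketch with a cut-down version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":       [1, 1, 2, 2],
    "Date":     pd.to_datetime(["1/1/10", "1/2/10", "1/1/10", "1/2/10"]),
    "Qtr":      [1, 1, 1, 1],
    "Price":    [10.0, 10.3, 25.0, 23.4],
    "Fee_Rate": [1.002, 1.002, 0.987, 0.987],
})

# head(1): a single row per quarter, so ID 2's 1/1/10 row is lost
print(len(df.groupby("Qtr").head(1)))  # 1

# min-date filter: both IDs on 1/1/10 survive
first_dates = df[df["Date"] == df.groupby("Qtr")["Date"].transform("min")]
print(len(first_dates))  # 2
```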

I know this was three years ago, but I also wanted to know the answer, and I've tested the line below, which seems to work:

df[(df['ColumnToFilterOn'] == filterValue)].groupby('ColumnToGroupBy').sum()
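Applied to the question's data, this pattern works once `Price * Fee_Rate` is computed up front, since `sum()` on its own only totals existing columns. A small sketch, with the date `"2010-01-01"` standing in for `filterValue`:

```python
import pandas as pd

df = pd.DataFrame({
    "Qtr":      [1, 1, 2],
    "Date":     pd.to_datetime(["2010-01-01", "2010-01-02", "2010-04-01"]),
    "Price":    [10.0, 10.3, 12.4],
    "Fee_Rate": [1.002, 1.002, 1.09],
})

# Compute the product first, then filter and aggregate
df["L_Value"] = df["Price"] * df["Fee_Rate"]
out = df[df["Date"] == "2010-01-01"].groupby("Qtr")["L_Value"].sum()
print(out.loc[1])  # 10 * 1.002 = 10.02
```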