
I am having a difficult time trying to use pandas' `agg` function, which, as I have learned, is similar to R's `summarise` function.

I have the following dataset:

ID   Date     Qtr    Price    Fee_Rate
 1   1/1/10    1      10        1.002 
 1   1/2/10    1      10.3      1.002 
 1   1/3/10    1      10.4      1.002
 2   1/1/10    1      25        .987
 2   1/2/10    1      23.4      .987
...    ...    ...     ...       ...
 1   4/1/10    2       12.4      1.09
 1   4/2/10    2       12.5      1.09

and so on..

Essentially, I want to group by quarter, filter to the first date of the quarter, and sum Price*Fee_Rate.

In R, the code is as below:

 df %>% group_by(Qtr) %>% filter(Date == min(Date)) %>% summarise(L_Value = sum(Price*Fee_Rate))

How would I go about replicating this in Python?

This is what I have tried so far:

df.groupby('Qtr').head(1).agg({'L_Value': ('Price' * 'Fee_Rate').sum()})

but it doesn't work, giving the following error:

can't multiply sequence by non-int of type 'str'

which I assume is because of 'Price' * 'Fee_Rate'.

Thanks!

  • Possible duplicate of [Calculations within pandas aggregate](https://stackoverflow.com/questions/30513632/calculations-within-pandas-aggregate) – Bill the Lizard Feb 19 '19 at 18:22

3 Answers


My test dataset:

s = """ID   Date     Qtr    Price    Fee_Rate
 1   1/1/10    1      10        1.002 
 1   1/2/10    1      10.3      1.002 
 1   1/3/10    1      10.4      1.002
 2   1/1/10    1      25        .987
 2   1/2/10    1      23.4      .987
 1   4/1/10    2       12.4      1.09
 1   4/2/10    2       12.5      1.09"""

# Read df from string (pd.compat.StringIO was removed in recent pandas; use io.StringIO)
import io
import pandas as pd

df = pd.read_csv(io.StringIO(s), sep=r"\s+")

First convert Date column into datetime format:

df["Date"] = pd.to_datetime(df["Date"])

Requested calculations:

# Min date per Qtr
min_dt = df.groupby("Qtr")["Date"].transform(min)

# Compare each date with min dates
only_first_dates = df[df["Date"] == min_dt].copy()

# Calculate new column
only_first_dates["new_col"] = only_first_dates.eval("Price * Fee_Rate")

# Groupby and sum
only_first_dates.groupby("Qtr")["new_col"].sum()
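Putting the steps above together as one self-contained script, with the expected sums worked out in the comments:

```python
import io
import pandas as pd

s = """ID   Date     Qtr    Price    Fee_Rate
 1   1/1/10    1      10        1.002
 1   1/2/10    1      10.3      1.002
 1   1/3/10    1      10.4      1.002
 2   1/1/10    1      25        .987
 2   1/2/10    1      23.4      .987
 1   4/1/10    2       12.4      1.09
 1   4/2/10    2       12.5      1.09"""

df = pd.read_csv(io.StringIO(s), sep=r"\s+")
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Keep only the rows on each quarter's first date
min_dt = df.groupby("Qtr")["Date"].transform("min")
only_first = df[df["Date"] == min_dt].copy()

# Price * Fee_Rate, summed per quarter
only_first["new_col"] = only_first["Price"] * only_first["Fee_Rate"]
result = only_first.groupby("Qtr")["new_col"].sum()
print(result)
# Qtr 1: 10*1.002 + 25*0.987 = 34.695
# Qtr 2: 12.4*1.09           = 13.516
```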

It's not a one-liner, but assuming I understand your task, it gets the result:

df_grouped = df.groupby('Qtr').head(1).copy()  # .copy() avoids a SettingWithCopyWarning
df_grouped['L_value'] = df_grouped['Price'].mul(df_grouped['Fee_Rate'])

Check out the documentation for `mul` too.

  • Wouldn't df_grouped = df.groupby('Qtr').head(1) only store 1 ID for the first date.. what if I wanted all of the ID's on the first date? @Alex L – yungpadewon Feb 19 '19 at 18:29
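To illustrate the commenter's point: `head(1)` keeps exactly one row per group, so the second ID on the quarter's first date is dropped, while filtering on the per-group minimum date keeps all of them. A quick sketch with a cut-down version of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "ID":       [1, 1, 2, 2],
    "Date":     pd.to_datetime(["1/1/10", "1/2/10", "1/1/10", "1/2/10"]),
    "Qtr":      [1, 1, 1, 1],
    "Price":    [10.0, 10.3, 25.0, 23.4],
    "Fee_Rate": [1.002, 1.002, 0.987, 0.987],
})

# head(1): a single row per quarter, so ID 2's 1/1/10 row is lost
print(len(df.groupby("Qtr").head(1)))  # 1

# min-date filter: both IDs on 1/1/10 survive
first_dates = df[df["Date"] == df.groupby("Qtr")["Date"].transform("min")]
print(len(first_dates))  # 2
```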

I know this was three years ago, but I also wanted to know the answer, and I've tested the line below, which seems to work:

df[(df['ColumnToFilterOn'] == filterValue)].groupby('ColumnToGroupBy').sum()
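Applied to the question's data, this pattern works once `Price * Fee_Rate` is computed up front, since `sum()` on its own only totals existing columns. A small sketch, with the date `"2010-01-01"` standing in for `filterValue`:

```python
import pandas as pd

df = pd.DataFrame({
    "Qtr":      [1, 1, 2],
    "Date":     pd.to_datetime(["2010-01-01", "2010-01-02", "2010-04-01"]),
    "Price":    [10.0, 10.3, 12.4],
    "Fee_Rate": [1.002, 1.002, 1.09],
})

# Compute the product first, then filter and aggregate
df["L_Value"] = df["Price"] * df["Fee_Rate"]
out = df[df["Date"] == "2010-01-01"].groupby("Qtr")["L_Value"].sum()
print(out.loc[1])  # 10 * 1.002 = 10.02
```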