0

I have a dataframe like this:

title email_sent click_rate
abc 10 0.1
abc 2 0.2
bcd 3 0.4

I want to combine 2 records with the same title abc, email sent will be sum of 10 and 2, click_rate will be (0.1 * 10+0.2 * 2)/12

the desired df looks like:

title email_sent click_rate
abc 12 0.12
bcd 3 0.4
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
Kay Trinh
  • 3
  • 1

1 Answers1

1

You can use .apply() here.

Just define a function that takes in a sub-dataframe (for each unique title that you have), and returns a Series with the calculated values. This Series will become the row. So the index values of this Series will become the column names of the new data frame:

def f(df):
    n_emails = df['email_sent'].sum()
    click_rate = (df['click_rate'] * df['email_sent']).sum() / n_emails
    return pd.Series([n_emails, click_rate], 
                     index=['email_sent', 'click_rate'])

new_df = df.groupby('title').apply(f).reset_index()

Example:

df = pd.DataFrame([['abc', 10, 0.1], 
                   ['abc', 2,  0.2], 
                   ['bcd', 3,  0.4]], 
                  columns=['title', 'email_sent', 'click_rate'])

Result:

  title  email_sent  click_rate
0   abc        12.0    0.116667
1   bcd         3.0    0.400000
Vladimir Fokow
  • 3,728
  • 2
  • 5
  • 27