1

I have two Dataframes, df1 and df2, built from the same dataset regarding some patient information.

df1 contains information regarding how long patients were on various IVs of the same type, so there may be multiple rows for the same patient where each row constitutes a different IV.

In df2, each row is a distinct patient and the columns show the total days that patient was on a certain type of IV.

The columns we are concerned with are 'FIN' and 'Line Days.' 'FIN' uniquely identifies the patients and 'Line Days' documents how many days that particular IV was in.

In df1, I have grouped the rows by 'FIN' and then summed the 'Line Days' columns for each distinct 'FIN.' Now, I want to take the output of this groupby.agg() and add this as a new column to df2 by matching 'FIN.'

def sum_col(df, col1, col2):
    tot_line = df.groupby(col1)[col2].agg('sum')
    return tot_line

CVL_totals = sum_col(CVL_data, 'FIN', '# Line days')

The output looks like this: (NOTE: No actual FINs shown)

FIN
000000000    12
111111111    3
222222222    32
333333333    7

Now, I want to merge this to df2 on 'FIN', but I can't do that since the output is not a dataframe.

cecilj
  • 142
  • 1
  • 8

1 Answers1

3

cvl_totals should be a Series object, which you can turn into a DataFrame using the to_frame() method, and then merge this to df2.

However it seems like you are doing a groupby on a column, so if you want to expand the values across the rows (i.e. all columns with one FIN will have the same cvl_total) then you can do as described here: Conditionally fill column values based on another columns value in pandas

linamnt
  • 1,315
  • 1
  • 12
  • 23
  • 1
    Since df2 does not repeat FIN values, the first solution works perfectly. Thank you very much! – cecilj Feb 05 '19 at 19:01