How to access the values of aggregate functions in Python

Question

I created a data frame and grouped and aggregated timestamp to give me min and max value for each grouping the resulting data frame looks like this DF is defined to be patient_id, timestamp I grouped the DF by patient_id and then I wanted to get the min and max timestamp for each groups and I did this

bypatient_date = pd.DataFrame(byencounter.agg({'timestamp' : [np.min,np.max]})).reset_index())

  patient_id  timestamp            
              amin        amax
0         19  3396-08-21  3396-08-25
1         99  2723-09-27  2727-03-17
2       3014  2580-12-02  2581-05-01
3      24581  3399-07-19  3401-04-13

I am trying to find the difference between the min and max of each patient_id but I am having issue trying to access the values in timestamp amin and timestamp amax Is there a way to do this without looping but using built-in pandas or numpy

Are those really timestamps for YYYY-MM-DD for the 26th - 34th centuries? — Parfait, Jan 22 '16 at 02:24

score 7 · Accepted Answer · edited May 23 '17 at 12:01

Simply drop the outermost level of column index as shown in this SO post. Then amin and amax are accessible as their own columns where you can take the difference:

bypatient_date = pd.DataFrame(byencounter.groupby('patient_id').\
                              agg({'timestamp' : [np.min, np.max]})).reset_index(drop=True)

bypatient_date.columns = bypatient_date.columns.droplevel(0)
bypatient_date['datediff'] = bypatient_date['amax'] - bypatient_date['amin']

How to access the values of aggregate functions in Python

1 Answers1