5

I created a data frame and grouped and aggregated timestamp to give me min and max value for each grouping the resulting data frame looks like this DF is defined to be patient_id, timestamp I grouped the DF by patient_id and then I wanted to get the min and max timestamp for each groups and I did this

bypatient_date = pd.DataFrame(byencounter.agg({'timestamp' : [np.min,np.max]})).reset_index())

  patient_id  timestamp            
              amin        amax
0         19  3396-08-21  3396-08-25
1         99  2723-09-27  2727-03-17
2       3014  2580-12-02  2581-05-01
3      24581  3399-07-19  3401-04-13

I am trying to find the difference between the min and max of each patient_id but I am having issue trying to access the values in timestamp amin and timestamp amax Is there a way to do this without looping but using built-in pandas or numpy

latorrefabian
  • 1,077
  • 3
  • 9
  • 19
E B
  • 1,073
  • 3
  • 23
  • 36

1 Answers1

7

Simply drop the outermost level of column index as shown in this SO post. Then amin and amax are accessible as their own columns where you can take the difference:

bypatient_date = pd.DataFrame(byencounter.groupby('patient_id').\
                              agg({'timestamp' : [np.min, np.max]})).reset_index(drop=True)

bypatient_date.columns = bypatient_date.columns.droplevel(0)
bypatient_date['datediff'] = bypatient_date['amax'] - bypatient_date['amin']
Community
  • 1
  • 1
Parfait
  • 104,375
  • 17
  • 94
  • 125