2

Given that i have a dataset as below:

dt = {
    "facility":["Ann Arbor","Ann Arbor","Detriot","Detriot","Detriot"],
    "patient_ID":[4388,4388,9086,9086,9086],
    "year":[2004,2007,2007,2008,2011],
    "month":[8,9,9,6,2],
    "Nr_Small":[0,0,5,12,10],
    "Nr_Medium":[3,1,1,4,3],
    "Nr_Large":[2,0,0,0,0],
    "PeriodBetween2Visits" : [10,0,12,3,1],
    "NumberOfVisits" : [2,2,3,3,3]

}

dt = pd.DataFrame(dt)

i need to keep groupby patient_ID, then keep the facility,patient_ID,NumberOfVisits, but Maximum and minimum of PeriodBetween2Visits.

Here is what i tried:

dt = dt.groupby(['patient_ID'],as_index=False)["facility","patient_ID","PeriodBetween2Visits","NumberOfVisits"].agg({'PeriodBetween2Visits': ['min', 'max']})


dt.head()

But, it is not what i need!

A proper output for me as below:

enter image description here

Jeff
  • 7,767
  • 28
  • 85
  • 138

1 Answers1

2

I am using named aggregation here which was inbuilt in groupby and agg recently:

 dt.groupby(['facility','patient_ID']).agg(
                 Min_PeriodBetween2Visits=('PeriodBetween2Visits','min'),
                 Max_PeriodBetween2Visits=('PeriodBetween2Visits','max'),
                 NumberOfVisits=('NumberOfVisits','nunique')).reset_index()

    facility  patient_ID  Min_PeriodBetween2Visits  Max_PeriodBetween2Visits  \
0  Ann Arbor        4388                         0                        10   
1    Detriot        9086                         1                        12   

   NumberOfVisits  
0               2  
1               3  
anky
  • 74,114
  • 11
  • 41
  • 70
  • @Jeff might be a version issue. Is it possible to update pandas version? Which one are you currently using? `pd.__version__` – anky Jan 12 '20 at 16:24
  • @Jeff I am using version `0.25+` , if possible please [`upgrade`](https://stackoverflow.com/questions/37954195/upgrade-version-of-pandas) the version and try – anky Jan 12 '20 at 16:37
  • @Jeff what exactly do you want to aggregate for that column, how do you get 2 and 3 in the example? – anky Jan 12 '20 at 17:13
  • @Jeff I understand that but since we are reducing the rows we have to have some logic for rest of the colums, example either first row , max row avg, etc, you cant have all rows right? – anky Jan 12 '20 at 17:16
  • Yes, we can consider first row value for `NumberOfVisits` – Jeff Jan 12 '20 at 17:18
  • @Jeff then just use `NumberOfVisits=('NumberOfVisits','first')` instead of `NumberOfVisits=('NumberOfVisits','nunique')` for the 3rd aggregation – anky Jan 12 '20 at 17:21