I have the following dataframe
id code code_date medication medication_date
1 A 2017-05-18 Y 2017-05-18
1 A 2017-05-25 V 2017-05-18
1 Y 2017-07-18 D 2017-05-18
2 C 2017-08-18 C 2017-05-18
2 C 2017-09-18 Y 2017-05-18
2 Y 2017-03-18 O 2017-05-18
I would like to select the rows where where the earliest code takes place. In the above example, for patient 1 we have code A repeated while for patient 2 C is repeated. I would like to remove those rows of the repeated code and the latter date. Also, note that I do not care about the medication or medication_date but they should be in the new dataframe:
id code code_date medication medication_date
1 A 2017-05-18 Y 2017-05-18
1 Y 2017-07-18 D 2017-05-18
2 C 2017-08-18 C 2017-05-18
2 Y 2017-03-18 O 2017-05-18
So far I have tried:
df.groupby(["id", "code", "code_date"]).nth(0).reset_index()
But I don't get the right answer. Any suggestions are more than welcome.