Python: obtaining the first observation according to its date

Asked Jul 25 '22 at 13:37

Active Jul 25 '22 at 13:48


I have the following dataframe:

id code code_date   medication medication_date
1  A     2017-05-18 Y          2017-05-18
1  A     2017-05-25 V          2017-05-18
1  Y     2017-07-18 D          2017-05-18
2  C     2017-08-18 C          2017-05-18
2  C     2017-09-18 Y          2017-05-18
2  Y     2017-03-18 O          2017-05-18

For each patient (id) and code, I would like to keep only the row where that code first occurs, i.e. the row with the earliest code_date. In the example above, code A is repeated for patient 1 and code C is repeated for patient 2; for those repeated codes I would like to remove the rows with the later date. Note that I do not care about medication or medication_date, but they should still be in the new dataframe:

id code code_date   medication medication_date
1  A     2017-05-18 Y          2017-05-18
1  Y     2017-07-18 D          2017-05-18
2  C     2017-08-18 C          2017-05-18
2  Y     2017-03-18 O          2017-05-18
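One way to produce this output, offered as a sketch rather than an accepted answer: sort by code_date, drop duplicate (id, code) pairs keeping the first (earliest) row, then restore the original row order. The DataFrame constructor below simply retypes the sample table; converting code_date with pd.to_datetime and using a stable sort (kind="mergesort") are assumptions, not part of the question.

```python
import pandas as pd

# Sample data retyped from the table above (assumption: dates start as strings).
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "code": ["A", "A", "Y", "C", "C", "Y"],
    "code_date": ["2017-05-18", "2017-05-25", "2017-07-18",
                  "2017-08-18", "2017-09-18", "2017-03-18"],
    "medication": ["Y", "V", "D", "C", "Y", "O"],
    "medication_date": ["2017-05-18"] * 6,
})
df["code_date"] = pd.to_datetime(df["code_date"])

# Put the earliest code_date first, keep the first row per (id, code),
# then sort by the original index to restore the input order.
out = (
    df.sort_values("code_date", kind="mergesort")
      .drop_duplicates(subset=["id", "code"], keep="first")
      .sort_index()
)
print(out)
```

All original columns, including medication and medication_date, are carried along because drop_duplicates only uses id and code to decide which rows are duplicates.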

So far I have tried:

df.groupby(["id", "code", "code_date"]).nth(0).reset_index()

But I don't get the right answer. Any suggestions are more than welcome.
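A likely reason the attempt returns every row: with code_date among the grouping keys, each (id, code, code_date) combination in this sample is unique, so every row is its own group and taking the first row of each group keeps all of them. The sketch below checks that and then groups by id and code only, after sorting by date. It uses groupby(...).head(1) rather than nth(0) as a choice of this sketch, because head(1) returns the original rows with their index labels; the sample DataFrame is retyped from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "code": ["A", "A", "Y", "C", "C", "Y"],
    "code_date": pd.to_datetime(["2017-05-18", "2017-05-25", "2017-07-18",
                                 "2017-08-18", "2017-09-18", "2017-03-18"]),
    "medication": ["Y", "V", "D", "C", "Y", "O"],
    "medication_date": ["2017-05-18"] * 6,
})

# Grouping on all three keys gives one group per row in this sample.
print(df.groupby(["id", "code", "code_date"]).ngroups)  # 6

# Group on (id, code) only; after sorting, the first row of each group
# is the one with the earliest code_date.
out = (
    df.sort_values("code_date", kind="mergesort")
      .groupby(["id", "code"])
      .head(1)
      .sort_index()
)
print(out)
```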

Economist_Ayahuasca

`2017-17-18` doesn't seem to be a valid date ;) Assuming all dates are valid: `out = df.loc[pd.to_datetime(df['code_date']).groupby([df['id'], df['code']]).idxmin()]` – mozway Jul 25 '22 at 13:42
Sorry, my bad; I just re-edited the question. Thanks – Economist_Ayahuasca Jul 25 '22 at 13:48
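The one-liner from the comment, expanded into a runnable sketch (the DataFrame constructor is retyped from the question; keeping code_date as strings is an assumption that matches the comment's use of pd.to_datetime). For each (id, code) group, idxmin returns the index label of the row with the smallest parsed date, and df.loc then selects those whole rows.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "code": ["A", "A", "Y", "C", "C", "Y"],
    "code_date": ["2017-05-18", "2017-05-25", "2017-07-18",
                  "2017-08-18", "2017-09-18", "2017-03-18"],
    "medication": ["Y", "V", "D", "C", "Y", "O"],
    "medication_date": ["2017-05-18"] * 6,
})

# Parse the dates, find the label of the earliest row per (id, code),
# then pull those rows (all columns) out of the original frame.
earliest = pd.to_datetime(df["code_date"]).groupby([df["id"], df["code"]]).idxmin()
out = df.loc[earliest]
print(out)
```

The selected rows come back ordered by the (id, code) group keys; for this sample that coincides with the original order. If two rows in a group share the earliest date, idxmin picks the first of them.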
