0

Each PatientID must have one record that is "Admit" AND one record that is "Discharge", nothing more nothing less.

In this table, for example, PatientID 152096 needs to go.

PatientID EventType
25173 Admit
25173 Discharge
25174 Admit
25174 Discharge
152096 Admit
152096 Admit

I have got to this point by using

dfGrouped.groupby('PatientID').filter(lambda x: len(x) == 2)

I'm wondering if I should combine the PatientID into a single row first then check, or just check at this point.

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
  • Your table is better than an image, however please check [How to make pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples), that helps others helping you. – MagnusO_O Oct 08 '22 at 16:46

1 Answers1

0

If you really want to filter by the two specific values:

m = df.groupby('PatientID')['EventType'].agg(set).eq({'Admit', 'Discharge'})

out = df[df['PatientID'].isin(m[m].index)]

If a number of 2 different Events is fine:

df.groupby('PatientID')['EventType'].nunique().eq(2)

out = df[df['PatientID'].isin(m[m].index)]

or:

out = df[df.groupby('PatientID')['EventType'].transform('nunique').eq(2)]

output:

   PatientID  EventType
0      25173      Admit
1      25173  Discharge
2      25174      Admit
3      25174  Discharge
mozway
  • 194,879
  • 13
  • 39
  • 75