I have a large dataset, and my pandas DataFrame looks like this:

HR SBP DBP SepsisLabel PatientID
92 120 80 0 0
98 115 85 0 0
93 125 75 0 0
95 130 90 0 1
102 120 80 1 1
109 115 75 1 1
94 135 100 0 2
97 100 70 0 2
85 120 80 0 2
88 115 75 0 3
93 125 85 1 3
78 130 90 1 3
115 140 110 0 4
102 120 80 0 4
98 140 110 0 4

I want to keep every row belonging to a PatientID that has SepsisLabel = 1 in at least one of its rows. For example, PatientID 0, 2, and 4 never have SepsisLabel = 1, so I don't want them in the new DataFrame. PatientID 1 and 3 do have rows with SepsisLabel = 1, so I want all of their rows.

I hope that is clear. If so, please help me with the Python code. I assume it needs some condition together with iloc, but I might be wrong.

Regards.

Huzaifa Arshad
    Does this answer your question? [How to select rows from a DataFrame based on column values](https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values) – George Sotiropoulos May 04 '21 at 07:53

1 Answer

Use GroupBy.transform with GroupBy.any to test whether each group contains at least one True, then filter with boolean indexing:

df1 = df[df['SepsisLabel'].eq(1).groupby(df['PatientID']).transform('any')]
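To see what that one-liner does, here is a minimal sketch that splits it into steps. The miniature frame and the names has_label and mask are made up for illustration; only the pandas calls come from the line above.

```python
import pandas as pd

# Hypothetical miniature frame: patient 0 never has the label, patient 1 does.
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 1],
    "PatientID":   [0, 0, 1, 1],
})

# Row-level test: is this row labelled 1?
has_label = df["SepsisLabel"].eq(1)

# Group-level test broadcast back to every row of the group:
# True for all rows of a patient if any of that patient's rows is True.
mask = has_label.groupby(df["PatientID"]).transform("any")

# Boolean indexing keeps only the rows where the mask is True.
df1 = df[mask]
print(df1)
```

Because transform returns a result aligned with the original index, the mask can be used directly for boolean indexing without any merge.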

Or collect the PatientID values of the rows where SepsisLabel is 1 and filter with Series.isin:

df1 = df[df['PatientID'].isin(df.loc[df['SepsisLabel'].eq(1), 'PatientID'])]
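The same idea can be written in two steps, which may be easier to read. This is only a sketch on a made-up miniature frame; the name septic_ids and the call to unique() are additions for illustration and are not required for isin to work.

```python
import pandas as pd

# Hypothetical miniature frame: only patient 1 has a row labelled 1.
df = pd.DataFrame({
    "SepsisLabel": [0, 0, 0, 1, 1],
    "PatientID":   [0, 0, 1, 1, 1],
})

# Step 1: IDs of patients with at least one labelled row (deduplicated).
septic_ids = df.loc[df["SepsisLabel"].eq(1), "PatientID"].unique()

# Step 2: keep every row whose PatientID is in that set of IDs.
df1 = df[df["PatientID"].isin(septic_ids)]
print(df1)
```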

If the data is small or performance is not important, it is also possible to use DataFrameGroupBy.filter:

df1 = df.groupby('PatientID').filter(lambda x: x['SepsisLabel'].eq(1).any())

print(df1)
     HR  SBP  DBP  SepsisLabel  PatientID
3    95  130   90            0          1
4   102  120   80            1          1
5   109  115   75            1          1
9    88  115   75            0          3
10   93  125   85            1          3
11   78  130   90            1          3
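For anyone who wants to reproduce this, the following sketch rebuilds the sample data from the question and checks that the three approaches return the same rows. The variable names by_transform, by_isin, and by_filter are chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "HR":  [92, 98, 93, 95, 102, 109, 94, 97, 85, 88, 93, 78, 115, 102, 98],
    "SBP": [120, 115, 125, 130, 120, 115, 135, 100, 120, 115, 125, 130, 140, 120, 140],
    "DBP": [80, 85, 75, 90, 80, 75, 100, 70, 80, 75, 85, 90, 110, 80, 110],
    "SepsisLabel": [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0],
    "PatientID":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
})

# 1) Group-wise "any" broadcast back to the rows.
by_transform = df[df["SepsisLabel"].eq(1).groupby(df["PatientID"]).transform("any")]

# 2) Membership test against the IDs that have a labelled row.
by_isin = df[df["PatientID"].isin(df.loc[df["SepsisLabel"].eq(1), "PatientID"])]

# 3) Per-group Python callback; simplest to read, typically slowest.
by_filter = df.groupby("PatientID").filter(lambda g: g["SepsisLabel"].eq(1).any())

# All three keep the original index and row order, so they compare equal.
print(by_transform.equals(by_isin) and by_isin.equals(by_filter))
print(sorted(by_transform["PatientID"].unique().tolist()))
```

All three keep the original index labels, so the result can still be aligned with the source frame afterwards.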
jezrael