2

I have the below DataFrame, I want to select the Services where there are less than 2 'Healthy' instances of that service. In this case I want the Series (EmailService, UserService, NotificationService)

              CPU              Service  Memory   Status
IP                                                     
10.22.11.150   13       StorageService      55  Healthy
10.22.11.90    23       StorageService      19  Healthy
10.22.11.91    10         EmailService      44  Healthy
10.22.11.92    69          UserService       1  Healthy
10.22.11.93    63  NotificationService      81  Healthy
10.22.11.93    87  NotificationService      98  Unhealthy

I think I need this grouping,

grouped = servers_df.groupby('Service')

but not sure how to count the Status column and then get the result based on that.

The Cat
  • 2,375
  • 6
  • 25
  • 37

3 Answers3

3

Use transform with lambda function for count Healthy and compare, last filter by boolean indexing:

df = df[df.groupby('Service')['Status'].transform(lambda x: (x=='Healthy').sum() < 2)]
print (df)
             CPU              Service  Memory     Status
IP                                                      
10.22.11.91   10         EmailService      44    Healthy
10.22.11.92   69          UserService       1    Healthy
10.22.11.93   63  NotificationService      81    Healthy
10.22.11.93   87  NotificationService      98  Unhealthy

If want check only 1 value Healthy per group use duplicated with keep=False for all dupes and chain it with condition for compare Healthy for filter out multiple Unhealthy, then invert condition by ~ and filter boolean indexing again:

df = df[~(df.duplicated(['Service','Status'], keep=False) & (df['Status'] == 'Healthy'))]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can also use filter.

df.groupby("Service").filter(lambda x: len(x[x.Status == "Healthy"]) < 2)

Might be slower in speed according to jezrael's experiment in this answer

Another way: using apply (modified from jezrael's transform solution)

df.groupby('Service').apply(
                   lambda x: x if (x.Status == 'Healthy').sum() < 2 else None)


                        IP         CPU  Service              Memory Status
Service                     
EmailService        2   10.22.11.91 10  EmailService         44 Healthy
NotificationService 4   10.22.11.93 63  NotificationService  81 Healthy
                    5   10.22.11.93 87  NotificationService  98 Unhealthy
UserService         3   10.22.11.92 69  UserService          1  Healthy
Tai
  • 7,684
  • 3
  • 29
  • 49
1

IIUC

s=df[df.Status=='Healthy'].groupby('Service').Service.count().lt(2)
df.loc[df.Service.isin(s[s].index)]

    IP          CPU Service             Memory  Status
2   10.22.11.91 10  EmailService        44      Healthy
3   10.22.11.92 69  UserService         1       Healthy
4   10.22.11.93 63  NotificationService 81      Healthy
5   10.22.11.93 87  NotificationService 98      Unhealthy
Tai
  • 7,684
  • 3
  • 29
  • 49
BENY
  • 317,841
  • 20
  • 164
  • 234