Hello all, my df looks like this:

PID Stage
123  1
123  2
123  4
124  1
124  3
137  2
137  3
153  1
153  4
153  5
167  4
167  5
178  1
178  2
178  1
187  3
187  4 

I want to delete all records for any PID that has at least one row with Stage >= 4.

Expected output

PID Stage
124  1
124  3
137  2
137  3
178  1
178  2
178  1

Thanks in advance

Rebel_47

2 Answers

Keep only the groups (PIDs) in which all Stage values are less than 4.

library(dplyr)
df %>% group_by(PID) %>% filter(all(Stage < 4))

#    PID Stage
#  <int> <int>
#1   124     1
#2   124     3
#3   137     2
#4   137     3
#5   178     1
#6   178     2
#7   178     1

The same can be written in data.table:

library(data.table)
setDT(df)[, .SD[all(Stage < 4)], PID]

and in base R:

subset(df, ave(Stage < 4, PID, FUN = all))
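To see why the base R version works, it may help to look at the intermediate logical vector that ave() produces. This is only a minimal sketch on a smaller made-up data frame; the names `toy` and `keep` are illustrative and not part of the answer above.

```r
# Toy data (hypothetical): PID 1 contains a Stage >= 4, PID 2 does not
toy <- data.frame(PID = c(1L, 1L, 2L, 2L), Stage = c(1L, 4L, 2L, 3L))

# ave() applies all() within each PID and repeats the result for every row of that PID
keep <- with(toy, ave(Stage < 4, PID, FUN = all))
keep
# [1] FALSE FALSE  TRUE  TRUE

# Indexing with the logical vector keeps only the rows of PID 2
toy[keep, ]
#   PID Stage
# 3   2     2
# 4   2     3
```

subset() does the same row selection, it just evaluates the condition inside the data frame so the columns can be referred to by name.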

data

df <- structure(list(PID = c(123L, 123L, 123L, 124L, 124L, 137L, 137L, 
153L, 153L, 153L, 167L, 167L, 178L, 178L, 178L, 187L, 187L), 
    Stage = c(1L, 2L, 4L, 1L, 3L, 2L, 3L, 1L, 4L, 5L, 4L, 5L, 
    1L, 2L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -17L))
Ronak Shah

A dplyr solution without group_by():

library(dplyr)

df %>% filter(!PID %in% PID[Stage >= 4])

#   PID Stage
# 1 124     1
# 2 124     3
# 3 137     2
# 4 137     3
# 5 178     1
# 6 178     2
# 7 178     1

Its base R version:

subset(df, !PID %in% PID[Stage >= 4])
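The condition can be read in two steps: first collect the PIDs that have any Stage >= 4, then drop every row whose PID is in that set. A rough sketch of those steps on a small made-up data frame (`toy`, `bad_pids` and `result` are illustrative names, not from the answer):

```r
# Toy data (hypothetical): PID 1 contains a Stage >= 4, PID 2 does not
toy <- data.frame(PID = c(1L, 1L, 2L, 2L), Stage = c(1L, 4L, 2L, 3L))

# Step 1: PIDs that have at least one row with Stage >= 4
bad_pids <- unique(toy$PID[toy$Stage >= 4])
bad_pids
# [1] 1

# Step 2: keep only the rows whose PID is not among them
result <- toy[!toy$PID %in% bad_pids, ]
result
#   PID Stage
# 3   2     2
# 4   2     3
```

The unique() call is only there to make the intermediate result easier to read; %in% gives the same answer without it.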
Darren Tsai