
In the following data, for each id, I would like to remove rows after the first 1 is reached. My data is as follows:

 id x
  a 0
  a 0
  a 1
  a 0
  a 1
  b 0
  b 1
  b 1
  b 0

The desired output:

 id x
  a 0
  a 0
  a 1
  b 0
  b 1

Code to reproduce data:

df <- structure(list(id = c("a", "a", "a", "a", "a", "b", "b", "b", 
"b"), x = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-9L))

This is not a duplicate question, although it is similar to this one. I used cumall() to remove all rows from the first 1 onward:

library(dplyr)

df <- df %>%
  group_by(id) %>%
  filter(cumall(!(x == 1))) %>%
  ungroup()

The catch is that I also want to keep the row containing the first 1. Any help is appreciated, preferably using dplyr!
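One small change to the attempt above that seems to cover the caveat (a sketch, not taken from the answers below, assuming a reasonably recent dplyr): compute the same cumall() flag, then shift it down one row with lag(default = TRUE). The row holding the first 1 then inherits the TRUE of the row before it, and everything after it stays FALSE. The data frame is rebuilt first because the attempt above overwrites df; the name res is only illustrative.

```r
library(dplyr)

# Rebuild the example data (the attempt above reassigns df)
df <- structure(list(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
                     x = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)),
                class = "data.frame", row.names = c(NA, -9L))

res <- df %>%
  group_by(id) %>%
  # cumall(x != 1) is TRUE only for rows before the first 1 in each group;
  # lag() shifts it down one row (the first row of each group gets TRUE),
  # so the row holding the first 1 also passes the filter
  filter(lag(cumall(x != 1), default = TRUE)) %>%
  ungroup()

res
```

A group that never contains a 1 keeps all of its rows here, since cumall() stays TRUE throughout.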

Mark
Cloft X

2 Answers

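library(dplyr)

df %>% 
    group_by(id) %>%
    # lag() makes the first row of each group NA, and cumall() stays NA
    # until it meets the first FALSE, i.e. the row right after the first 1
    mutate(y = cumall(lag(!x))) %>%
    filter(is.na(y)) %>%
    select(-y)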

  id        x
  <chr> <int>
1 a         0
2 a         0
3 a         1
4 b         0
5 b         1
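Why filter(is.na(y)) keeps exactly the right rows may not be obvious. A minimal sketch running the same two calls on group a's x values outside the pipeline (the names shifted, flag and keep are illustrative only): lag() puts NA in the first position, and cumall() returns NA while its input is NA or TRUE, then FALSE from the first FALSE onward. Since lag(!x) is first FALSE on the row after the first 1, every row up to and including that 1 ends up NA.

```r
library(dplyr)

# Group "a" from the question, on its own
x <- c(0L, 0L, 1L, 0L, 1L)

shifted <- lag(!x)       # NA TRUE TRUE FALSE TRUE: was the previous row a 0?
flag <- cumall(shifted)  # NA NA NA FALSE FALSE: stays NA until the first FALSE
keep <- is.na(flag)      # TRUE TRUE TRUE FALSE FALSE
x[keep]                  # 0 0 1
```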
Mark
  • Using this code unfortunately gives me the same output as my attempt, i.e., a data frame where all rows from the first `1` onward are removed. – Cloft X Jun 28 '23 at 07:55
  • did you run `df <- structure(list(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"), x = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L)), class = "data.frame", row.names = c(NA, -9L))`? – Mark Jun 28 '23 at 07:56
  • Understood, your code works! – Cloft X Jun 28 '23 at 08:02

A data.table approach:

library(data.table)
# For each id, .I[...] picks that group's row numbers up to the first x == 1
setDT(df)[df[, .I[seq(1, min(which(x == 1)))], .(id)]$V1,]

or

setDT(df)[df[, .I[seq.int(min(which(x == 1)))], .(id)]$V1,]
#    id x
# 1:  a 0
# 2:  a 0
# 3:  a 1
# 4:  b 0
# 5:  b 1
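To see what the inner call contributes, it can be run on its own. A minimal sketch, building the data directly as a data.table (the names dt and idx are only illustrative): for each id, .I holds that group's row numbers in the full table, and the unnamed result of j comes back as a column called V1, which is then used as the row index.

```r
library(data.table)

dt <- data.table(id = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
                 x  = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 0L))

# Per id: row numbers from the group's first row up to its first x == 1
idx <- dt[, .I[seq.int(min(which(x == 1)))], by = id]
idx$V1      # 1 2 3 6 7

dt[idx$V1]
```

One caveat: if some id had no 1 at all, min(which(x == 1)) would be Inf (with a warning) and the seq.int() step would fail; the example data does not contain such a group.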
Wimpel