
I have a dataset like this:

      id    type    value
1    001     0      1991
2    001     0      1992
3    001     1      1993
4    001     1      1994
5    002     1      1992
6    002     1      1993
7    003     0      1999
8    003     1      2000
9    003     0      2001

I want to keep, within each id, the first row with type equal to 1 and every row after it.

The final expected result should be as follows:

      id    type    value
3    001     1      1993
4    001     1      1994
5    002     1      1992
6    002     1      1993
8    003     1      2000
9    003     0      2001

I know I need to group by id first, but I have no idea what to do next.

Does anyone have any suggestions?

h3rm4n
money

2 Answers


With dplyr:

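library(dplyr)
df %>%
  group_by(id) %>%                  # work within each id
  mutate(sel = cumsum(type)) %>%    # running total of type; stays 0 until the first 1
  filter(sel > 0) %>%               # keep rows from the first type == 1 onward
  select(id, type, value)           # drop the helper column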

The result:

# A tibble: 6 x 3
# Groups:   id [3]
     id  type value
  <int> <int> <int>
1     1     1  1993
2     1     1  1994
3     2     1  1992
4     2     1  1993
5     3     1  2000
6     3     0  2001

With base R:

df[with(df, ave(type, id, FUN = cumsum)) > 0, ]
h3rm4n

You could subset your data to the rows where the cumulative sum of type within each id is greater than or equal to 1 (or, equivalently, greater than 0).
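To make the cumulative-sum idea concrete, here is a small sketch on the question's data. The names `df`, `running` and `kept` are only illustrative, and it assumes type contains only 0 and 1 and that the rows are already in the desired order within each id.

df <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  type  = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L),
  value = c(1991L, 1992L, 1993L, 1994L, 1992L, 1993L, 1999L, 2000L, 2001L)
)

# Running total of type within each id; it stays at 0 until the first 1 appears
running <- ave(df$type, df$id, FUN = cumsum)
running
# [1] 0 0 1 2 1 2 0 1 1

# Keep every row whose running total is already positive
kept <- df[running > 0, ]

Row 9 (id 3, type 0) is kept because the running total for id 3 has already reached 1 at row 8.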

In base R

idx <- as.logical(with(DF, ave(type, id, FUN = function(x) cumsum(x) >= 1)))
DF[idx, ]
#  id type value
#3  1    1  1993
#4  1    1  1994
#5  2    1  1992
#6  2    1  1993
#8  3    1  2000
#9  3    0  2001

With data.table (see this post)

library(data.table)
setDT(DF)[DF[, .I[cumsum(type) > 0], by = id]$V1]
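If the one-liner is hard to read, it can be split into two steps. This is only a sketch: `dt` and `rows` are names made up for illustration, and it uses as.data.table() to work on a copy rather than converting DF in place with setDT().

library(data.table)

dt <- as.data.table(data.frame(
  id    = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L),
  type  = c(0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L),
  value = c(1991L, 1992L, 1993L, 1994L, 1992L, 1993L, 1999L, 2000L, 2001L)
))

# Step 1: per id, collect the row numbers (.I) where the running total of type is positive.
# The unnamed result column gets the default name V1.
rows <- dt[, .I[cumsum(type) > 0], by = id]

# Step 2: use those row numbers to subset the original table
res <- dt[rows$V1]

Here rows$V1 holds the row numbers 3, 4, 5, 6, 8 and 9, so res contains the same six rows as the base R approach.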

data

DF <- structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L), type = c(0L, 
0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L), value = c(1991L, 1992L, 1993L, 
1994L, 1992L, 1993L, 1999L, 2000L, 2001L)), .Names = c("id", 
"type", "value"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8", "9"))
markus