Subset grouped data

Question

I'm trying to subset a grouped data set. Although there are many questions on this subject (e.g. Select the first and last row by group in a data frame), none of them fits the case described here. An example data set is

df <- data.frame("id" = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 "x1" = c(NA, 1, 1, 1, 1, 1, 0, 0, 1),
                 "x2" = c(10, 8, 13, 4, 7, 6, 9, 10, 6))

For each id, I want to keep only the first row where "x1" equals 1. I expect to have

df<-data.frame("id"=c(1,2,3),
           "x1"=c(1,1,1),
           "x2"=c(8,4,6))

I tried

df <- df %>%
  group_by(id) %>%
  filter(first(x1) == 1)

but it provides undesired output. Any help on this is greatly appreciated.

score 2 · Accepted Answer · answered Aug 31 '20 at 09:13

You can first filter the data frame on your condition and then use slice() to select the first remaining row in each group. filter() also drops rows where the condition evaluates to NA, so the NA in id 1 needs no special handling.

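library(dplyr)

df %>%
  group_by(id) %>%
  filter(x1 == 1) %>%  # keep only rows where x1 is 1 (NA rows are dropped)
  slice(1)             # first remaining row in each id group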

# A tibble: 3 x 3
# Groups:   id [3]
#      id    x1    x2
#   <dbl> <dbl> <dbl>
# 1     1     1     8
# 2     2     1     4
# 3     3     1     6
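
For comparison, here is a minimal sketch of why the attempt in the question behaves differently. It assumes dplyr is installed; the names attempt and result are only illustrative. first(x1) yields a single value per group, so the condition keeps or drops whole groups rather than individual rows.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 x1 = c(NA, 1, 1, 1, 1, 1, 0, 0, 1),
                 x2 = c(10, 8, 13, 4, 7, 6, 9, 10, 6))

# Attempt from the question: one TRUE/FALSE/NA per group, recycled to
# every row of that group. Only id 2 starts with x1 == 1, so all three
# of its rows are kept and ids 1 and 3 are dropped entirely.
attempt <- df %>%
  group_by(id) %>%
  filter(first(x1) == 1) %>%
  ungroup()

# Row-wise condition first, then the first surviving row per group.
result <- df %>%
  group_by(id) %>%
  filter(x1 == 1) %>%
  slice(1) %>%
  ungroup()
```

Here attempt holds the three rows of id 2, while result holds one row per id.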

score 1 · Answer 2 · answered Aug 31 '20 at 09:24

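A base R option using subset() and ave(): complete.cases() drops the row containing NA, and ave() flags, within each id, the position of the first x1 == 1.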

subset(
  df[complete.cases(df), ],
  ave(x1 == 1, id, FUN = function(x) min(which(x)) == seq_along(x))
)

giving the same three rows as the expected output (x2 values 8, 4 and 6 for ids 1, 2 and 3).
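
A shorter base R variant is also possible. This is a sketch using which() and duplicated() rather than the subset() + ave() approach above; the names ones and first_ones are only illustrative.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 x1 = c(NA, 1, 1, 1, 1, 1, 0, 0, 1),
                 x2 = c(10, 8, 13, 4, 7, 6, 9, 10, 6))

# which() ignores the NA produced by NA == 1, so only rows with
# x1 == 1 remain.
ones <- df[which(df$x1 == 1), ]

# duplicated() flags every repeat of an id after its first occurrence,
# so negating it keeps the first matching row per id.
first_ones <- ones[!duplicated(ones$id), ]
```

This relies on the rows already being in the desired order within each id, as they are in the example data.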

ThomasIsCoding