I'm trying to subset a grouped data set. Although there are many questions on this subject (e.g. Select the first and last row by group in a data frame), none of them fits my case. Example data:

df<-data.frame("id"=c(1,1,1,2,2,2,3,3,3),
           "x1"=c(NA,1,1,1,1,1,0,0,1),
           "x2"=c(10,8,13,4,7,6,9,10,6))

For each id, I want to keep only the first row where "x1" equals 1. The expected result is

df<-data.frame("id"=c(1,2,3),
           "x1"=c(1,1,1),
           "x2"=c(8,4,6))

I tried

df<-df %>% 
group_by(id) %>% 
filter(first(x1)==1)

but it does not give the desired output. Any help on this is greatly appreciated.

Sotos
T Richard

2 Answers

You can first filter the data frame by your condition (filter() also drops the rows where x1 is NA) and then use slice() to select the first remaining row in each group.

library(dplyr)

df %>% 
  group_by(id) %>% 
  filter(x1 == 1) %>%  # keep rows where x1 is 1; NA rows are dropped
  slice(1)             # first remaining row per id

# A tibble: 3 x 3
# Groups:   id [3]
#      id    x1    x2
#   <dbl> <dbl> <dbl>
# 1     1     1     8
# 2     2     1     4
# 3     3     1     6
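
As a rough illustration of why the attempt in the question behaves differently, the sketch below runs both versions on the example data. The names `attempt` and `fixed` are only illustrative. Within a group, `first(x1)` is a single value, so `first(x1) == 1` keeps or drops each group as a whole instead of testing individual rows.

```r
library(dplyr)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 x1 = c(NA, 1, 1, 1, 1, 1, 0, 0, 1),
                 x2 = c(10, 8, 13, 4, 7, 6, 9, 10, 6))

# Attempt from the question: one TRUE/FALSE/NA per group.
# id 1 starts with NA and id 3 starts with 0, so both groups are dropped;
# id 2 starts with 1, so all three of its rows are kept.
attempt <- df %>%
  group_by(id) %>%
  filter(first(x1) == 1)

# Row-wise test first, then the first surviving row of each group.
fixed <- df %>%
  group_by(id) %>%
  filter(x1 == 1) %>%
  slice(1) %>%
  ungroup()
```

Here `attempt` holds the three rows of id 2 only, while `fixed` holds one row per id.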
Ric S

A base R option using subset() and ave():

subset(
  df[complete.cases(df), ],
  ave(x1 == 1, id, FUN = function(x) min(which(x)) == seq_along(x))
)

giving

  id x1 x2
1  1  1  8
2  2  1  4
3  3  1  6
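
For comparison, a different base R sketch (a swapped-in technique, not the ave() approach above) reaches the same rows with `duplicated()`: keep the rows where x1 is 1, then keep the first occurrence of each id. The names `hits` and `first_hits` are illustrative only, and the sketch assumes the rows are already in the order in which "first" should be judged.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 x1 = c(NA, 1, 1, 1, 1, 1, 0, 0, 1),
                 x2 = c(10, 8, 13, 4, 7, 6, 9, 10, 6))

# Rows where x1 is 1; the is.na() guard keeps NA rows out of the result.
hits <- df[!is.na(df$x1) & df$x1 == 1, ]

# duplicated() flags every repeat of an id, so negating it keeps
# only the first matching row per id.
first_hits <- hits[!duplicated(hits$id), ]
```

Unlike the ave() version, this does not warn when some id has no row with x1 equal to 1; such an id is simply absent from `first_hits`.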
ThomasIsCoding