2

My data frame looks like this:

id        year        value
1         2000        23
1         2001        40
1         2003        93
2         1998        90
2         1999        91
2         2002        92
3         2015        12
3         2016        13
3         2017        14

I want to remove the ID if there are any two consecutive values that do not meet the threshold of 90. Note: Consecutive in this case, just means 1 year after another year, does not have to be exactly 1 year after. (Example: 2001 and 2003 for ID 1 are consecutive years)

The output should be just id 2. If id 2 had any instances where two consecutive values were <90, they would also be removed.

id        year        value
2         1998        90
2         1999        91
2         2002        92
neilfws
  • 32,751
  • 5
  • 50
  • 63
Theresa
  • 65
  • 5
  • 1
    Welcome to SO! What have you tried so far? Where are you getting stuck? Please revise your question to fit the standard for a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – OTStats Feb 07 '19 at 20:58

4 Answers4

3

Could also do:

library(dplyr)

df %>%
  group_by(id) %>%
  filter(!any(value < 90 & lag(value) < 90))

Output:

# A tibble: 3 x 3
# Groups:   id [1]
     id  year value
  <int> <int> <int>
1     2  1998    90
2     2  1999    91
3     2  2002    92
arg0naut91
  • 14,574
  • 2
  • 17
  • 38
  • Grouping turns it into a `tibble` which wants to display prettier results, you can just do `%>% as.data.frame()` at the end. – arg0naut91 Feb 07 '19 at 21:28
2

This solution uses package dplyr.

library(dplyr)

df1 %>%
  group_by(id) %>%
  filter(all(value[-1] >= 90 | value[-n()] >= 90))
## A tibble: 3 x 3
## Groups:   id [1]
#     id  year value
#  <int> <int> <int>
#1     2  1998    90
#2     2  1999    91
#3     2  2002    92

Data.

df1 <- read.table(text = "
id        year        value
1         2000        23
1         2001        40
1         2003        93
2         1998        90
2         1999        91
2         2002        92
3         2015        12
3         2016        13
3         2017        14                  
", header = TRUE)
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • 1
    @arg0naut I remove the first element of the vector and the last, `n()`. So the two resulting vectors are lagged and OR'ed `value[2]` and `value[1]`, then 3 and 2, etc. – Rui Barradas Feb 07 '19 at 22:17
0

Using dplyr you can first identify the values that are smaller than 90. Then you can count how many entries in sequence are smaller than 90. After that you can keep just the ids in which you do not observe 2 consecutive values smaller than 90.

library(dplyr)
df %>%
  mutate(value_90 = value < 90) %>%
  group_by(id) %>%
  mutate(n_cons = cumsum(value_90)) %>%
  filter(!any(n_cons == 2)) %>%
  select(id, year, value)

# A tibble: 3 x 3
# Groups:   id [1]
     id  year value
  <dbl> <dbl> <dbl>
1     2  1998    90
2     2  1999    91
3     2  2002    92
Douglas Mesquita
  • 1,011
  • 7
  • 12
0

Using dplyr and rle...

library(dplyr)
DT %>% mutate(test = value < 90) %>% group_by(id) %>% filter(
  with(rle(test), !any(lengths >= 2 & values))
) %>% select(-test)

# A tibble: 3 x 3
# Groups:   id [1]
     id  year value
  <int> <int> <int>
1     2  1998    90
2     2  1999    91
3     2  2002    92
Frank
  • 66,179
  • 8
  • 96
  • 180