0

My data has multiple observations for each ID. At the ID level, I want to convert all values to the most recent non-missing value. I have tried using mutate, group_by(id) and which.max(year) unsuccessfully.

Data:

data <- data.frame(
  id=c(1,1,2,2,3,3,4,4,5,5),
  year=rep(c(2010, 2011), 5),
  employ=c("yes", "yes", "no", "yes", "yes", "no", NA, "yes", "no", NA))

> data
   id year employ
1   1 2010    yes
2   1 2011    yes
3   2 2010     no
4   2 2011    yes
5   3 2010    yes
6   3 2011     no
7   4 2010   <NA>
8   4 2011    yes
9   5 2010     no
10  5 2011   <NA>

Desired output:

data2 <- data.frame(
  id=c(1,1,2,2,3,3,4,4,5,5),
  year=c(2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2010, 2010),
  employ=c("yes", "yes", "yes", "yes", "no", "no","yes", "yes","no", "no"))

> data2
   id year employ
1   1 2011    yes
2   1 2011    yes
3   2 2011    yes
4   2 2011    yes
5   3 2011     no
6   3 2011     no
7   4 2011    yes
8   4 2011    yes
9   5 2010     no
10  5 2010     no
EML
  • 615
  • 4
  • 14
  • `data %>% group_by(id, year) %>% tidyr::fill(employ, .direction = "updown") %>% ungroup()` – Jon Spring Feb 19 '21 at 23:14
  • The links supplied do not offer a complete solution to the question. I am trying to replace all values (missing and non-missing) by values for the most recent available year. – EML Feb 21 '21 at 19:51
  • Oops, meant `data %>% group_by(id) %>% tidyr::fill(employ, .direction = "updown") %>% ungroup()` – Jon Spring Feb 21 '21 at 22:16

1 Answers1

2

A data.table option

setDT(data)[, employ := last(na.omit(employ[order(year)])), id]

gives

    id year employ
 1:  1 2010    yes
 2:  1 2011    yes
 3:  2 2010    yes
 4:  2 2011    yes
 5:  3 2010     no
 6:  3 2011     no
 7:  4 2010    yes
 8:  4 2011    yes
 9:  5 2010     no
10:  5 2011     no

A dplyr way might be

data %>%
  group_by(id) %>%
  mutate(employ = last(na.omit(employ[order(year)])))

which gives

      id  year employ
   <dbl> <dbl> <chr>
 1     1  2010 yes
 2     1  2011 yes
 3     2  2010 yes
 4     2  2011 yes
 5     3  2010 no
 6     3  2011 no
 7     4  2010 yes
 8     4  2011 yes
 9     5  2010 no
10     5  2011 no
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81