This is my data frame:

library(zoo)
library(dplyr)

df <- data.frame(
  id = rep(1:4, each = 4), 
  status = c(
    NA, "a", "c", "a", 
    NA, "c", "c", "c",
    NA, NA, "a", "c",
    NA, NA, "c", "c"),
  otherVar = letters[1:16],
  stringsAsFactors = FALSE)

For the variable status I want the next observation to be carried backward within group (id).

df %>% group_by(id) %>% na.locf(fromLast = TRUE) %>% ungroup

However, I want only the "c" values to be carried backward, not the "a" values.

From variable status:

NA "a" "c" "a" NA "c" "c" "c" NA NA "a" "c" NA NA "c" "c"

I want to get:

NA "a" "c" "a" "c" "c" "c" "c" NA NA "a" "c" "c" "c" "c" "c"

That is, the resulting data frame should be:

data.frame(
  id = rep(1:4, each = 4), 
  status = c(
    NA, "a", "c", "a", 
    "c", "c", "c", "c",
    NA, NA, "a", "c",
    "c", "c", "c", "c"),
  otherVar = letters[1:16],
  stringsAsFactors = FALSE)

Is there a way of doing this?

Frank
Roccer

2 Answers

After applying `na.locf0`, check each position that was originally NA; if it is now "a", reset it back to NA. If you want to overwrite `status` instead of adding a new column, replace the second `status2 =` line with `status = if_else(is.na(status) & status2 == "a", NA_character_, status2), status2 = NULL) %>%`

library(dplyr)
library(zoo)

df %>% 
  group_by(id) %>% 
  mutate(status2 = na.locf0(status, fromLast = TRUE),
         status2 = if_else(is.na(status) & status2 == "a", NA_character_, status2)) %>%
  ungroup

giving:

# A tibble: 16 x 4
      id status otherVar status2
   <int> <chr>  <chr>    <chr>  
 1     1 <NA>   a        <NA>   
 2     1 a      b        a      
 3     1 c      c        c      
 4     1 a      d        a      
 5     2 <NA>   e        c      
 6     2 c      f        c      
 7     2 c      g        c      
 8     2 c      h        c      
 9     3 <NA>   i        <NA>   
10     3 <NA>   j        <NA>   
11     3 a      k        a      
12     3 c      l        c      
13     4 <NA>   m        c      
14     4 <NA>   n        c      
15     4 c      o        c      
16     4 c      p        c      
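
Written out in full, the overwrite variant described above might look like the following. This is only a sketch on the sample data from the question; the name `res` is an illustrative assumption, not part of the original answer.

```r
library(dplyr)
library(zoo)

# Same sample data as in the question
df <- data.frame(
  id = rep(1:4, each = 4),
  status = c(
    NA, "a", "c", "a",
    NA, "c", "c", "c",
    NA, NA, "a", "c",
    NA, NA, "c", "c"),
  otherVar = letters[1:16],
  stringsAsFactors = FALSE)

res <- df %>%
  group_by(id) %>%
  # carry the next observation backward within each id
  mutate(status2 = na.locf0(status, fromLast = TRUE),
         # undo the fill wherever an original NA was filled with "a"
         status = if_else(is.na(status) & status2 == "a", NA_character_, status2),
         # drop the helper column again
         status2 = NULL) %>%
  ungroup()

res$status
```

On this data, `res$status` should match the `status` column requested in the question.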
G. Grothendieck
    I guess with na.locf inside mutate, it's safer to have na.rm=FALSE to avoid getting shorter output than input. I mean `na.locf(c(NA, "c", NA), fromLast=TRUE)` – Frank May 07 '18 at 16:31
    That situation did not occur in the sample input so I assumed it does not occur in the data but now that you have pointed it out I have changed it to `na.locf0` just in case. – G. Grothendieck May 07 '18 at 16:32
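
The length issue raised in the comments can be reproduced with a small sketch. The variable names `x`, `short`, `full` and `kept` are illustrative assumptions.

```r
library(zoo)

# An NA after the last non-NA value cannot be filled when carrying backward
x <- c(NA, "c", NA)

# na.locf() removes such unfillable NAs by default (na.rm = TRUE),
# so the result can be shorter than the input
short <- na.locf(x, fromLast = TRUE)

# na.locf0() always returns a vector of the same length as its input
full <- na.locf0(x, fromLast = TRUE)

# na.locf() with na.rm = FALSE keeps the length as well
kept <- na.locf(x, fromLast = TRUE, na.rm = FALSE)

length(short)
length(full)
length(kept)
```

Inside `mutate()` the result must have as many elements as the group has rows, which is why the length-preserving forms are the safer choice there.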
A solution using `tidyr::fill` can be based on creating a `dummyStatus` column. Fill `dummyStatus` using `.direction = "up"`, then use it to populate the NA values in the actual `status` column, but only where the following value is "c".

library(dplyr)
library(tidyr)
df %>% group_by(id) %>%
    mutate(dummyStatus = status) %>%
    fill(dummyStatus, .direction = "up" ) %>%
    mutate(status = ifelse(is.na(status) & lead(dummyStatus)=="c","c",status)) %>%
    select(-dummyStatus) %>% as.data.frame()

  #    id status otherVar
  # 1   1   <NA>        a
  # 2   1      a        b
  # 3   1      c        c
  # 4   1      a        d
  # 5   2      c        e
  # 6   2      c        f
  # 7   2      c        g
  # 8   2      c        h
  # 9   3   <NA>        i
  # 10  3   <NA>        j
  # 11  3      a        k
  # 12  3      c        l
  # 13  4      c        m
  # 14  4      c        n
  # 15  4      c        o
  # 16  4      c        p
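
As a possible simplification (an assumption on my part, not part of the answer above): after filling upward, `dummyStatus` already holds the next non-NA value for every NA row, so the `lead()` call may not be needed. The sketch below compares `dummyStatus` directly, using `%in%` so that a trailing NA counts as "no match"; `res` is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  id = rep(1:4, each = 4),
  status = c(
    NA, "a", "c", "a",
    NA, "c", "c", "c",
    NA, NA, "a", "c",
    NA, NA, "c", "c"),
  otherVar = letters[1:16],
  stringsAsFactors = FALSE)

res <- df %>%
  group_by(id) %>%
  mutate(dummyStatus = status) %>%
  # fill each NA with the next non-NA value within its id
  fill(dummyStatus, .direction = "up") %>%
  # only accept the filled value where it is "c"
  mutate(status = ifelse(is.na(status) & dummyStatus %in% "c", "c", status)) %>%
  select(-dummyStatus) %>%
  as.data.frame()

res$status
```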

Data:

df <- data.frame(
  id = rep(1:4, each = 4), 
  status = c(
    NA, "a", "c", "a", 
    NA, "c", "c", "c",
    NA, NA, "a", "c",
    NA, NA, "c", "c"),
  otherVar = letters[1:16],
  stringsAsFactors = FALSE)
MKR