My data has one row per user (identified by "username") and one column per day, from 2016-09-06 to 2017-09-30.

For each user, a day's column holds L, T or C if an observation occurred on that day, and NA otherwise.

I want to replace the NAs with L, but only AFTER a user's first observation. So:

NA NA NA NA L NA NA L T C would become NA NA NA NA L L L L T C

I have a small subset from my data but do not know how to insert this into the question. If needed, please let me know how I can provide this as well.

Thanks in advance.

  • Try `library(zoo); na.locf(vec, na.rm = FALSE) #[1] NA NA NA NA "L" "L" "L" "L" "T" "C"` – akrun Dec 16 '17 at 14:38

2 Answers

na.locf0(x) fills each NA in x with the most recent non-NA value while leaving leading NAs in place, so its output has the same length as its input. Thus, if a position is not NA in na.locf0(x) but is NA in x, then na.locf0 has filled it in. Those positions are TRUE in the logical expression in the code below, so we set the values of x at those positions to "L". We use replace to do that non-destructively (i.e. we output the desired vector without modifying x itself).

library(zoo)

x <- c(NA, NA, NA, NA, "L", NA, NA, "L", "T", "C") # test data
replace(x, !is.na(na.locf0(x)) & is.na(x), "L")
## [1] NA  NA  NA  NA  "L" "L" "L" "L" "T" "C"

Note

If we knew that the NAs to be filled in all follow L (as in the sample data in the question) then

na.locf0(x)

would be sufficient; however, if the general case is as described in the question then the replace code above will be needed.

Variation

A variation of the above is to replace all NA values with "L" and then, in the result, set back to NA those positions that are still NA in na.locf0(x), i.e. the leading ones.

replace(replace(x, is.na(x), "L"), is.na(na.locf0(x)), NA)
## [1] NA  NA  NA  NA  "L" "L" "L" "L" "T" "C"
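
Applied to data frame rows

The question describes a data frame with one row per user and one column per day rather than a single vector. A minimal sketch of applying the same replace idea row by row follows. The data frame df, its usernames and its date columns are made up for illustration, and fill_after_first is a hypothetical helper name.

library(zoo)

# Hypothetical toy data: one row per user, one column per day
df <- data.frame(
  username     = c("u1", "u2"),
  `2016-09-06` = c(NA_character_, NA_character_),
  `2016-09-07` = c("T", NA),
  `2016-09-08` = c(NA, "C"),
  `2016-09-09` = c("C", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Same expression as above, wrapped in a function for reuse
fill_after_first <- function(x) replace(x, !is.na(na.locf0(x)) & is.na(x), "L")

# apply() returns one column per input row, so transpose before assigning back
day_cols <- setdiff(names(df), "username")
df[day_cols] <- t(apply(df[day_cols], 1, fill_after_first))
df
## u1 becomes NA "T" "L" "C" and u2 becomes NA NA "C" "L"

Using apply over rows converts the day columns to a character matrix, which is fine here because every value is a character code or NA.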
G. Grothendieck

We can just do

library(zoo)
na.locf(vec, na.rm = FALSE)
#[1] NA  NA  NA  NA  "L" "L" "L" "L" "T" "C"

data

vec <- c(NA, NA, NA, NA, 'L', NA, NA, 'L', 'T', 'C') 
akrun
  • Thanks! This works, but unfortunately only when the first observation in the vector is "L". The first observation can also be "T" or "C", and then this code fills all NAs after it with "T" or "C". Is there a way to specify that NAs should be filled with "L"? – Anouk Maaskant Dec 18 '17 at 13:21
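
One possible way to fill with a fixed value, whatever the first observation is, uses base R only instead of na.locf. This is a sketch, and fill_after_first_obs is a hypothetical helper name: cumsum(!is.na(x)) > 0 is TRUE from the first non-NA position onward, so only NAs at or after that point are replaced.

# Base R sketch: fill NAs with a fixed value, but only after the first observation
fill_after_first_obs <- function(x, fill = "L") {
  seen <- cumsum(!is.na(x)) > 0   # TRUE from the first non-NA value onward
  x[seen & is.na(x)] <- fill      # leading NAs have seen == FALSE and are kept
  x
}

vec <- c(NA, NA, "T", NA, NA, "L", NA, "C")
fill_after_first_obs(vec)
#[1] NA  NA  "T" "L" "L" "L" "L" "C"

A vector that is entirely NA is returned unchanged, since seen is never TRUE.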