I have a vector with some NAs, and I want to replace some of those NAs with the previous non-NA value minus 0.1. I don't want to replace NAs when the run of consecutive NAs is longer than a certain length (e.g., 2). Here's an example:

x <- c(1:3, NA, 4, NA, NA, 5, NA, NA, NA, 6, NA)

I want to make a vector that looks like

x_prime <- c(1:3, 2.9, 4, 3.9, 3.8, 5, NA, NA, NA, 6, 5.9)

Printing this out looks like:

> x_prime
 [1] 1.0 2.0 3.0 2.9 4.0 3.9 3.8 5.0  NA  NA  NA 6.0 5.9

As an added complication, I want to keep track of the indices that I modified, so I also want a vector that looks like

idx <- c(4, 6, 7, 13)

If the first position is NA (and likewise for all leading NAs), we can leave it as is and do nothing.

I have found some similar questions on SO like this, and I've tried similar functions to those presented there, but haven't had success. Any ideas? Thank you in advance.

mikey
  • `idx` indicates that the last position should be replaced, but `x_prime` shows `NA` instead of `5.9`. Please clarify. What should happen if the first position is `NA` and there is no previous value? Please specify. – Jan Mar 05 '21 at 21:41
  • Why is the last `NA` not replaced with 5.9? Its length is shorter than 2. – Ronak Shah Mar 06 '21 at 03:39
  • I edited to fix this and explain that we don't need to change leading NAs – mikey Mar 07 '21 at 14:28

3 Answers

A base R option with ave:

len <- 2
x1 <- ave(x, cumsum(!is.na(x)), FUN = function(v) {
  if(length(v) > len + 1) v 
  else v[1] - seq(0, by= 0.1, length.out = length(v))
  })

x1
#[1] 1.0 2.0 3.0 2.9 4.0 3.9 3.8 5.0  NA  NA  NA 6.0 5.9

We group each non-NA value together with the NAs that follow it and pass those groups to ave. If a group is longer than len + 1 (the + 1 accounts for the leading non-NA value in each group), the NA run is too long and we return the group unchanged. Otherwise we subtract 0, 0.1, 0.2, ... from the first value in the group.
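To make the grouping concrete, here is a small sketch of the intermediate values for the example vector. The names grp and sizes are only for illustration and are not part of the code above.

```r
x <- c(1:3, NA, 4, NA, NA, 5, NA, NA, NA, 6, NA)

# Each non-NA value starts a new group; the NAs that follow it share its id.
grp <- cumsum(!is.na(x))
grp
# [1] 1 2 3 3 4 4 4 5 5 5 5 6 6

# Group size = 1 (the non-NA value) + length of the NA run after it.
# Note: tabulate() ignores a group id of 0, which only occurs for leading NAs.
sizes <- tabulate(grp)
sizes
# [1] 1 1 2 3 4 2

# Offsets applied inside a fillable group of size 3, e.g. c(4, NA, NA).
4 - seq(0, by = 0.1, length.out = 3)
```

With len <- 2, only the group of size 4 (the value 5 followed by three NAs) exceeds len + 1, so it is the only one left untouched.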


To get the positions that were replaced, find the elements that are NA in x but not NA in x1.

which(is.na(x) & !is.na(x1))
#[1]  4  6  7 13
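
If both results are needed together, the two steps could be wrapped in one function. This is only a sketch: the function name fill_na_decay and its arguments max_gap and step are hypothetical, chosen here for illustration. The example input adds a leading NA to show that leading NAs are left alone.

```r
# Hypothetical helper; the name and argument names are illustrative only.
fill_na_decay <- function(x, max_gap = 2, step = 0.1) {
  filled <- ave(x, cumsum(!is.na(x)), FUN = function(v) {
    # v is one non-NA value followed by its NA run
    # (or only NAs, for a leading run, which therefore stays NA).
    if (length(v) > max_gap + 1) v
    else v[1] - seq(0, by = step, length.out = length(v))
  })
  # Positions that were NA in the input and are filled in the result.
  list(values = filled, idx = which(is.na(x) & !is.na(filled)))
}

res <- fill_na_decay(c(NA, 1:3, NA, 4, NA, NA, 5, NA, NA, NA, 6, NA))
res$idx
# [1]  5  7  8 14
```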
Ronak Shah

Here is an option that uses cumsum to split, with na.locf0 from zoo

library(zoo)
# Each non-NA value starts a group that also holds the NAs following it
lst1 <- split(x, cumsum(!is.na(x)))
unname(unlist(lapply(lst1, function(x) if(sum(is.na(x)) <= 2) 
      na.locf0(x) - seq(0, length.out = length(x), by = 0.1) else x)))
#[1] 1.0 2.0 3.0 2.9 4.0 3.9 3.8 5.0  NA  NA  NA 6.0 5.9

For the second part (the indices that were modified)

as.vector(unlist(sapply(split(seq_along(x) * is.na(x), 
     cumsum(c( diff(!is.na(x)) < 0, TRUE))), 
         function(x)  x[x != 0 & sum(x != 0) <=2])))
#[1]  4  6  7 13
akrun

A version using na_locf from the imputeTS package with the maxgap parameter:

library("imputeTS")
x_prime <- na_locf(x, maxgap = 2)
idx <- which(is.na(x_prime) != is.na(x))
x_prime[idx] <- x_prime[idx] - 0.1

Results:

x_prime
[1] 1.0 2.0 3.0 2.9 4.0 3.9 3.9 5.0  NA  NA  NA 6.0 5.9

idx
[1]  4  6  7 13

Edit: It seems I interpreted "replace NAs with the previous non-NA value minus 0.1" differently. This version subtracts 0.1 once from the carried-forward value, so position 7 is 3.9. Your expected output has 3.8 there, so you seem to want the 0.1 subtracted again when the previous value was itself an imputed one.
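
One way to get the cumulative subtraction is to multiply 0.1 by each NA's position within its run. The sketch below is an assumption about how that could look, and it swaps in base R (rle and cummax) for the carry-forward step instead of na_locf. The names runs, run_len, run_pos and last_obs are illustrative.

```r
x <- c(1:3, NA, 4, NA, NA, 5, NA, NA, NA, 6, NA)
maxgap <- 2

# Run-length encode the NA pattern.
runs <- rle(is.na(x))
# For every element: length of the run it belongs to, and its position in it.
run_len <- rep(runs$lengths, runs$lengths)
run_pos <- sequence(runs$lengths)

# Index of the most recent non-NA value (0 while inside leading NAs).
last_obs <- cummax(seq_along(x) * (!is.na(x)))

# Fill only NAs in runs of at most maxgap that have a previous observation.
idx <- which(is.na(x) & run_len <= maxgap & last_obs > 0)
x_prime <- x
x_prime[idx] <- x[last_obs[idx]] - 0.1 * run_pos[idx]

idx
# [1]  4  6  7 13
```

Here the second NA after the value 4 has run_pos 2, so it becomes 4 - 0.2 = 3.8 as in the expected output, and leading NAs are skipped because last_obs is 0 for them.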

Steffen Moritz