
I have a time series dataset with 1000 columns. Each row is, of course, a different record. There are some NA values scattered throughout the dataset.

I would like to replace each NA with the adjacent value on either its left or its right; it doesn't matter which.

A neat solution, and the one I was going for, is to replace each NA with the value to its right, unless it is in the last column, in which case replace it with the value to its left.


I was just going to do a for loop, but I assume a function would be more efficient. Essentially, I wasn't sure how to reference the adjacent values.

Here is what I was trying:

for (entry in dataset) {
  if (any(is.na(entry)) == TRUE && entry[,1:999]) {
    entry = entry[,1]
  }
  else if (any(is.na(entry)) == TRUE && entry[,1000]) {
    entry = cell[,-1]
  }
}

As you can tell, I'm inexperienced with R :) Not really sure how you index the values to the left or right.

WΔ_

1 Answer


I would suggest using na.locf on the transpose of your dataset.

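The na.locf function of the zoo package ("last observation carried forward") replaces each NA with the nearest non-NA value before it in the same column, or after it when fromLast = TRUE. Since you want to fill along rows rather than down columns, we can simply transpose the dataset first: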

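library(zoo)
# Example 3x3 matrix with one NA in the middle column and one in the last column
df <- matrix(c(1, 3, 4, 10, NA, 52, NA, 11, 100), ncol = 3)
# Step 1: fill each NA with the value to its right (na.locf works down columns, hence t())
step1 <- t(na.locf(t(df), fromLast = TRUE))
# Step 2: fill any NA left in the last column with the value to its left
step2 <- t(na.locf(t(step1), fromLast = FALSE))
print(df)
####      [,1] [,2] [,3]
#### [1,]    1   10   NA
#### [2,]    3   NA   11
#### [3,]    4   52  100
print(step2)
####      [,1] [,2] [,3]
#### [1,]    1   10   10
#### [2,]    3   11   11
#### [3,]    4   52  100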
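
As a rough illustration of why two passes are used, here is what na.locf does to a single vector. The vector x is a made-up example standing in for one row of the dataset, and na.rm = FALSE is passed explicitly (an addition not in the code above) so that the trailing NA is kept rather than dropped after the first pass.

```r
library(zoo)  # assumes the zoo package is installed

# Hypothetical row with one interior NA and one trailing NA
x <- c(1, NA, 3, NA)

# Pass 1: fill from the right (next observation carried backward).
# The trailing NA has nothing to its right, so it stays NA.
pass1 <- na.locf(x, fromLast = TRUE, na.rm = FALSE)
print(pass1)  # 1 3 3 NA

# Pass 2: fill from the left (last observation carried forward).
# This catches the NA left over at the end.
pass2 <- na.locf(pass1, na.rm = FALSE)
print(pass2)  # 1 3 3 3
```

The first pass handles every NA that has a value somewhere to its right; the second pass only matters for NAs at the end of a row.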

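I do it in two steps because interior columns and the last column are treated differently: the first pass fills each NA from the right, and the second fills whatever is left in the last column from the left. If you know the dplyr package, which provides the %>% pipe, it is even more straightforward to turn this into a function: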

library(dplyr)  # for the %>% pipe; zoo must also be loaded for na.locf
MyReplace <- function(data) {
  data %>% t() %>% na.locf(fromLast = TRUE) %>% na.locf() %>% t()
}
MyReplace(df)
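
If you would rather not depend on zoo, the same idea can be sketched in base R by indexing neighbouring columns directly, which also shows how to reference the value to the left or right. This is a different technique from na.locf, not what the code above does internally. The function name fill_from_neighbours and the matrix m are hypothetical, chosen only for this example, and the sketch assumes the data can be treated as a matrix.

```r
# Hypothetical helper: fill NAs from the right, then leftovers from the left
fill_from_neighbours <- function(m) {
  m <- as.matrix(m)
  n <- ncol(m)
  if (n > 1) {
    # Right-to-left sweep: an NA in column j takes the value in column j + 1
    for (j in (n - 1):1) {
      na_rows <- is.na(m[, j])
      m[na_rows, j] <- m[na_rows, j + 1]
    }
    # Left-to-right sweep: any remaining NA takes the value in column j - 1
    for (j in 2:n) {
      na_rows <- is.na(m[, j])
      m[na_rows, j] <- m[na_rows, j - 1]
    }
  }
  m
}

# Same example matrix as above
m <- matrix(c(1, 3, 4, 10, NA, 52, NA, 11, 100), ncol = 3)
filled <- fill_from_neighbours(m)
print(filled)
```

The loops run over columns only, so each step is vectorised across all rows. A row that is entirely NA stays NA, since it has no neighbour to copy from.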
agenis