I want to interpolate multiple NA values in a matrix called tester.

This is part of tester, showing only one column with NA values. In the whole 744x6 matrix, other columns contain multiple NA values as well:

ZONEID   TIMESTAMP         U10            V10            U100          V100
1        20121022 12:00    -1.324032e+00  -2.017107e+00 -3.278166e+00  -5.880225574
1        20121022 13:00    -1.295168e+00            NA  -3.130429e+00  -6.414975148
1        20121022 14:00    -1.285004e+00            NA  -3.068829e+00  -7.101699541
1        20121022 15:00    -9.605904e-01            NA  -2.332645e+00  -7.478168285
1        20121022 16:00    -6.268261e-01 -3.057278e+00  -1.440209e+00  -8.026791079

I have installed the zoo package and loaded it with library(zoo). I tried the na.approx function, which interpolates linearly, but it returns an error:

na.approx(tester)
# Error ----> need at least two non-NA values to interpolate

na.approx(tester, rule = 2)
# Error ----> need at least two non-NA values to interpolate

na.approx(tester, x = index(tester), na.rm = TRUE, maxgap = Inf)

Afterward I tried:

Lines <- "tester"
library(zoo) 
z <- read.zoo(textConnection(Lines), index = 2)[,2] 
na.approx(z)

Again I got the same error about needing at least two non-NA values. I also tried:

Cz <- zoo(tester)
index(Cz) <- Cz[,1]
Cz_approx <- na.approx(Cz)

Same error.

I must be doing something really stupid, but I would really appreciate your help.

Henrik
Tjeerd Luykx
  • Two values form a regression line and allow for interpolation. It is a mathematical certainty that extrapolation with fewer than two real values is impossible. You might as well just use a random number generator. Check whether your matrix contains a variable that consists only of NA values. And check http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example to see how you can provide us with code and test objects we can work with. Welcome to SO! – Joris Meys Sep 02 '14 at 14:15
  • Hi Joris, thank you very much for your help. You are absolutely right that linear interpolation is not mathematically justified in that case. But I am not sure how a random number generator would help with my problem. How could I write the code so that every run of NA values is filled with values between the previous and the next number, or at least equal to the previous or next number? In the case of the V10 variable, that means between -2.017 and -3.057. Are you able to help me with this? Thanks very much. Also thanks for your editing; I hope I did not screw up too terribly, and I will read your advice for next time. – Tjeerd Luykx Sep 02 '14 at 14:34
  • [Amelia](http://cran.r-project.org/web/packages/Amelia/index.html)? – Roland Sep 02 '14 at 14:36
  • I had the same issue, so I believe future readers may also come to this post. I guess the problem here is that when you convert a data.frame containing class Date objects, they are merged into your class zoo object, so na.approx ends up trying to interpolate non-numeric values. There is certainly a better way to do this, but I think one could specify the order.by parameter when calling zoo() to match your class Date object, and either remove any non-numeric column or select only the numeric ones from your original data.frame. – dudu Aug 15 '16 at 20:52
  • In other words, avoid class Date objects in your zoo object. – dudu Aug 15 '16 at 20:52
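
The suggestion in the last two comments can be sketched as follows. This is only an illustration: the data frame below is a hypothetical reconstruction of part of the excerpt in the question (the real tester is not available), and the timestamp format string and the UTC time zone are assumptions. The idea is to pass the parsed timestamps via order.by and to keep only numeric columns in the zoo object.

```r
library(zoo)

# Hypothetical reconstruction of a few columns from the excerpt in the question
tester <- data.frame(
  ZONEID    = 1,
  TIMESTAMP = c("20121022 12:00", "20121022 13:00", "20121022 14:00",
                "20121022 15:00", "20121022 16:00"),
  U10 = c(-1.324032, -1.295168, -1.285004, -0.9605904, -0.6268261),
  V10 = c(-2.017107, NA, NA, NA, -3.057278),
  stringsAsFactors = FALSE
)

# Parse the timestamps (format and time zone are assumptions) and use them
# as the index instead of storing them as a data column
ts <- as.POSIXct(tester$TIMESTAMP, format = "%Y%m%d %H:%M", tz = "UTC")

# Keep only the numeric measurement columns in the zoo object
z <- zoo(as.matrix(tester[, c("U10", "V10")]), order.by = ts)

# Linear interpolation along the time index, column by column;
# na.rm = FALSE keeps leading/trailing NAs instead of dropping those rows
z_filled <- na.approx(z, na.rm = FALSE)
z_filled
```

Because the index is hourly and evenly spaced here, the three missing V10 values are placed at equal steps between -2.017107 and -3.057278.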

1 Answer

na.approx needs at least two non-NA values in a column to interpolate, so apply it only to columns that meet this condition. Here I use colSums on a logical matrix to find the relevant columns.

# create a small matrix
m <- matrix(data = c(NA, 1, 1, 1, 1,
                     NA, NA, 2, NA, NA,
                     NA, NA, NA, NA, 2,
                     NA, NA, NA, 2, 3),
            ncol = 5, byrow = TRUE)

m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   NA    1    1    1    1
# [2,]   NA   NA    2   NA   NA
# [3,]   NA   NA   NA   NA    2
# [4,]   NA   NA   NA    2    3

library(zoo)

# na.approx on the entire matrix does not work
na.approx(m)
# Error in approx(x[!na], y[!na], xout, ...) : 
#   need at least two non-NA values to interpolate

# find columns with at least two non-NA values
idx <- colSums(!is.na(m)) > 1
idx
# [1] FALSE FALSE  TRUE  TRUE  TRUE

# interpolate 'TRUE columns' only
m[ , idx] <- na.approx(m[ , idx])
m
#      [,1] [,2] [,3]     [,4] [,5]
# [1,]   NA    1    1 1.000000  1.0
# [2,]   NA   NA    2 1.333333  1.5
# [3,]   NA   NA   NA 1.666667  2.0
# [4,]   NA   NA   NA 2.000000  3.0
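
One of the comments also asks for fills that are "at least equal to the previous or next number" when a gap has no value on one side. A possible sketch, not part of the original answer: pass rule = 2 (handed on to approx) so that leading and trailing NAs take the nearest observed value, and guard each column with the same "at least two non-NA values" check. The helper name fill_col is hypothetical.

```r
library(zoo)

# Same small matrix as above, rebuilt so this snippet runs on its own
m <- matrix(data = c(NA, 1, 1, 1, 1,
                     NA, NA, 2, NA, NA,
                     NA, NA, NA, NA, 2,
                     NA, NA, NA, 2, 3),
            ncol = 5, byrow = TRUE)

# Hypothetical helper: interpolate a single column, but return it unchanged
# when it has fewer than two non-NA values
fill_col <- function(v) {
  if (sum(!is.na(v)) < 2) return(v)
  # rule = 2 extends the nearest observed value into leading/trailing NAs;
  # na.rm = FALSE keeps the result the same length as the input
  na.approx(v, na.rm = FALSE, rule = 2)
}

# Apply the helper column by column; the result is again a 4 x 5 matrix
filled <- apply(m, 2, fill_col)
filled
```

With this variant, column 3 becomes 1, 2, 2, 2 because the trailing NAs repeat the last observed value, while columns 1 and 2 are left untouched.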
Henrik