Think of a picture of a sunrise in which a red circle is surrounded by a thick yellow ring on a blue background. Code red as 3, yellow as 2, and blue as 1.

 11111111111
 11111211111
 11112221111
 11222322211
 22223332222
 11222322221
 11112221111
 11111211111

This is the desired output. However, the record/file/data has missing values (30% of all elements are missing).

How can we impute the missing values to recover this desired output, keeping the circular trend in mind?

Navin Manaswi
  • How big is the data? Is it only one circle as in this post, or are there other circles or other shapes as well? – zx8754 May 08 '15 at 18:56

2 Answers

13

This is how I would solve a problem of this sort in a very simple, straightforward way. Please note that I corrected your sample data above to be symmetric:

d <- read.csv(header=FALSE, stringsAsFactors=FALSE, text="
1,1,1,1,1,1,1,1,1,1,1
1,1,1,1,1,2,1,1,1,1,1
1,1,1,1,2,2,2,1,1,1,1
1,1,2,2,2,3,2,2,2,1,1
2,2,2,2,3,3,3,2,2,2,2
1,1,2,2,2,3,2,2,2,1,1
1,1,1,1,2,2,2,1,1,1,1
1,1,1,1,1,2,1,1,1,1,1
")

library(raster)

##  Plot original data as raster:
d <- raster(as.matrix(d))
plot(d, col=colorRampPalette(c("blue","yellow","red"))(255))

##  Simulate missing data (length(d)/3 cells, i.e. about a third,
##    close to the 30% mentioned in the question):
d_m <- d
d_m[ sample(1:length(d), length(d)/3) ] <- NA
plot(d_m, col=colorRampPalette(c("blue","yellow","red"))(255))

##  Construct a 3x3 filter for mean filling of missing values:
filter <- matrix(1, nrow=3, ncol=3) 

##  Fill in only missing values with the mean of the values within
##    the 3x3 moving window specified by the filter.  Note that this
##    could be replaced with a median/mode or some other whole-number
##    generating summary statistic:
r <- focal(d_m, filter, mean, na.rm=TRUE, NAonly=TRUE, pad=TRUE)

##  Plot imputed data:
plot(r, col=colorRampPalette(c("blue","yellow","red"))(255), zlim=c(1,3))

This is an image of the original sample data:

[Image: original sample data]

With 30% missing values simulated:

[Image: missing values]

And only those missing values interpolated with the mean of the 3x3 moving window:

[Image: missing values filled with the 3x3 window mean]
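To make the NA-only moving-window idea concrete, here is a minimal base-R sketch that works on a plain matrix and needs no packages. It is not how `raster::focal` is implemented. `fill_na_window` and `window_mode` are hypothetical helper names, and the seed and the exact 30% sampling are assumptions chosen for illustration. It also shows the mode variant mentioned in the code comment above, which keeps the filled values as whole numbers.

```r
# Hypothetical helper: fill only the NA cells of a matrix with a summary
# of the non-missing values in the surrounding (2*half+1)^2 window.
fill_na_window <- function(m, fun = mean, half = 1) {
  out <- m
  for (idx in which(is.na(m))) {
    i <- row(m)[idx]
    j <- col(m)[idx]
    # Clip the window at the matrix edges.
    rows <- max(1, i - half):min(nrow(m), i + half)
    cols <- max(1, j - half):min(ncol(m), j + half)
    # Read from the original matrix so filled values do not propagate.
    vals <- m[rows, cols]
    vals <- vals[!is.na(vals)]
    # Leave the cell as NA if the whole window is missing.
    if (length(vals) > 0) out[i, j] <- fun(vals)
  }
  out
}

# Hypothetical helper: most frequent value; ties go to the smallest value.
window_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

d <- matrix(c(
1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,2,2,2,3,2,2,2,1,1,
2,2,2,2,3,3,3,2,2,2,2,
1,1,2,2,2,3,2,2,2,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1), ncol = 11, byrow = TRUE)

# Assumption: an arbitrary seed and exactly 30% of the cells set to NA.
set.seed(42)
d_m <- d
d_m[sample(length(d), round(0.3 * length(d)))] <- NA

filled_mean <- fill_na_window(d_m, mean)         # may give fractional values
filled_mode <- fill_na_window(d_m, window_mode)  # stays within 1, 2, 3
```

The mean version can produce values such as 1.5 along the ring boundaries, so if the output must stay in the original 1/2/3 coding, the mode (or a rounded mean) is probably the more natural choice.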

Forrest R. Stevens
  • Thank you Ben! Just trying to pay forward a little bit of what I've learned from you and others over the years. – Forrest R. Stevens May 08 '15 at 21:21
  • While running the `focal` command as given above, I get this error: unable to find an inherited method for function ‘focal’ for signature ‘"matrix"’ – Navin Manaswi May 09 '15 at 05:42
  • Make sure that the first argument is a `raster` object. You'll see in my example that I first convert the sample data, which is a `matrix`, to a `raster` object before simulating the missing data. I think this is what you're running into. – Forrest R. Stevens May 09 '15 at 13:14
5

Here I compare Forrest's approach with a thin plate spline (TPS). Their performance is about the same, depending on the sample. The TPS could be preferable if the gaps were so large that focal could no longer produce an estimate, but in that case you could also use a larger (and perhaps Gaussian, see ?focalWeight) filter.

d <- matrix(c(
1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,2,2,2,3,2,2,2,1,1,
2,2,2,2,3,3,3,2,2,2,2,
1,1,2,2,2,3,2,2,2,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1), ncol=11, byrow=TRUE)


library(raster)
d <- raster(d)
plot(d, col=colorRampPalette(c("blue","yellow","red"))(255))
##  Simulate missing data (length(d)/3 cells, i.e. about a third,
##    close to the 30% mentioned in the question):
set.seed(1)
d_m <- d
d_m[ sample(1:length(d), length(d)/3) ] <- NA
plot(d_m, col=colorRampPalette(c("blue","yellow","red"))(255))


# Forrest's solution:
filter <- matrix(1, nrow=3, ncol=3) 
r <- focal(d_m, filter, mean, na.rm=TRUE, NAonly=TRUE, pad=TRUE)

# an alternative:
rp <- rasterToPoints(d_m)

library(fields)
# thin plate spline interpolation 
#(for a simple pattern like this, IDW might work, see ?interpolate)
tps <- Tps(rp[,1:2], rp[,3])
# predict
x <- interpolate(d_m, tps)
# use the original values where available
m <- cover(d_m, x)

i <- is.na(d_m)
cor(d[i], m[i])
## [1]  0.8846869
cor(d[i], r[i])
## [1] 0.8443165
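The code comment above notes that IDW (inverse distance weighting) might work for a simple pattern like this. As a rough illustration only, here is a base-R sketch of IDW on a plain matrix, using row and column indices as coordinates. It is not the `raster`/`gstat` route that `?interpolate` describes. `idw_fill` is a hypothetical helper, and the power of 2, the seed, and the exact 30% sampling are assumptions.

```r
# Hypothetical helper: fill NA cells with an inverse-distance-weighted
# average of all observed cells (weights = 1 / distance^power).
idw_fill <- function(m, power = 2) {
  known <- which(!is.na(m), arr.ind = TRUE)  # row/col of observed cells
  miss  <- which(is.na(m), arr.ind = TRUE)   # row/col of missing cells
  z <- m[known]                              # observed values
  out <- m
  for (k in seq_len(nrow(miss))) {
    # Euclidean distance from this missing cell to every observed cell;
    # never zero, because a missing cell is not among the observed ones.
    dists <- sqrt((known[, 1] - miss[k, 1])^2 + (known[, 2] - miss[k, 2])^2)
    w <- 1 / dists^power
    out[miss[k, 1], miss[k, 2]] <- sum(w * z) / sum(w)
  }
  out
}

d <- matrix(c(
1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,2,2,2,3,2,2,2,1,1,
2,2,2,2,3,3,3,2,2,2,2,
1,1,2,2,2,3,2,2,2,1,1,
1,1,1,1,2,2,2,1,1,1,1,
1,1,1,1,1,2,1,1,1,1,1), ncol = 11, byrow = TRUE)

# Assumption: an arbitrary seed and exactly 30% of the cells set to NA.
set.seed(1)
d_m <- d
d_m[sample(length(d), round(0.3 * length(d)))] <- NA

filled_idw <- idw_fill(d_m)

# Same kind of check as above: correlation on the held-out cells only.
i <- is.na(d_m)
cor(d[i], filled_idw[i])
```

Because every observed cell contributes, this version always returns an estimate, even for large gaps. A higher `power` makes the estimate more local, which probably suits sharp ring boundaries better.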
Robert Hijmans
  • Nice! There are definitely other, more sophisticated interpolation techniques available, geostatistical approaches included. Hopefully these serve as good examples for those who might not be familiar with even simple two- and three-dimensional imputation/interpolation methods. – Forrest R. Stevens May 09 '15 at 03:01