fill NA raster cells using focal defined by boundary

Question

I have a raster and a shapefile. The raster contains NA and I am filling the NAs using the focal function

library(terra)

v <- vect(system.file("ex/lux.shp", package="terra"))
r <- rast(system.file("ex/elev.tif", package="terra"))
r[45:60, 45:60] <- NA
  
r_fill <- terra::focal(r, 5, mean, na.policy="only", na.rm=TRUE)

However, there are some NA still left. So I do this:

na_count <- terra::freq(r_fill, value = NA)

while(na_count$count != 0){
  
 r_fill <- terra::focal(r_fill, 5, mean, na.policy="only", na.rm=TRUE)  
 na_count <- terra::freq(r_fill, value = NA)
}

Once all NA's are filled, I clip the raster again using the shapefile

r_fill  <- terra::crop(r_fill, v, mask = T, touches = T)

This is what my before and after looks like:

I wondered if the while loop is an efficient way to fill the NAs or basically determine how many times I have to run focal to fill all the NAs in the raster.

Finding out how many times you need to run focal might be hard. At the moment, your while loop continues to impute N/As until the whole raster is filled. However, it would be more efficient if you could impute N/As until there are none left within the shapefill. That requires determining the contour of your shapefill and counting N/As only within. — Ventrilocus, Jan 23 '23 at 14:57
Rather than being inefficient, your loop could actually terminate too soon! You are essentially using a gap filling method known as relaxation, which should really be continued until the missing region changes by less than a threshold amount on each iteration. Have a look at [this answer](https://stackoverflow.com/a/53798137/2761575) which shows how to do this using focal in raster. Updating that solution to use terra should be fairly trivial. — dww, Jan 29 '23 at 23:06
[terra focal 930](https://github.com/rspatial/terra/issues/930), [terra focal 907](https://github.com/rspatial/terra/issues/907), my terra - 1.7.6, given @dww 'how focal results are implemented vs my likely naive, focal was done, and as per discussion above is actively under development/refinement, putting your version above would be useful for future searchers.. — Chris, Jan 30 '23 at 17:59

Chris · Answer 1 · 2023-01-27T23:15:02.817

Perhaps we can, or want to, dispense with the while( altogether by making a better estimate of focal('s w= arg in a world where r, as ground truth, isn't available. Were it available, we could readily derive direct value of w

r <- rast(system.file("ex/elev.tif", package="terra"))
# and it's variants
r2 <- r
r2[45:60, 45:60] <- NA
freq(r2, value=NA) - freq(r, value=NA)
  layer value count
1     0    NA   256
sqrt((freq(r2, value=NA) - freq(r, value=NA))$count)
[1] 16

which might be a good value for w=, and introducing another variant

r3 <- r
r3[40:47, 40:47] <- NA
r3[60:67, 60:67] <- NA
r3[30:37, 30:37] <- NA
r3[70:77, 40:47] <- NA
rm(r)

We no longer have our ground truth. How might we estimate an edge of w=? Turning to boundaries( default values (inner)

r2_bi <- boundaries(r2)
r3_bi <- boundaries(r3)
# examining some properties of r2_bi, r3_bi
freq(r2_bi, value=1)$count
[1] 503
freq(r3_bi, value=1)$count
[1] 579
freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
[1] 0.1306833
freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
[1] 0.1534588
sum(freq(r2_bi, value=1)$count,freq(r2_bi, value = 0)$count)
[1] 4352
sum(freq(r3_bi, value=1)$count,freq(r3_bi, value = 0)$count)
[1] 4352

Taken in reverse order, sum[s] and freq[s] suggest that while the total area of (let's call them holes) are the same, they differ in number and r2 is generally larger than r3. This is also clear from the first pair of freq[s]. Now we drift into some voodoo, hocus pocus in pursuit of a better edge estimate

    sum(freq(r2)$count) - sum(freq(r2, value = NA)$count)
    [1] 154
    sum(freq(r3)$count) - sum(freq(r3, value = NA)$count)
    [1] 154
    (sum(freq(r3)$count) - sum(freq(r3, value = NA)$count))
    [1] 12.40967
    freq(r2_bi, value=1)$count/freq(r2_bi, value = 0)$count
    [1] 0.1306833
    freq(r2_bi, value=0)$count/freq(r2_bi, value = 1)$count
    [1] 7.652087
    freq(r3_bi, value=1)$count/freq(r3_bi, value = 0)$count
    [1] 0.1534588
    taking the larger, i.e. freq(r2_bi 7.052087
    7.652087/0.1306833
    [1] 58.55444
    154+58
    [1] 212
    sqrt(212)
    [1] 14.56022
    round(sqrt(212)+1)
    [1] 16

Well, except for that +1 part, maybe still a decent estimate for w=, to be used on both r2 and r3 if called upon to find a better w, and perhaps obviate the need for while(. Another approach to looking for squares and their edges:

wtf3 <- values(r3_bi$elevation)
wtf2 <- values(r2_bi$elevation)
wtf2_tbl_df2 <-   as.data.frame(table(rle(as.vector(is.na(wtf2)))$lengths))
wtf3_tbl_df2 <- as.data.frame(table(rle(as.vector(is.na(wtf3)))$lengths))
names(wtf2_tbl_df2)
[1] "Var1" "Freq"
wtf2_tbl_df2[which(wtf2_tbl_df2$Var1 == wtf2_tbl_df2$Freq), ]
   Var1 Freq
14   16   16
wtf3_tbl_df2[which(wtf3_tbl_df2$Freq == max(wtf3_tbl_df2$Freq)), ]
  Var1 Freq
7    8   35
35/8
[1] 4.375 # 4 squares of 8 with 3 8 length vectors

bringing in v finally and filling

v <- vect(system.file("ex/lux.shp", package="terra"))
r2_fill_17 <- focal(r2, 16 + 1 , mean, na.policy='only', na.rm = TRUE)
r3_fill_9 <- focal(r3, 8 + 1 , mean, na.policy='only', na.rm = TRUE)
r2_fill_17_cropv <- crop(r2_fill_17, v, mask = TRUE, touches = TRUE)
r3_fill_9_cropv <- crop(r3_fill_9, v, mask = TRUE, touches = TRUE)

And I now appreciate your while( approach as your r2 looks better, more naturally transitioned, though the r3 looks fine. In my few, brief experiments with smaller than 'hole', i.e. focal(r2, 9, I got the sense it would take 2 passes to fill, that suggests focal(r2, 5 would take 4.

I guess further determining the proportion of fill:hole:rast for when to deploy a while would be worthwhile.

fill NA raster cells using focal defined by boundary

1 Answers1