Remove raster files with NA values from the list

Question

I have a list of raster files. I want to go through each file and, if it has NA values, delete it from the list.

For example, this list:

  [1] "./2013105_33UXP_04_05_L8_sr_band2.tif" "./2013105_33UXP_04_05_L8_sr_band3.tif" "./2013105_33UXP_04_05_L8_sr_band4.tif"
  [4] "./2013105_33UXP_04_05_L8_sr_band5.tif" "./2013105_33UXP_04_05_L8_sr_band6.tif" "./2013105_33UXP_04_05_L8_sr_band7.tif"
  [7] "./2013114_33UXP_04_05_L8_sr_band2.tif" "./2013114_33UXP_04_05_L8_sr_band3.tif" "./2013114_33UXP_04_05_L8_sr_band4.tif"
 [10] "./2013114_33UXP_04_05_L8_sr_band5.tif" "./2013114_33UXP_04_05_L8_sr_band6.tif" "./2013114_33UXP_04_05_L8_sr_band7.tif"
 [13] "./2013121_33UXP_04_05_L8_sr_band2.tif" "./2013121_33UXP_04_05_L8_sr_band3.tif" "./2013121_33UXP_04_05_L8_sr_band4.tif"
 [16] "./2013121_33UXP_04_05_L8_sr_band5.tif" "./2013121_33UXP_04_05_L8_sr_band6.tif" "./2013121_33UXP_04_05_L8_sr_band7.tif"

How can I do it?

Thanks

I mean: if more than 90% of the values are NA, remove it; otherwise change the value to 1 — Oumnia Asadian, Nov 28 '17 at 16:46
Could you please share an example of your data so that we have a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example-aka-mcve-minimal-complete-and-ver)? What does your data look like? — Manuel Bickel, Nov 28 '17 at 16:48
I have a list of raster files (some of which probably have NA values) — Oumnia Asadian, Nov 28 '17 at 16:50
As far as I understand, the data you posted are only images. What does the data look like after reading it into R? Or is that your question? I have no experience with "raster files"; is there any standard for this type of data? I thought your question was just about finding NA values. — Manuel Bickel, Nov 28 '17 at 17:12
Yes, they are raster images, some of which contain NA. I want to find the images that have NA and remove them from my list, but with an if condition: if more than 90% of the image is NA, remove it; otherwise set the NA values to 0 — Oumnia Asadian, Nov 28 '17 at 17:22
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](https://meta.stackoverflow.com/questions/254393) and what has been done so far to solve it. — Sébastien Rochette, Nov 28 '17 at 17:29

Robert Hijmans · Answer 1 · 2017-11-28T17:45:29.960

Here is a reproducible example with a solution

library(raster)

# example data
r <- raster(ncol=10, nrow=10)

set.seed(0)
# 10 layers
s <- stack(lapply(1:10, function(i) setValues(r, runif(ncell(r)))))
# set about half the values to NA
s[s < .5] <- NA

s
#class       : RasterBrick 
#dimensions  : 10, 10, 100, 10  (nrow, ncol, ncell, nlayers)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
#data source : in memory
#names       :   layer.1,   layer.2,   layer.3,   layer.4,   layer.5,   layer.6,   layer.7,   layer.8,   layer.9,  layer.10 
#min values  : 0.5186343, 0.5004410, 0.5069395, 0.5070356, 0.5008505, 0.5253055, 0.5017548, 0.5161239, 0.5055311, 0.5019486 
#max values  : 0.9919061, 0.9926841, 0.9815635, 0.9960774, 0.9937492, 0.9959655, 0.9756573, 0.9994554, 0.9906600, 0.9999306

Now use the example data to remove layers that have more than 50% of cells that are NA

# count the NA values in each layer
i <- cellStats(is.na(s), sum)
# fraction that is NA
i <- i/ncell(s)

i
# layer.1  layer.2  layer.3  layer.4  layer.5  layer.6  layer.7  layer.8  layer.9 layer.10 
#    0.52     0.46     0.62     0.56     0.53     0.44     0.46     0.51     0.55     0.54 

# keep the layers in which at most half of the cells are NA
ss <- s[[which(i <= .5)]]

ss
#class       : RasterBrick 
#dimensions  : 10, 10, 100, 3  (nrow, ncol, ncell, nlayers)
#resolution  : 36, 18  (x, y)
#extent      : -180, 180, -90, 90  (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
#data source : in memory
#names       :   layer.2,   layer.6,   layer.7 
#min values  : 0.5004410, 0.5253055, 0.5017548 
#max values  : 0.9926841, 0.9959655, 0.9756573
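
The same idea can be applied directly to the file names from the question. This is only a minimal sketch, assuming each file holds a single layer that raster() can read. The helper names na_fraction and keep_valid_files are hypothetical, and the 0.9 threshold is taken from the asker's comment. The demo writes two small temporary files in the package's native .grd format so that it runs on its own.

```r
library(raster)

# Fraction of cells that are NA in the (first) layer of a raster file
na_fraction <- function(path) {
  r <- raster(path)
  cellStats(is.na(r), sum) / ncell(r)
}

# Keep only the file names whose NA fraction is at most max_na
keep_valid_files <- function(files, max_na = 0.9) {
  frac <- vapply(files, na_fraction, numeric(1))
  files[frac <= max_na]
}

# Demo data (hypothetical): one file with 95% NA, one with 20% NA
tmp <- tempfile(fileext = c(".grd", ".grd"))
template <- raster(ncol = 10, nrow = 10)
writeRaster(setValues(template, c(1:5, rep(NA, 95))), tmp[1])
writeRaster(setValues(template, c(rep(NA, 20), 1:80)), tmp[2])

kept <- keep_valid_files(tmp)
```

With the vector of .tif paths from the question, the call would be keep_valid_files(files), where files is that character vector.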

I found that cellStats takes very long for a large set of raster layers or a large stack. What I am using now is the approach below, which is very fast and finds the raster layers with 90% NA values in a minute (I have about 800 layers). — Oumnia Asadian, Nov 29 '17 at 08:17
## set NA limit (90% of all cells)
limit <- 0.9 * ncell(stack)

library(doParallel)
cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

## loop over layers in parallel; TRUE marks a layer below the NA limit
system.time(
  ld1 <- foreach(i = unstack(stack), .packages = "raster", .combine = "c") %dopar% {
    sum(is.na(i[])) < limit
  }
)
— Oumnia Asadian, Nov 29 '17 at 08:19
I have no idea how to insert my code in an answer! It looks messy :)) — Oumnia Asadian, Nov 29 '17 at 08:20
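
For a stack that fits in memory, the per-layer NA fraction can also be computed without cellStats and without a cluster. This is a sketch under that assumption and has not been benchmarked against the parallel version above. getValues() on a multi-layer object returns a matrix with one column per layer, so colMeans() of the NA mask gives the NA fraction of each layer. The example data are rebuilt from the accepted answer and the name na_frac is an assumption.

```r
library(raster)

# Example data as in the answer: 10 layers, about half of the cells NA
r <- raster(ncol = 10, nrow = 10)
set.seed(0)
s <- stack(lapply(1:10, function(i) setValues(r, runif(ncell(r)))))
s[s < .5] <- NA

# One column per layer, so the column means of the NA mask
# are the per-layer NA fractions
na_frac <- colMeans(is.na(getValues(s)))

# Keep the layers in which at most 90% of the cells are NA
ss <- s[[which(na_frac <= 0.9)]]
```

Because all values are read at once, this trades memory for speed. For very large stacks the layer-by-layer approach may still be needed.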

score -1 · Answer 2 · answered Nov 28 '17 at 17:41

It would be helpful to have reproducible code that shows what one of your raster files looks like once it is read into R. Does it come in as a data frame, a list, or a matrix? Without that, it is difficult to fully answer your question.

I think you'll want something along these lines:

library(raster)

# list_of_files is assumed to be a character vector of .tif paths
new_list_of_files <- list()

for (file in list_of_files) {
     imported_raster <- raster(file)
     df <- as.data.frame(imported_raster)

     # fraction of cells that are NA
     p_cent_na <- sum(is.na(df)) / (nrow(df) * ncol(df))
     # keep only files with at most 90% NA and set their NA cells to 0
     if (p_cent_na <= .9) {
            df[is.na(df)] <- 0
            new_list_of_files[[file]] <- df
     }
}

I have no idea if this will work because I would need an example of what one of your .tif files looks like.
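
If the rasters should stay raster objects rather than data frames, the check and the NA replacement can be done on the cell values directly. This is a sketch only: clean_layer is a hypothetical helper, the 0.9 threshold and the fill value 0 come from the asker's comments, and the two demo layers are made up for illustration.

```r
library(raster)

# Return NULL when more than max_na of the cells are NA,
# otherwise return the layer with its NA cells set to fill
clean_layer <- function(r, max_na = 0.9, fill = 0) {
  v <- getValues(r)
  if (mean(is.na(v)) > max_na) return(NULL)
  v[is.na(v)] <- fill
  setValues(r, v)
}

# Demo data (hypothetical): one layer with 95% NA, one with 20% NA
template <- raster(ncol = 10, nrow = 10)
layers <- list(
  sparse = setValues(template, c(1:5, rep(NA, 95))),
  dense  = setValues(template, c(rep(NA, 20), 1:80))
)

# Drop the layers that were rejected (NULL) and keep the cleaned ones
cleaned <- Filter(Negate(is.null), lapply(layers, clean_layer))
```

For files on disk, layers could be built with lapply(list_of_files, raster) before calling clean_layer.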
