Removing outliers in time series rasters per pixel in R

Question

Basically, I have a time-series of rasters in a stack. Here is my workflow:

Convert the stack to a data frame so each row represents a pixel, and each column represents a date. This process is fairly straightforward, so no issues here.

For each row (pixel), identify outliers and set them to NA. I want to define what counts as an outlier myself. For example, let's say I want to set all the values larger than the 75th percentile to NA. The goal is that when I calculate the mean, the outliers don't affect the calculation. The outliers in this case are several orders of magnitude higher, so they influence the mean significantly.

I got some help online and came up with this code:

my_data %>%
  rowwise() %>%
  mutate(across(is.numeric, ~ if (. > as.numeric(quantile(across(), .75, na.rm=TRUE))) NA else .))

The problem is that, since it is a raster, some rows contain a lot of NA values that the quantile function needs to ignore while evaluating the cells.

Using na.rm=TRUE seemed to be the solution, but now I am encountering a new error:

Error: Problem with mutate() input ..1.
i ..1 = across(...).
x missing value where TRUE/FALSE needed
i The error occurred in row 1.

I understand that to get around this, I need to tell the if function to ignore the value if it is NA, but the dplyr syntax is very complicated for me, so I need some help on how to do this.

I'm looking forward to learning more, and to hearing whether there is a better way to do what I'm trying to do. I don't think I did a good job explaining it, but hopefully the code helps.

Can you share the dataframe with ```dput()```. It makes things easier for everyone. For more details: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Shibaprasadb, Oct 07 '21 at 05:57

Robert Hijmans · Answer 1 · 2021-10-07T17:37:25.010

When asking an R question, you should always include some example data. Either create data with code (see below) or use a file that ships with R (do not use dput if it can be avoided). See the help files that ship with R, or other questions on this site for examples and inspiration.

Example data:

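library(terra)
# 10 x 10 cells with 10 layers (one layer per date)
r <- rast(ncols=10, nrows=10, nlyr=10)
set.seed(1)
# one random value per cell and layer
v <- runif(size(r))
# set 100 randomly chosen values to NA
v[sample(size(r), 100)] <- NA
values(r) <- v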

Solution:

First, write a function that does what you want and works on a vector:

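f <- function(x) {
    # 75th percentile of the non-NA values of this pixel
    q <- quantile(x, .75, na.rm=TRUE)
    # set everything above it to NA; values that are already NA stay NA
    x[x>q] <- NA
    x
}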
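The error in the question comes from `if`, which needs a single TRUE or FALSE and stops when the comparison returns NA. The function above avoids `if` entirely by using logical indexing, which tolerates NA in the index when the replacement is a single value. A minimal sketch on a made-up pixel series (the vector `px` is only an illustration, not data from the question):

```r
# Same function as above, repeated so the snippet runs on its own
f <- function(x) {
    q <- quantile(x, .75, na.rm=TRUE)
    x[x>q] <- NA
    x
}

# Made-up pixel series: one missing value and one extreme value
px <- c(1, 2, NA, 3, 4, 1000)

# Comparing NA with a number yields NA, which `if` cannot use as a condition
px > 4
# FALSE FALSE    NA FALSE FALSE  TRUE

# Logical indexing with a single replacement value accepts those NAs
res <- f(px)
res
# 1  2 NA  3  4 NA

# The mean is no longer dominated by the extreme value
mean(res, na.rm=TRUE)
# 2.5
```

Here the 75th percentile of the non-missing values is 4, so only 1000 is replaced.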

Now apply it to the raster data:

x <- app(r, f)
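Since the stated goal is a per-pixel mean that ignores the outliers, one possible follow-up is to average the filtered layers. The sketch below assumes that is what you want and rebuilds the example data so it runs on its own; the names `m` and `manual` are only illustrative.

```r
library(terra)

# Example data and function as above
r <- rast(ncols=10, nrows=10, nlyr=10)
set.seed(1)
v <- runif(size(r))
v[sample(size(r), 100)] <- NA
values(r) <- v

f <- function(x) {
    q <- quantile(x, .75, na.rm=TRUE)
    x[x>q] <- NA
    x
}

# Filter each pixel's series; the result has the same number of layers as r
x <- app(r, f)

# Per-pixel mean over the remaining values; a single-layer raster
m <- app(x, mean, na.rm=TRUE)

# Optional check against a plain matrix computation (one row per cell)
manual <- apply(values(r), 1, function(p) mean(f(p), na.rm=TRUE))
all.equal(as.vector(values(m)), manual)
```

A pixel whose series is entirely NA has no values to average, so it stays missing in `m`.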

With the raster package it would go like this:

library(raster)
rr <- brick(r)
xx <- calc(rr, f)

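Note that you should not create a data.frame, but if you did (say, a data.frame d with one row per pixel and one column per date), you could do something like dd <- t(apply(d, 1, f)). The t() is needed because apply returns the result for each row as a column.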
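As a sketch of that data.frame route, here is the same idea on a small made-up table (3 pixels by 5 dates; the values and column names are hypothetical), followed by the row means:

```r
f <- function(x) {
    q <- quantile(x, .75, na.rm=TRUE)
    x[x>q] <- NA
    x
}

# Hypothetical data: one row per pixel, one column per date
d <- data.frame(
    d1 = c(1, 5, NA),
    d2 = c(2, NA, NA),
    d3 = c(3, 6, NA),
    d4 = c(4, 7, NA),
    d5 = c(1000, 800, NA)
)

# apply() returns each row's result as a column, so transpose back
dd <- t(apply(d, 1, f))

# Mean per pixel, ignoring the values that were set to NA
pixel_mean <- rowMeans(dd, na.rm=TRUE)
pixel_mean
# 2.5 6.0 NaN
```

The third pixel has no data at all, so its mean is NaN rather than an error.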
