Basically, I have a time-series of rasters in a stack. Here is my workflow:
1. Convert the stack to a data frame so that each row represents a pixel and each column represents a date. This step is fairly straightforward, so no issues here.
2. For each row (pixel), identify outliers and set them to NA. In this case I define the outlier threshold myself: for example, I want to set all values larger than that pixel's 75th percentile to NA. The goal is that when I calculate the mean, the outliers don't affect it. The outliers here are several orders of magnitude higher than the other values, so they skew the mean significantly.
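For a single pixel, the rule in step 2 can be sketched in base R as below. `drop_above_q` is a hypothetical helper name and the pixel values are made up for illustration; the comparison goes through `which()`, which skips NA results, so NA cells are left alone.

```r
# Set values above the p-th percentile of one pixel's time series to NA.
# drop_above_q is a hypothetical helper, not part of any package.
drop_above_q <- function(x, p = 0.75) {
  q <- quantile(x, probs = p, na.rm = TRUE, names = FALSE)  # ignore NA cells
  x[which(x > q)] <- NA  # which() drops NA comparisons, so NA cells stay NA
  x
}

pixel <- c(10, 12, 11, 13, 5000, 9, 12, 4000)  # hypothetical values
mean(pixel)                               # 1133.375, dominated by the outliers
mean(drop_above_q(pixel), na.rm = TRUE)   # about 11.17
```

Note that a fixed percentile cut always removes roughly the top quarter of each pixel's valid values, even for pixels that have no real outliers.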
I got some help online and came up with this code:
```r
my_data %>%
  rowwise() %>%
  mutate(across(is.numeric, ~ if (. > as.numeric(quantile(across(), .75, na.rm=TRUE))) NA else .))
```
The problem is that, since it is a raster, there are a lot of NA values in some rows that I need the quantile function to ignore while evaluating the cells (see below).
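A small illustration of why `na.rm` matters for `quantile()` (the vector `x` is hypothetical): without it, any NA in the row makes the call fail; with it, the NA cells are dropped before the percentile is computed.

```r
x <- c(3, NA, 1, 2)  # hypothetical pixel with one missing date

# Without na.rm, quantile() stops as soon as it sees an NA
res <- tryCatch(quantile(x, 0.75), error = conditionMessage)
res  # "missing values and NaN's not allowed if 'na.rm' is FALSE"

# With na.rm = TRUE the NA is dropped first: 75th percentile of 1, 2, 3
quantile(x, 0.75, na.rm = TRUE, names = FALSE)  # 2.5
```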
Using `na.rm=TRUE` seemed to be the solution, but now I am encountering a new error:
```
Error: Problem with `mutate()` input `..1`.
i `..1 = across(...)`.
x missing value where TRUE/FALSE needed
i The error occurred in row 1.
```
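The message comes from the `if` itself rather than from `quantile()`: once `na.rm = TRUE` gives a valid threshold, any cell that is itself NA still produces `NA > q`, which is `NA`, and `if` cannot branch on `NA`. The values below are hypothetical; the last line shows one way to guard the condition, relying on `&&` short-circuiting so the comparison is never evaluated for an NA cell.

```r
q <- 2.5          # hypothetical 75th percentile of a row (computed with na.rm = TRUE)
cell <- NA_real_  # one of that row's NA cells

cell > q          # NA: any comparison with NA is NA
msg <- tryCatch(if (cell > q) NA else cell, error = conditionMessage)
msg               # "missing value where TRUE/FALSE needed"

# Check for NA first; && stops early, so `cell > q` is skipped for NA cells
if (!is.na(cell) && cell > q) NA else cell  # NA, the cell is simply kept as-is
```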
I understand that to get around this I need to tell the `if` statement to skip the value when it is NA, but the dplyr syntax is quite complicated for me, so I need some help with how to do this.
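One possible dplyr sketch, assuming dplyr 1.0 or later (for `c_across()`) and hypothetical date columns `d1:d4` standing in for the real ones. Instead of calling `quantile()` inside every cell, it computes the threshold once per row into a temporary `q75` column, then replaces values with a vectorised, NA-safe `replace()` so no `if` is needed.

```r
library(dplyr)

# Hypothetical toy data: each row is a pixel, each column (d1..d4) a date
my_data <- tibble(
  d1 = c(1, 2, NA),
  d2 = c(2, NA, NA),
  d3 = c(3, 4, NA),
  d4 = c(400, 5, NA)
)

cleaned <- my_data %>%
  rowwise() %>%
  # One threshold per pixel; NA cells are ignored, an all-NA row gives NA
  mutate(q75 = quantile(c_across(d1:d4), 0.75, na.rm = TRUE, names = FALSE)) %>%
  ungroup() %>%
  # which() drops NA comparisons, so NA cells (and all-NA rows) stay untouched
  mutate(across(d1:d4, ~ replace(.x, which(.x > q75), NA))) %>%
  select(-q75)

cleaned
rowMeans(cleaned, na.rm = TRUE)  # per-pixel mean without the outliers
```

With the real data, `d1:d4` would be replaced by whatever selects the date columns.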
I'm also keen to learn whether there is a better way to do what I'm trying to do. I don't think I explained it well, but hopefully the code helps.
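If the pixel values are all numeric, one alternative is to skip `rowwise()` and work on a plain matrix in base R. This is a sketch with a hypothetical matrix `m` (rows = pixels, columns = dates); with a real data frame it would come from something like `as.matrix()` on the date columns. `apply()` computes one threshold per row, and `m > q75` compares every cell with its own row's threshold because R stores matrices column by column and recycles `q75` (length `nrow(m)`) down each column.

```r
# Hypothetical pixel-by-date matrix: rows = pixels, columns = dates
m <- rbind(c(1,  2,  3,  400),
           c(2,  NA, 4,  5),
           c(NA, NA, NA, NA))

# Per-pixel 75th percentile, ignoring NA cells (all-NA rows give NA)
q75 <- apply(m, 1, quantile, probs = 0.75, na.rm = TRUE, names = FALSE)

# Cell [i, j] is compared with q75[i]; which() skips NA comparisons
m_clean <- m
m_clean[which(m > q75)] <- NA

pixel_mean <- rowMeans(m_clean, na.rm = TRUE)
pixel_mean  # 2, 3, NaN (the last pixel has no valid values)
```

Whether this is faster than the dplyr version on a full raster is worth measuring rather than assuming.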