R: pmax() function to ignore NA's?

Question

I built this custom "winsorize" function that does what it should, unless there are NA's in the data.

How it works:

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)

  # Replace first lower, then upper
  x <- pmax(x, sort(x)[numWin+1])
  x <- pmin(x, sort(x)[length(x)-numWin])

  return(x)
}

x <- 0:10

winsor1(x, probability=0.01)
[1] 1  1  2  3  4  5  6  7  8  9  9

So it replaces the top (and bottom) 1% of the data (rounded up to the next value, since there are only 11 values in the example). If there are, e.g., 250 values then the bottom 3 and top 3 values would be replaced by the bottom 4th and top 4th respectively.
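To check the 250-value case described above, here is a small sketch that reuses the `winsor1` definition from the question on `1:250`. The input is made up for illustration and chosen so that the replaced values are easy to spot.

```r
# winsor1 as defined in the question (no NA handling yet)
winsor1 <- function(x, probability) {
  numWin <- ceiling(length(x) * probability)
  x <- pmax(x, sort(x)[numWin + 1])
  x <- pmin(x, sort(x)[length(x) - numWin])
  x
}

x <- 1:250                            # hypothetical input
numWin <- ceiling(length(x) * 0.01)   # 250 * 0.01 = 2.5, rounded up to 3
w <- winsor1(x, probability = 0.01)

head(w, 5)   # 4 4 4 4 5          bottom 3 values replaced by the 4th
tail(w, 5)   # 246 247 247 247 247  top 3 values replaced by the 4th from the top
```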

The whole thing breaks down when there are NA's in the data, causing an error. However, if I set na.rm = TRUE in the pmax() and pmin() then the NA's themselves are replaced by the bottom value.

x[5] <- NA

winsor1(x, probability=0.01)
[1] 1  1  2  3  1  5  6  7  8  9  9
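The replacement of the NA seen here follows from how `na.rm = TRUE` works in `pmax()` and `pmin()`: a missing value at a position is ignored in favour of the other argument at that position. A minimal illustration with made-up values:

```r
v <- c(0, NA, 5)   # hypothetical input

pmax(v, 1)                 # 1 NA 5   NA is propagated
pmax(v, 1, na.rm = TRUE)   # 1  1 5   NA is replaced by the other argument
pmin(v, 4, na.rm = TRUE)   # 0  4 4   same effect on the upper side
```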

What can I do so that the NA's are preserved but do not cause an error? This is the output I want for the last line:

winsor1(x, probability=0.01)
[1] 1  1  2  3  NA  5  6  7  8  9  9

`sort` removes the `NA` elements by default (`sort(c(1, 2, NA, 3)) # [1] 1 2 3`); to keep them you have to specify `na.last = TRUE` – akrun, May 10 '20 at 19:04
NA-aware `pmax_`/`pmin_` functions are in [my answer to 'Dealing with NAs when calculating... summary in group_by'](https://stackoverflow.com/a/31060373/202229). – smci, May 10 '20 at 19:41
But your issue is not with NA treatment in `pmax()`, it's with `sort()`. What did you expect it to do with the NAs? [`sort()` has option `na.last = NA/TRUE/FALSE` to respectively remove/place last/first the NAs](https://stat.ethz.ch/R-manual/R-devel/library/base/html/sort.html) – smci, May 10 '20 at 19:44
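The three `na.last` settings mentioned in these comments can be compared side by side. This is a small sketch on a made-up vector; it also shows why an index computed from `length(x)` stops lining up once `sort()` has dropped the NAs.

```r
v <- c(3, NA, 1, 2)   # hypothetical input

sort(v)                    # 1 2 3      na.last = NA is the default: NAs dropped
sort(v, na.last = TRUE)    # 1 2 3 NA   NAs kept, placed last
sort(v, na.last = FALSE)   # NA 1 2 3   NAs kept, placed first
v[order(v)]                # 1 2 3 NA   order() puts NAs last by default

length(v)         # 4
length(sort(v))   # 3, shorter than v
```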

akrun · Accepted Answer · 2020-05-10T19:34:18.173

The issue is with sort(), which removes NA values by default. We could keep them by specifying na.last = TRUE, but that may not be what we need either. One option is order():

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)

  # order() puts NA last, so the non-NA values come first in x1, ascending
  x1 <- x[order(x)]
  nonNA <- sum(!is.na(x))

  # Replace first lower, then upper
  # na.rm is left at FALSE so that NA positions in x stay NA
  x <- pmax(x, x1[numWin+1])
  x <- pmin(x, x1[nonNA-numWin])

  return(x)
}

Testing:

x <- 0:10
winsor1(x, probability=0.01)
#[1] 1 1 2 3 4 5 6 7 8 9 9

x[5] <- NA
winsor1(x, probability=0.01)
#[1]  1  1  2  3 NA  5  6  7  8  9  9

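Or with na.last in sort(), moving the upper index down by the number of NAs:

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)

  # Replace first lower, then upper
  # na.last = TRUE keeps the NAs at the end of the sorted vector,
  # so the upper index is shifted by sum(is.na(x)) to skip them
  x <- pmax(x, sort(x, na.last = TRUE)[numWin+1])
  x <- pmin(x, sort(x, na.last = TRUE)[length(x)-numWin-sum(is.na(x))])

  return(x)
}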

edited May 10 '20 at 19:34

answered May 10 '20 at 19:08

akrun


Thanks, good to know that sort removes NA. However, that does not work if I do `x[2:3] <- NA`. Also, the order of the original series has to be preserved. – Joef May 10 '20 at 19:13
@Joef What is the expected output for that? – akrun May 10 '20 at 19:14
If I use your last function, the output is all NA's. The desired output is 3, NA, NA, 3, 4, 5, 6, 7, 8, 9, 9 – Joef May 10 '20 at 19:16
In the current function, `length(x) - numWin` gives an index of 10, which is `NA` in the `sort`ed vector; you can specify `na.rm = TRUE` in `pmin` – akrun May 10 '20 at 19:18
Yes that works! Thank you. My other solution, which also works, would have been `x <- pmin(x, sort(x, na.last = TRUE)[length(x)-numWin-sum(is.na(x))])` – Joef May 10 '20 at 19:23
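The same index shift can also be obtained by taking both cut-offs from the sorted non-missing values, since `sort()` drops NAs by default. The following is a minimal sketch under that assumption, not the answer's code: the name `winsor_na` is hypothetical, and `numWin` is still computed from the full length including NAs, as in the question.

```r
winsor_na <- function(x, probability) {
  numWin <- ceiling(length(x) * probability)
  xs <- sort(x)                       # non-NA values only, ascending
  lower <- xs[numWin + 1]
  upper <- xs[length(xs) - numWin]
  # na.rm is left at FALSE, so NA positions in x stay NA
  pmin(pmax(x, lower), upper)
}

x <- 0:10
x[2:3] <- NA
res <- winsor_na(x, probability = 0.01)
res
# [1]  3 NA NA  3  4  5  6  7  8  9  9
```

The original order of the series is preserved because only the cut-offs come from the sorted copy. The sketch does not guard against `numWin` being large relative to the number of non-missing values, in which case the indices would fall outside `xs`.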
