I wrote a function that perfectly replaces custom values of a matrix with NA.

NAfun <- function (x, z) {
  x[x %in% z] <- NA
  x
}

M <- matrix(1:12, 3, 4)
M[1, 2] <- -77
M[2, 1] <- -99
> M
     [,1] [,2] [,3] [,4]
[1,]    1  -77    7   10
[2,]  -99    5    8   11
[3,]    3    6    9   12

z <- c(-77, -99)

> NAfun(M, z)
     [,1] [,2] [,3] [,4]
[1,]    1   NA    7   10
[2,]   NA    5    8   11
[3,]    3    6    9   12

But this won't work with data frames.

D <- as.data.frame(matrix(LETTERS[1:12], 3, 4))
> D
  V1 V2 V3 V4
1  A  D  G  J
2  B  E  H  K
3  C  F  I  L

z <- c("B", "D")

> NAfun(D, z)
  V1 V2 V3 V4
1  A  D  G  J
2  B  E  H  K
3  C  F  I  L

D[] <- lapply(D, function(x) as.character(x))  # same with character vectors

> NAfun(D, z)
  V1 V2 V3 V4
1  A  D  G  J
2  B  E  H  K
3  C  F  I  L

If I convert the data frame to a matrix it works, though.

> NAfun(as.matrix(D), z)
     V1  V2  V3  V4 
[1,] "A" NA  "G" "J"
[2,] NA  "E" "H" "K"
[3,] "C" "F" "I" "L"

But I can't in my case.

I don't understand why this doesn't work as it is. How can I adapt the function so that it works with a data frame, or preferably with both types? Thanks.

jay.sf
  • 1
    The behaviour is consistent across both cases because, in your first example, `D` is a matrix. – Dan May 07 '18 at 21:03
  • `D <- sapply(D, as.character)` line has changed `D` into a matrix. Try `M %in% z` and you will see it will return a value for each column. That means %in% on data.frame compares column and not individual values. – MKR May 07 '18 at 21:07
  • @Lyngbakr thanks to help clarify, I've adapted the question accordingly, the issue now is that the code won't work with data frames. – jay.sf May 07 '18 at 21:21
  • 1
    @jaySf `NAfun(as.matrix(D), z)` should still work. – MKR May 07 '18 at 21:23
  • @MKR thanks, I reverted accidentally deleted part – jay.sf May 07 '18 at 21:28

2 Answers

As @Lyngbakr correctly mentioned, the behavior is consistent between D and M. NAfun worked on D because D had already been converted to a matrix by the line D <- sapply(D, as.character).

Now, the question is why the behavior differs between a matrix and a data.frame. The reason is the %in% operator.

On a matrix (here D, after the sapply conversion), %in% checks each individual value against the vector z:

D %in% z
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

whereas on a data.frame, %in% compares whole columns rather than individual values. Hence, with M as a data.frame:

M %in% c(-99,-77)
#[1] FALSE FALSE FALSE FALSE

But

M %in% M[1:2]
#[1]  TRUE  TRUE FALSE FALSE

M %in% list(c(1,-99,3))
#[1]  TRUE FALSE FALSE FALSE
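
To make the column-versus-cell difference concrete, here is a small sketch. The data mirrors the data frame from the question; the variable names col_level, cell_level and hits are only illustrative.

```r
# Example data, mirroring the question's data frame.
D <- as.data.frame(matrix(LETTERS[1:12], 3, 4), stringsAsFactors = FALSE)
z <- c("B", "D")

# On a data frame, %in% (via match()) treats each column as one list element,
# so the result has one entry per column, not one per cell.
col_level <- D %in% z
length(col_level)   # same as ncol(D)

# Converting to a matrix first gives one comparison per cell.
cell_level <- as.matrix(D) %in% z
length(cell_level)  # same as nrow(D) * ncol(D)

# A cell-wise logical matrix can also be built column by column.
hits <- sapply(D, function(col) col %in% z)
```

Here hits has the same shape as D and marks the two cells holding "B" and "D".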

NAfun can be modified to accept both a data.frame and a matrix by converting x to a matrix first (note that the result is then always a matrix):

NAfun <- function (x, z) {
  x <- as.matrix(x)
  x[x %in% z] <- NA
  x
}
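
If the input's class should be preserved, one possible alternative in base R is to branch on is.data.frame() and replace values column by column. This is only a sketch; the name NAfun_keep is made up for illustration and is not from the question.

```r
# NAfun_keep is a hypothetical name used only for this sketch.
NAfun_keep <- function(x, z) {
  if (is.data.frame(x)) {
    # x[] <- ... keeps the data frame structure and replaces each column.
    x[] <- lapply(x, function(col) {
      col[col %in% z] <- NA
      col
    })
  } else {
    # Matrices and plain vectors can be indexed cell-wise directly.
    x[x %in% z] <- NA
  }
  x
}

D <- as.data.frame(matrix(LETTERS[1:12], 3, 4), stringsAsFactors = FALSE)
M <- matrix(1:12, 3, 4)
M[1, 2] <- -77
M[2, 1] <- -99

res_df  <- NAfun_keep(D, c("B", "D"))   # still a data frame
res_mat <- NAfun_keep(M, c(-77, -99))   # still a matrix
```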
MKR
  • Indeed clarifying thanks, doesn't answer my question yet, though. – jay.sf May 07 '18 at 21:56
  • 1
    @jaySf Sorry, I didn't know you were waiting for an answer. I think, considering the limitation described in my answer, perhaps the best way is to modify `NAfun` so that it converts parameter `x` to a matrix. I will update my answer. – MKR May 07 '18 at 22:13
  • Okay thanks for your adaption, this gives me a matrix back, where there was a data.frame, though. – jay.sf May 07 '18 at 22:29

You can probably make this more elegant, but here's a solution using purrr that works in both cases.

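NAfun <- function (x, z) {
  # Replace matching values within a single element (e.g. one column)
  f1 <- function(x, z) {
    x[x %in% z] <- NA
    x
  }
  # Apply f1 to every element of x and return an object of the same type
  purrr::modify(x, ~f1(., z))
}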
TBT8
  • +1 Great, thanks! Combined with mine it now works for both cases. What does `purrr::modify` actually do, translated into base R?  – jay.sf May 07 '18 at 22:10
  • 1
    `purrr` provides functions that are similar to the base `apply` functions. They differ in that the `purrr` family of functions are consistent both in the arguments they take as well as the output type they produce. You can see a detailed explanation on how `apply` and `purrr` functions differ here from Hadley: https://stackoverflow.com/questions/45101045/why-use-purrrmap-instead-of-lapply – TBT8 May 07 '18 at 22:30
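
On the base R question above: the purrr documentation describes modify() as a shortcut for x[[i]] <- f(x[[i]]); return(x). A rough base R sketch of that idea follows. The helper name modify_base is made up here, and it leaves out purrr's formula shorthand and type checks.

```r
# modify_base is a hypothetical helper, not part of purrr or base R.
modify_base <- function(x, f, ...) {
  # Replace each element in place so that x keeps its own class.
  for (i in seq_along(x)) {
    x[[i]] <- f(x[[i]], ...)
  }
  x
}

D <- as.data.frame(matrix(LETTERS[1:12], 3, 4), stringsAsFactors = FALSE)
z <- c("B", "D")

# For a data frame, the elements are its columns.
res <- modify_base(D, function(col) {
  col[col %in% z] <- NA
  col
})
```

For a data frame this is close to the common idiom x[] <- lapply(x, f).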