How to adapt string replacing function to replace specific numbers in data frame with NA?

Question

I wrote a function that perfectly replaces custom values of a matrix with NA.

NAfun <- function (x, z) {
  x[x %in% z] <- NA
  x
}

M <- matrix(1:12, 3, 4)
M[1, 2] <- -77
M[2, 1] <- -99
> M
     [,1] [,2] [,3] [,4]
[1,]    1  -77    7   10
[2,]  -99    5    8   11
[3,]    3    6    9   12

z <- c(-77, -99)

> NAfun(M, z)
     [,1] [,2] [,3] [,4]
[1,]    1   NA    7   10
[2,]   NA    5    8   11
[3,]    3    6    9   12

But this won't work with data frames.

D <- as.data.frame(matrix(LETTERS[1:12], 3, 4))
> D
  V1 V2 V3 V4
1  A  D  G  J
2  B  E  H  K
3  C  F  I  L

z <- c("B", "D")

> NAfun(D, z)
  V1 V2 V3 V4
1  A  D  G  J
2  B  E  H  K
3  C  F  I  L

D[] <- lapply(D, function(x) as.character(x))  # same with character vectors

> NAfun(D, z)
  V1 V2 V3 V4
1  A  D  G  J
2  B  E  H  K
3  C  F  I  L

If I convert the data frame to a matrix it works, though.

> NAfun(as.matrix(D), z)
     V1  V2  V3  V4 
[1,] "A" NA  "G" "J"
[2,] NA  "E" "H" "K"
[3,] "C" "F" "I" "L"

But I can't in my case.

I don't understand why this doesn't work as it is. How can I adapt the function so that it works with a data frame, or preferably with both types? Thanks.

The behaviour is consistent across both cases because, in your first example, `D` is a matrix. — Dan, May 07 '18 at 21:03
The line `D <- sapply(D, as.character)` has changed `D` into a matrix. Try `M %in% z` and you will see that it returns one value per column. That means `%in%` on a data.frame compares columns, not individual values. — MKR, May 07 '18 at 21:07
@Lyngbakr thanks for helping to clarify. I've adapted the question accordingly; the issue now is that the code won't work with data frames. — jay.sf, May 07 '18 at 21:21

MKR · Answer 1 · 2018-05-07T22:15:43.163

1

As @Lyngbakr has correctly mentioned, the behavior is consistent between D and M. The NAfun function worked on D because D had already been converted to a matrix by the line D <- sapply(D, as.character).

Now, the question is why the behavior is inconsistent between a matrix and a data.frame. The actual reason is the %in% operator.

On a matrix (here D, after the sapply line), the %in% operator checks each individual value against the vector z:

D %in% z
#[1] FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

whereas on a data.frame the %in% operator checks for matching columns, returning one value per column. Hence, with M as a data.frame,

M %in% c(-99,-77)
#[1] FALSE FALSE FALSE FALSE

But

M %in% M[1:2]
#[1]  TRUE  TRUE FALSE FALSE

M %in% list(c(1,-99,3))
#[1]  TRUE FALSE FALSE FALSE
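
The difference can be reproduced in isolation. The following is only a minimal sketch on assumed toy data (the names `D_df`, `D_mat`, `in_mat`, `in_df` and `in_cols` are made up for illustration): a matrix is an atomic vector with dimensions, while a data.frame is a list of columns, so `%in%` tests cells in the first case and whole columns in the second.

```r
# Assumed toy data, mirroring the character example from the question.
D_df <- data.frame(V1 = c("A", "B", "C"),
                   V2 = c("D", "E", "F"),
                   stringsAsFactors = FALSE)
D_mat <- as.matrix(D_df)
z <- c("B", "D")

# Matrix: one logical result per cell (3 x 2 = 6 values, column by column).
in_mat <- D_mat %in% z

# Data frame: one logical result per column (2 values), because each
# column as a whole is compared against the elements of z.
in_df <- D_df %in% z

# Applying %in% to each column separately recovers the cell-wise test.
in_cols <- sapply(D_df, function(col) col %in% z)

length(in_mat)  # number of cells
length(in_df)   # number of columns
in_cols         # logical matrix with the same layout as D_df
```

Comparing `length(in_mat)` with `length(in_df)` makes it visible why the logical index built inside `NAfun` does not line up with the cells of a data.frame.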

Modification needed in function NAfun to handle both data.frame and matrix:

NAfun <- function (x, z) {
  x <- as.matrix(x)
  x[x %in% z] <- NA
  x
}
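
Because of the `as.matrix()` call, the function above returns a matrix even when a data.frame is passed in. If the class of the input should be kept, one possible variation (a sketch only; `NAfun_keep` and the example objects are hypothetical names) is to use the matrix view just for building a logical mask and then assign into the original object:

```r
# Sketch: keep the class of x by using as.matrix() only to build a mask.
NAfun_keep <- function(x, z) {
  m <- as.matrix(x)                      # matrix view used for the lookup only
  mask <- array(m %in% z, dim = dim(m))  # logical matrix with the shape of x
  x[mask] <- NA                          # logical-matrix indexing works for both classes
  x
}

# Example data as in the question.
M <- matrix(1:12, 3, 4)
M[1, 2] <- -77
M[2, 1] <- -99
D <- as.data.frame(matrix(LETTERS[1:12], 3, 4), stringsAsFactors = FALSE)

res_M <- NAfun_keep(M, c(-77, -99))   # stays a matrix
res_D <- NAfun_keep(D, c("B", "D"))   # stays a data.frame
```

One caveat under this approach: for a data.frame that mixes numeric and character columns, `as.matrix()` coerces everything to character, so numeric codes may not match as expected. In that situation a column-wise replacement is probably the safer route.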

edited May 07 '18 at 22:15

answered May 07 '18 at 21:17

MKR


Clarifying indeed, thanks, but it doesn't answer my question yet. – jay.sf May 07 '18 at 21:56
@jaySf Sorry, I didn't know you were waiting for an answer. Considering the limitation described in my answer, I think the best way is perhaps to modify `NAfun` so that it converts the parameter `x` to a matrix. I will update my answer. – MKR May 07 '18 at 22:13
Okay, thanks for the adaptation. This gives me a matrix back where there was a data.frame, though. – jay.sf May 07 '18 at 22:29

score 1 · Accepted Answer · answered May 07 '18 at 21:25

You can probably make this more elegant, but here's a solution using purrr that works in both cases.

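NAfun <- function(x, z) {
  # helper: set the values of a single vector/column that occur in z to NA
  f1 <- function(x, z) {
    x[x %in% z] <- NA
    x
  }
  # apply the helper to each element of x
  purrr::modify(x, ~ f1(., z))
}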

answered May 07 '18 at 21:25

TBT8


+1 Great, thanks! Combined with mine it now works for both cases. What does `purrr::modify` actually do, translated into base R? – jay.sf May 07 '18 at 22:10
`purrr` provides functions that are similar to the base `apply` functions. They differ in that the `purrr` family of functions are consistent both in the arguments they take as well as the output type they produce. You can see a detailed explanation on how `apply` and `purrr` functions differ here from Hadley: https://stackoverflow.com/questions/45101045/why-use-purrrmap-instead-of-lapply – TBT8 May 07 '18 at 22:30
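
On the base R question in the comments: for a data.frame, `purrr::modify(x, f)` behaves roughly like `x[] <- lapply(x, f)`, which replaces each column while keeping the data.frame class and column names. A minimal base R sketch along those lines, assuming a plain matrix or data.frame as input (`NAfun_base`, `replace_in` and the example objects are names chosen here for illustration, not part of either answer):

```r
# Sketch: base R variant that handles both matrices and data frames.
NAfun_base <- function(x, z) {
  # replace the values of one atomic vector (or matrix) that occur in z
  replace_in <- function(v) {
    v[v %in% z] <- NA
    v
  }
  if (is.data.frame(x)) {
    # `x[] <-` overwrites the columns but keeps class, names and row names
    x[] <- lapply(x, replace_in)
    x
  } else {
    # matrices and plain vectors can be indexed cell-wise directly
    replace_in(x)
  }
}

# Example data as in the question.
M2 <- matrix(1:12, 3, 4)
M2[1, 2] <- -77
M2[2, 1] <- -99
D2 <- as.data.frame(matrix(LETTERS[1:12], 3, 4), stringsAsFactors = FALSE)

out_M <- NAfun_base(M2, c(-77, -99))  # matrix in, matrix out
out_D <- NAfun_base(D2, c("B", "D"))  # data.frame in, data.frame out
```

Working column by column also means each column keeps its own type, so data frames with mixed numeric and character columns should not need any coercion.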
