I have a dataset that includes some non-referenced data that I would like to replace with NA. In the following example, if the data in columns rep1 to rep4 does not match one of the values in the ID column, I would like to replace the value with NA. In this case, the values of x, y, and z aren't listed in the ID column, so they should be replaced.
This is a somewhat similar question that I asked earlier here : If data present, replace with data from another column based on row ID
I think the solution will be similar to what was given in the previous question, but I don't know how to alter the second portion ~ value[match(., ID)]
to return NA for values that aren't listed in the ID column.
df %>% mutate_at(vars(rep1:rep4), ~ value[match(., ID)])
ID rep1 rep2 rep3 rep4
a
b a
c a b
d a b c
e a b c d
f
g x
h
i
j y z
k z
l
m
The result should look like this:
ID rep1 rep2 rep3 rep4
a
b a
c a b
d a b c
e a b c d
f
g NA
h
i
j NA NA
k NA
l
m
Here is the data using dput()
structure(list(ID = structure(1:13, .Label = c("a", "b", "c",
"d", "e", "f", "g", "h", "i", "j", "k", "l", "m"), class = "factor"),
rep1 = structure(c(1L, 2L, 2L, 2L, 2L, 1L, 3L, 1L, 1L, 4L,
5L, 1L, 1L), .Label = c("", "a", "x", "y", "z"), class = "factor"),
rep2 = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 3L,
1L, 1L, 1L), .Label = c("", "b", "z"), class = "factor"),
rep3 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("", "c"), class = "factor"), rep4 = structure(c(1L,
1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"d"), class = "factor")), class = "data.frame", row.names = c(NA, -13L))