I am trying to achieve something similar to this question, but with multiple values that must be replaced by NA, and in a large dataset.
df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9, var1 = 1:9, var2 = rep(3:5, each = 3))
which generates this dataframe:
df
  name foo var1 var2
1    a   1    1    3
2    a   2    2    3
3    a   3    3    3
4    b   4    4    4
5    b   5    5    4
6    b   6    6    4
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5
I would like to replace all occurrences of, say, 3 and 4 by NA, but only in the columns that start with "var".
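A minimal base-R sketch of picking out only those columns by name prefix. It uses startsWith() (part of base R since 3.3.0) instead of a regular expression, and recreates the example data frame so it runs on its own; the name is_var is just an illustrative choice.

```r
# Example data from the question
df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9,
                 var1 = 1:9, var2 = rep(3:5, each = 3))

# Logical vector: TRUE for every column whose name begins with "var"
is_var <- startsWith(names(df), "var")

# Names of the columns that will be affected
names(df)[is_var]
```

A logical vector like this can be reused for every subsequent subset, so the column selection is done only once.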
I know that I can use a combination of [] operators to achieve the result I want:
# "^var" is enough to match the prefix; "[:alnum:]" outside a bracket
# expression is not a character class (the POSIX form is "[[:alnum:]]")
cols <- grep("^var", colnames(df))
df[, cols][df[, cols] == 3 | df[, cols] == 4] <- NA
df
  name foo var1 var2
1    a   1    1   NA
2    a   2    2   NA
3    a   3   NA   NA
4    b   4   NA   NA
5    b   5    5   NA
6    b   6    6   NA
7    c   7    7    5
8    c   8    8    5
9    c   9    9    5
Now my questions are the following:
- Is there a more efficient way to do this? My actual dataset has about 100,000 rows and 500 variables, 400 of which start with "var", and the chained-bracket technique above feels (subjectively) slow on my computer.
- How would I approach the problem if, instead of 2 values (3 and 4), I had a long list of, say, 100 different values to replace by NA? Is there a way to specify multiple values without having to write a clumsy series of conditions separated by the | operator?
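One possible base-R sketch covering both points, assuming all "var" columns are numeric: the values to blank out live in a single vector (the hypothetical name to_na), so the list can hold 100 entries as easily as 2, and %in% replaces the chain of == ... | conditions. Each selected column is processed on its own with lapply(), rather than building a logical mask over the whole subset. No timings are claimed here; this is a sketch, not a benchmarked solution.

```r
# Example data from the question
df <- data.frame(name = rep(letters[1:3], each = 3), foo = 1:9,
                 var1 = 1:9, var2 = rep(3:5, each = 3))

# Values to replace by NA; extend this vector to any length (e.g. 100 values)
to_na <- c(3, 4)

# Columns whose names start with "var", selected once
var_cols <- startsWith(names(df), "var")

# For each selected column, set the entries found in to_na to NA
df[var_cols] <- lapply(df[var_cols], function(x) {
  x[x %in% to_na] <- NA
  x
})

df
```

%in% is built on match(), so testing membership in a long vector of values needs no hand-written conditions at all. Inside the function, is.na(x) <- x %in% to_na is an equivalent base-R spelling of the same replacement.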