I'm attempting to change values of a variable into NA values if they're not in a vector:
sample <- factor(c('01', '014', '1', '14', '24'))
df <- data.frame(var1 = 1:6, var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))
df$var2 <- ifelse(df$var2 %in% sample, df$var2, NA)
For some reason R does not preserve original values of the factor variable but turns them into numeric sequence:
> sample <- factor(c('01', '014', '1', '14', '24'))
> df <- data.frame(var1 = 1:6,
var2 = factor(c('01', '24', 'none', '1', 'unknown', '24')))
> class(df$var2)
[1] "factor"
> df
var1 var2
1 1 01
2 2 24
3 3 none
4 4 1
5 5 unknown
6 6 24
> df$var2 <- ifelse(df$var2 %in% sample, df$var2, NA)
> class(df$var2)
[1] "integer"
> df
var1 var2
1 1 1
2 2 3
3 3 NA
4 4 2
5 5 NA
6 6 3
Why does this happen and what would be the correct way of achieving what I'm trying to here?
(I need to use factors rather than integers in order not to confuse "01" and "1" and my original data set is large, so using factors rather than characters should save me some memory)