I need to convert a messy factor into a numeric vector. The sample data looks like this:
x <- structure(c(4L, 5L, 1L, 6L, 6L, 2L, 3L),
.Label = c("", "106", "39", "8", "80", "chyb\x92 foto"), class = "factor")
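The structure() call above stores x as integer codes (4, 5, 1, ...) that point into the .Label vector, which becomes the factor's levels. The sketch below rebuilds the same sample data and inspects it. The helper names codes and labs are illustrative only.

```r
# Rebuild the sample factor from the question
x <- structure(c(4L, 5L, 1L, 6L, 6L, 2L, 3L),
               .Label = c("", "106", "39", "8", "80", "chyb\x92 foto"),
               class = "factor")

# A factor is stored as integer codes plus a character vector of levels
codes <- as.integer(x)   # 4 5 1 6 6 2 3
labs  <- levels(x)       # 6 levels; the 6th holds the raw byte written as \x92

# Each element's label is the level its code points to
nlevels(x)               # 6
labs[codes[c(1, 2, 6, 7)]]
```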
My desired output would be:
x
[1] 8 80 NA NA NA 106 39
class(x)
[1] "numeric"
However, the first line of my intended code results in warnings, and the text is not replaced with NAs:
x[grepl("[a-z]", x) | x==""] <- NA
x <- as.numeric(levels(x))[x]
Warning messages:
1: In grepl("[a-z]", x) : input string 4 is invalid in this locale
2: In grepl("[a-z]", x) : input string 5 is invalid in this locale
The second line then runs and gives the correct output, with NAs introduced by coercion. Why does grepl fail to recognise letters in some factor levels, and how can as.numeric pick them out and replace them with NAs?
I took the factor-to-numeric conversion from this question. However, the fact that it works does not explain why it works.
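The idiom as.numeric(levels(x))[x] can be read as two separate steps. First, every level is coerced once with as.numeric(). Any level that does not parse as a number, including the empty string, becomes NA. Second, that numeric vector is indexed by the factor itself, which R treats as indexing by the integer codes rather than by the labels. The sketch below illustrates this on a hypothetical ASCII-only factor f, in which the level "text" is an assumed stand-in for the problematic label so that the example behaves the same in any locale.

```r
# Hypothetical ASCII-only stand-in for the sample data; "text" plays the
# role of the non-numeric label. Levels are given explicitly so the codes
# match the question's factor (4, 5, 1, 6, 6, 2, 3).
f <- factor(c("8", "80", "", "text", "text", "106", "39"),
            levels = c("", "106", "39", "8", "80", "text"))

# Step 1: coerce each level once; "" and "text" cannot be parsed as numbers
# and become NA (a coercion warning is expected for "text")
num_levels <- as.numeric(levels(f))   # NA 106 39 8 80 NA

# Step 2: indexing by a factor uses its integer codes, not its labels
result <- num_levels[f]

print(result)         # 8 80 NA NA NA 106 39
print(class(result))  # "numeric"
```

Because each level is coerced only once, a non-numeric label becomes NA at every position whose code points to it.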
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)
locale:
[1] cs_CZ.UTF-8/cs_CZ.UTF-8/cs_CZ.UTF-8/C/cs_CZ.UTF-8/cs_CZ.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_3.3.0