Can someone please explain exactly how as.numeric(levels(x))[x] works? Here x is a factor variable (for example, x <- as.factor(sample(1:5, 20, replace = TRUE))). As far as I understand, we first get the levels of x (which are character) and then convert them to numeric. What happens after that is the part I can't follow. I know this expression is equivalent to as.numeric(as.character(x)).
-
Have you read the first answer [here](https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information)? – De Novo Nov 13 '18 at 18:57
-
...then it's just using `x` values as positions to get the corresponding levels, in a numeric form. You can use `as.numeric(levels(x))[c(1,1,2)]` as an example, which means give me the 1st, 1st (again) and 2nd level. If you try to ask for something that doesn't exist it will return `NA` like this `as.numeric(levels(x))[c(1,1,2,6)]` – AntoniosK Nov 13 '18 at 18:58
-
@DeNovo Yes, I saw that post, but I think it was about how to perform the conversion, not about how exactly it works. – nand Nov 13 '18 at 19:38
-
@AntoniosK got it. Thank you. – nand Nov 13 '18 at 20:08
2 Answers
R factors are vectors of integers that serve as indices into the character vector of levels. So the inner part of that expression, levels(x), returns a character vector. The outer part, as.numeric(), converts that set of values ("1", "2", "3", etc.) into numeric values, and the trailing [x] then uses the factor's integer codes to pick the matching number for each element.
> x<-as.factor(sample(1:5,20,replace=TRUE))
The storage class of factor objects is integer:
> dput (x)
structure(c(4L, 2L, 3L, 4L, 5L, 2L, 2L, 2L, 1L, 2L, 4L, 2L, 1L,
5L, 5L, 4L, 1L, 5L, 1L, 5L), .Label = c("1", "2", "3", "4", "5"
), class = "factor")
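To see the integer storage without reading dput() output, here is a minimal sketch. The small factor `f` is made up for illustration and is not the `x` sampled above.

```r
# Hypothetical example factor, not the x from this answer
f <- factor(c(10, 30, 20, 30, 10))

typeof(f)           # "integer": the underlying storage type
as.integer(f)       # 1 3 2 3 1: the integer codes
attr(f, "levels")   # "10" "20" "30": the labels the codes point to
```

Each code is simply the position of that element's label in the levels vector.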
The levels() function returns the levels attribute of a factor (shown as .Label in the dput output above), and when a factor is used as an index, it gets handled as an integer:
> levels(x)[x]
[1] "4" "2" "3" "4" "5" "2" "2" "2" "1" "2" "4" "2" "1" "5" "5" "4" "1" "5" "1" "5"
This method of conversion or extraction is slightly faster than going through as.character(x), but as you have experienced, it may seem a bit cryptic if you haven't worked through what is happening "under the hood" (or "bonnet" if that's what it's called in your part of the English-speaking world).
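Breaking the expression into its steps may make this less cryptic. The following is only a sketch on a small made-up factor `g` (the intermediate names `lev`, `num_lev` and `codes` are mine, not part of any API):

```r
# Hypothetical factor whose labels look like numbers
g <- factor(c("10", "30", "20", "30", "10"))

lev     <- levels(g)        # "10" "20" "30"  (character)
num_lev <- as.numeric(lev)  # 10 20 30        (one conversion per level)
codes   <- as.integer(g)    # 1 3 2 3 1       (position of each value in lev)

step_by_step <- num_lev[codes]               # 10 30 20 30 10
one_liner    <- as.numeric(levels(g))[g]     # same result
via_char     <- as.numeric(as.character(g))  # same result

# The common mistake: this returns the codes, not the original values
codes_only <- as.numeric(g)                  # 1 3 2 3 1
```

The conversion to numeric happens once per level rather than once per element, which is where the small speed advantage comes from when a factor has few levels and many elements.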

I always get confused by R's factors. Usually I use a nice idea from the Rfast package, the function Rfast::ufactor. It represents a factor using its original type.
Here is an example:
x <- rnorm(10)
fx<- Rfast::ufactor(x)
fx$levels # you can get the levels like this
fx$values # you can get the values like this
Fast and simple. Rfast::ufactor is much faster than R's factor, but I will not post any benchmark because it doesn't fit the question.
