
I'm trying to understand why this R code does a certain transformation.

Df[,"cutoff"] = as.numeric(levels(Df[,"cutoff"]))[Df[,"cutoff"]]

Before this line runs, Df[,"cutoff"] is a factor with 49 levels; afterwards, it's a numeric vector. I just don't understand this syntax at all. Can someone explain what as.numeric(levels(Df[,"cutoff"])) does to a factor?

Thanks!

Kashif
  • Could you please include a [reproducible example](http://stackoverflow.com/a/28481250/215487)? – Christopher Bottoms May 28 '15 at 17:15
  • That code is confusing, but if `cutoff` is "really" a numeric variable, then I think it's the same as `as.numeric(as.character(Df[,"cutoff"]))`, which is how you would convert a factor variable to numeric. Also, `Df[,"cutoff"]` is a vector either way, but before the operation its class is factor and after its class is numeric. – eipi10 May 28 '15 at 17:23
  • @Glassjawed, could you update your question with what you were trying to accomplish? Maybe the answer to that question would be more helpful. – Brandon Bertelsen May 28 '15 at 17:30
  • To add to @eipi10's response: using the `.Primitive('[')` subset function seems to have the same effect as `as.numeric(Df[,"cutoff"])`, which interestingly requires an understanding of how R deals with `as.numeric(factor)` (there are many explanations on SO). – Vlo May 28 '15 at 17:33
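The comments' point can be checked directly: a factor stores integer codes that point into its levels, `as.numeric()` on a factor returns those codes (not the printed values), and `[` indexes by the same codes. A small sketch, using a made-up factor `f` and helper vector `lv` that are not from the question:

```r
# A made-up factor whose labels look like numbers
f <- factor(c("10", "20", "10", "30"))

levels(f)        # the distinct labels, as a character vector: "10" "20" "30"
as.integer(f)    # the stored codes: 1 2 1 3
as.numeric(f)    # also the codes, NOT the values 10 20 10 30

# Converting the labels to text first gives the intended numbers
as.numeric(as.character(f))   # 10 20 10 30

# Indexing by the factor behaves like indexing by its codes
lv <- as.numeric(levels(f))
identical(lv[f], lv[as.integer(f)])   # TRUE
```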

2 Answers


If for any reason your numbers end up stored as a factor, some R functions will not treat them as numbers, even though they print as numbers. For example, summary() counts the cases at each level instead of giving the usual six-number summary.

See:

Df = data.frame(cutoff = factor(rep(2:6, 2)), y = runif(10, 12, 15))
str(Df)
summary(Df[,"cutoff"])
# 2 3 4 5 6 
# 2 2 2 2 2
# If you want the levels as numbers:
Df[,"cutoff"] = as.numeric(levels(Df[,"cutoff"]))[Df[,"cutoff"]]
summary(Df[,"cutoff"])
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#       2       3       4       4       5       6
Robert
  • You could get the same result with `as.numeric(as.character(Df[,"cutoff"]))`, but it is less efficient (http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f) – Robert May 28 '15 at 17:29
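A sketch comparing the two conversions on a hypothetical long factor with 49 numeric levels (the same count as in the question; the data itself is made up). Both give identical numbers. The levels-first form parses only the 49 level strings and then looks values up by code, while `as.character()` first expands the factor into one string per element and parses each, which is the efficiency difference the R FAQ refers to. Timings are left as commented-out calls because they depend on the machine:

```r
set.seed(1)
# Hypothetical data: 100,000 entries drawn from 49 numeric labels
f <- factor(sample(as.character(1:49), 1e5, replace = TRUE))

nlevels(f)   # 49 strings to parse in the levels-first form
length(f)    # 100000 strings to parse in the as.character() form

a <- as.numeric(levels(f))[f]      # parse the levels, then look up by code
b <- as.numeric(as.character(f))   # expand to one string per entry, then parse
identical(a, b)                    # TRUE

# Compare timings on your own machine if desired:
# system.time(as.numeric(levels(f))[f])
# system.time(as.numeric(as.character(f)))
```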

The result is a vector of NA if the factor's levels are not numbers written as text.

df <- data.frame(cutoff = letters[1:26], stringsAsFactors = TRUE)  # needed for a factor in R >= 4.0
as.numeric(levels(df[,"cutoff"]))[df[,"cutoff"]]
#  [1] NA NA NA NA NA NA NA NA NA NA NA NA ...
# Warning message:
# NAs introduced by coercion 

Let's break it down. This returns the levels of the factor as a character vector:

levels(df[,"cutoff"])
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" ...

This tries to convert each level to a number. None of them can be converted, so every element becomes NA:

as.numeric(levels(df[,"cutoff"]))
# [1] NA NA NA NA NA NA NA NA NA NA NA NA NA ...
# Warning message:
# NAs introduced by coercion

Now, adding the last part, [df[,"cutoff"]], subsets that result by the factor df[,"cutoff"]. R indexes by a factor's integer codes, so each element of the column is replaced by the converted value of its own level, and the result keeps the original length and order. Here every converted level is NA, so every element of the result is NA as well.

as.numeric(levels(df[,"cutoff"]))[df[,"cutoff"]]
# [1] NA NA NA NA NA NA NA NA NA NA NA NA NA ...
# Warning message:
# NAs introduced by coercion 
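Because the subsetting goes through the factor's codes, it does not scramble the order: each element picks up the parsed value of its own level. A sketch with a hypothetical factor `f` whose numeric labels appear out of sorted order in the data:

```r
# Labels appear out of order in the data; levels() sorts them
f <- factor(c("30", "10", "20", "10"))
levels(f)                      # "10" "20" "30"
as.integer(f)                  # 3 1 2 1

x <- as.numeric(levels(f))[f]  # look up the parsed level for each element
x                              # 30 10 20 10, same order as the original data
```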
Brandon Bertelsen
  • Brandon, couldn't it be a numeric variable that somehow got converted to a factor? The OP's code would then convert it back to numeric, but `as.numeric(as.character(...))` is more straightforward. – eipi10 May 28 '15 at 17:38
  • Possible, but typically if it was "converted to factor somehow" that means there's a character in it somewhere, which means it would end up the same way. – Brandon Bertelsen May 28 '15 at 17:53
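A sketch of the mixed case discussed in these comments, assuming a hypothetical column where one entry was recorded as the text "n/a", which is enough to keep the column from being read as numeric. With the idiom from the question, only the entry whose level fails to parse becomes NA:

```r
# Hypothetical column: numbers plus one stray text entry
f <- factor(c("2", "3", "n/a", "5"))
levels(f)    # "2" "3" "5" "n/a"

# Only the non-numeric level fails to parse (as.numeric() would warn about it)
x <- suppressWarnings(as.numeric(levels(f))[f])
x            # 2 3 NA 5

which(is.na(x))   # 3: the entry that needs attention
```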