
Hi guys :) I know this question has been asked here before, but I would like to ask whether 0 plays any special role when using the as.numeric function. For example, take the following simple code:

x2<-factor(c(2,2,0,2), label=c('Male','Female'))
as.numeric(x2) # I know this is not the appropriate command; as.numeric(levels(x2))[x2] would be more appropriate, but it returns NAs

this returns

[1] 2 2 1 2

Is 0 being replaced by 1 here? Moreover,

unclass(x2) 

seems to give the same thing as well:

[1] 2 2 1 2
attr(,"levels")
[1] "Male"   "Female"

It might be simple, but I have been trying to figure this out and I can't. Any help would be highly appreciated, as I am new to R.

mata
  • Factors in `R` are coded starting at 1, not 0. When you create the `factor` the value 0 is lost, unless it's a level, but in your case the levels are "Male" and "Female". – Rui Barradas Sep 22 '17 at 07:57
  • So how do you actually get back `c(2,2,0,2)` from there ? – moodymudskipper Sep 22 '17 at 08:02
  • @Moody_Mudskipper `factor` is not an invertible transformation. Consider: `identical(factor(c(2,2,0,2), label=c('Male','Female')), factor(c(2,2,1,2), label=c('Male','Female')))` – orizon Sep 22 '17 at 08:10
  • I think this is the answer OP was looking for @orizon – moodymudskipper Sep 22 '17 at 08:16

1 Answer

0 has no special meaning for factor.

As commenters have pointed out, factor recodes the input vector to an integer vector (starting with 1) and slaps a name tag onto each integer (the levels).

In the simplest case, factor(c(2,2,0,2)), the function takes the unique values of the input vector, sorts them, and converts them to a character vector; these become the levels. I.e. the levels are '0' and '2', and the factor is internally represented as c(2,2,1,2), where 1 corresponds to '0' and 2 to '2'.
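The default behaviour can be checked directly. This is a minimal sketch in base R; the variable names `x` and `f` are only illustrative:

```r
x <- c(2, 2, 0, 2)
f <- factor(x)             # no levels or labels supplied

levels(f)                  # sorted unique values as strings: "0" "2"
as.integer(f)              # internal codes: 2 2 1 2

# The levels still hold the original values as strings,
# so the numbers can be recovered from them.
as.numeric(levels(f))[f]   # 2 2 0 2
```

So the 0 is not replaced by 1 as such; it just happens to be the smallest value and therefore gets the first code.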

You then go a step further by giving the levels labels; by default these are identical to the levels. In your case, factor(c(2,2,0,2), labels=c('Male','Female')), the levels are still derived from the sorted, unique values (0 and 2), so the internal codes are still c(2,2,1,2), but the first level (0) is now labelled Male and the second level (2) is labelled Female.
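A short sketch of the labelled case, using the `x2` from the question (with `labels` spelled out in full). It also suggests why `as.numeric(levels(x2))[x2]` gives NAs here: the levels are now the label strings, which cannot be parsed as numbers. The name `recovered` is only illustrative:

```r
x2 <- factor(c(2, 2, 0, 2), labels = c("Male", "Female"))

levels(x2)         # "Male" "Female"
as.integer(x2)     # internal codes are unchanged: 2 2 1 2
as.character(x2)   # "Female" "Female" "Male" "Female"

# The strings "0" and "2" are no longer stored anywhere in x2,
# so converting the levels to numbers yields NA (with a warning).
recovered <- suppressWarnings(as.numeric(levels(x2))[x2])
recovered          # NA NA NA NA
```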

We can also decide ourselves which levels should be used, as in factor(c(2,2,0,2), levels=c(2,0), labels=c('Male','Female')). Now we have been explicit about which input value gets which level and label: 2 becomes the first level (Male) and 0 the second (Female).
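A sketch of the explicit version. The lookup vector `original_values` is a hypothetical helper, not something `factor` stores for you; it is one way to get back to the numbers if you keep the level values around yourself:

```r
x3 <- factor(c(2, 2, 0, 2), levels = c(2, 0), labels = c("Male", "Female"))

as.integer(x3)     # codes follow the order given in levels: 1 1 2 1
as.character(x3)   # "Male" "Male" "Female" "Male"

# Hypothetical helper: the same values that were passed as levels,
# indexed by the integer codes, reproduce the original input.
original_values <- c(2, 0)
original_values[as.integer(x3)]   # 2 2 0 2
```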

MrGumble