I come from an object-oriented programming background and find it difficult to wrap my head around R's approach. Here is the excerpt I stumbled upon:

> kids = factor(c(1,0,1,0,0,0), levels = c(0, 1),labels = c("boy","girl"))
> as.numeric(kids)
[1] 2 1 2 1 1 1

I was thinking it should print

[1] 1 0 1 0 0 0

since {0,1} are the levels specified in factor(). But that's not the case. So what are the values 2 1 2 1 1 1? Are they a numeric representation of the factor's elements that R maintains internally? Or, to put it better:

What does as.numeric() return when called on a factor (i.e. as.numeric(factorXyz))?

If they are not the levels but some internal numeric values, then what's the point of having levels associated with the factor's elements?

Mahesha999

1 Answer

Consider the case of

kids = factor(c("g", "b", "g", "b", "b", "b"), 
              levels = c("b", "g"), 
              labels = c("boy", "girl"))

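In this case, it makes more sense for R to store an integer reference to each of the factor's levels rather than the strings themselves. factor() is largely indifferent to the kind of input you give it: whatever the input values are, it stores each element as an integer code from 1 to the number of levels, and that code is what as.numeric() returns. In your example, 0 is the first level ("boy") and 1 is the second level ("girl"), so the codes are 2 1 2 1 1 1.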
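To make the split between integer codes and labels concrete, here is a minimal sketch that rebuilds the factor from the question and inspects both parts. It uses only base R; nothing here is specific to how the original poster's data was produced.

```r
# Same factor as in the question: 0 is labelled "boy", 1 is labelled "girl"
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1), labels = c("boy", "girl"))

# The labels, in level order
lev <- levels(kids)            # "boy" "girl"

# The internal codes: positions into levels(kids), starting at 1
codes <- as.integer(kids)      # 2 1 2 1 1 1

# The labels of each element
labs <- as.character(kids)     # "girl" "boy" "girl" "boy" "boy" "boy"

# Indexing the levels by the codes gives the same labels back
identical(lev[codes], labs)    # TRUE
```

So the levels are not pointless: they are the lookup table that turns the compact integer codes back into readable values.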

If my understanding is correct, this design originally came from concerns about the memory cost of storing many repeated character strings in data. See "stringsAsFactors: An unauthorized biography" for the details behind the original design decisions.

Benjamin
  • so the initial values `1,0,1,0,0,0` are lost and are not retrievable from `kids`? – Mahesha999 Apr 25 '16 at 10:26
  • Initial values can be retrieved with `as.numeric(as.character(kids))` – DeveauP Apr 25 '16 at 10:39
  • @DeveauP, that will actually return `NA`s, because it tries to coerce the labels to numerics and in this case the labels are characters. @Mahesha999, yes, when you convert to a factor, you lose the initial 0:1 encoding. R deals with factors in terms of the integer reference level (1:n) and the label. If you need to retain knowledge of the original encoding, you'll need to either make a second vector (i.e., `kids_num` and `kids_factor`) or make a reference object (i.e., `data.frame(level = 0:1, label = c("boy", "girl"))`). – Benjamin Apr 25 '16 at 11:04
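
A rough sketch of the reference-object idea from the last comment, assuming you still know which original value goes with which label. The names `key` and `original` are made up for illustration, and `stringsAsFactors = FALSE` is set explicitly so the lookup column stays a character vector on older R versions too.

```r
kids <- factor(c(1, 0, 1, 0, 0, 0), levels = c(0, 1), labels = c("boy", "girl"))

# Coercing the labels to numbers fails, because "boy" and "girl" are not numeric
all_na <- all(is.na(suppressWarnings(as.numeric(as.character(kids)))))  # TRUE

# Hypothetical reference table mapping the original encoding to the labels
key <- data.frame(level = 0:1, label = c("boy", "girl"), stringsAsFactors = FALSE)

# match() finds the row of each element's label; indexing key$level
# by those rows recovers the original encoding
original <- key$level[match(as.character(kids), key$label)]  # 1 0 1 0 0 0

# Shortcut that works only because the original values happened to be
# 0:1 in level order: subtract 1 from the internal codes
shortcut <- as.integer(kids) - 1L                            # 1 0 1 0 0 0
```

The lookup table is the more general option; the subtraction shortcut breaks as soon as the original values are not consecutive integers in level order.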