Question

I am interested in converting numerical binary information into gender. During this, I came upon some behavior in R that I do not understand.

factor(c(0,1,0,1),labels = c("male","female"))

This works as intended. You get the following output:

[1] male   female male   female
Levels: male female

However, when I decide to be explicit and type the following:

factor(c(0,1,0,1),levels = c("male","female"), labels = c("male","female"))

It converts the numerical data to NA. This is disturbing to me because I am both specifying the levels and the labels. In my mind, the code I have written is equivalent, but is being interpreted by base R differently.

[1] <NA> <NA> <NA> <NA>
Levels: male female

My question is simple: why?

Caveats

I read the documentation for the factor function in R. I have also googled this question and searched Stack Overflow. As far as I know, this is so incredibly simple that it has either not been asked, or I searched in such a way that I could not find a duplicate. Thanks for your understanding.

hlyates
  • `levels = 0:1`. You are saying that you want levels to be `"male"` and `"female"` but your vector only has zeros and ones, not those values. – Rui Barradas Oct 10 '18 at 15:40
  • From `?factor`: "If no match is found for `x[i]` in `levels` [...] then the `i`-th element of the result is set to `NA`" – Henrik Oct 10 '18 at 15:42
  • `levels` are the values that appear in the vector you actually supply. `labels` are what you want to relabel them as. Your vector contained only 0 and 1, not male and female, so the result is NAs. – joran Oct 10 '18 at 15:43
  • In the link, check the first example in [@Ben Bolker's answer](https://stackoverflow.com/a/7128515/1851712) - same case as yours. – Henrik Oct 10 '18 at 15:49
  • @Henrik I did my best not to duplicate. Do you want me to delete this or leave it for others? I do not want to be impolite, and I want to respect SO culture. In addition, I do feel my question and your answer to me here make more sense than the other link. However, I defer to the community on this. – hlyates Oct 10 '18 at 16:12
  • @hlyates No, I think you can leave your question here as a signpost to Ben's answer. Your question was clear and it was nice that you provided a small example! Factors _are_ tricky! See also [Confusion between factor levels and factor labels](https://stackoverflow.com/questions/5869539/confusion-between-factor-levels-and-factor-labels) (although your specific case is not treated there). Cheers – Henrik Oct 10 '18 at 16:22

1 Answer

You must set the levels argument to the values that actually occur in your vector. Those values are zeros and ones, not "male" and "female". The labels are then applied to those levels in order.

factor(c(0, 1, 0, 1), levels = 0:1, labels = c("male", "female"))
#[1] male   female male   female
#Levels: male female
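
To see why the explicit call in the question yields NA, here is a rough sketch of the lookup that `?factor` describes. The variable names (`x`, `default_levels`, `f_ok`, `f_bad`) are made up for illustration, and `match()` is used here only as a stand-in for the matching step, not as a copy of the internals of `factor()`.

```r
x <- c(0, 1, 0, 1)

# With no `levels` argument, factor() falls back to the sorted unique values of x.
default_levels <- sort(unique(x))   # 0 1

# Each element of x is looked up among the levels; labels are applied afterwards.
match(x, default_levels)            # 1 2 1 2
match(x, c("male", "female"))       # NA NA NA NA, since "0" and "1" are not among them

# Levels that occur in the data: every element finds a match and gets relabelled.
f_ok <- factor(x, levels = default_levels, labels = c("male", "female"))

# Levels that never occur in the data: every element becomes NA.
f_bad <- factor(x, levels = c("male", "female"), labels = c("male", "female"))
```

Under these assumptions, `f_ok` prints the same result as the call without `levels`, while `f_bad` reproduces the four `<NA>` values from the question.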
Rui Barradas