
I have a data set (call it DATA) with a variable, COLOR. The mode of COLOR is numeric and its class is factor. First, I'm a bit confused by the "numeric": when printed, the values of COLOR are not numbers but character values such as White, Blue, or Black. Any clarification on this is appreciated.

Further, I need to write R code that returns the levels of the COLOR variable, determines the current reference level of this variable, and finally sets the reference level to White. I tried using factor(), but was entirely unsuccessful.

Thank you for taking the time to help.

Mike L

2 Answers


mode(DATA$COLOR) is "numeric" because R internally stores a factor as a vector of integer codes (to save space), plus an associated vector of labels (the levels) corresponding to the code values. When you print the factor, R automatically substitutes the corresponding label for each code.

f <- factor(c("orange","banana","apple"))
f    ## printing shows the labels, not the codes
## [1] orange banana apple 
## Levels: apple banana orange
str(f)
##  Factor w/ 3 levels "apple","banana",..: 3 2 1
as.integer(f)    ## the underlying integer codes (c(f) returns a factor in R >= 4.1.0)
## [1] 3 2 1
attributes(f)
## $levels
## [1] "apple"  "banana" "orange"
## $class
## [1] "factor"
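
To connect this back to the question, here is a small sketch showing what mode(), class(), and typeof() report for a factor, and how the printed labels can be rebuilt from the codes. The vector f and its color values are made up for illustration; they are not the asker's data.

```r
# Toy factor with made-up color values (an assumption, not the real DATA$COLOR)
f <- factor(c("White", "Blue", "Black", "Blue"))

mode(f)     # "numeric": the storage is numeric
class(f)    # "factor":  the class attribute controls printing
typeof(f)   # "integer": the codes are stored as integers

# Rebuild the printed labels by indexing the levels with the codes
codes  <- as.integer(f)
labels <- levels(f)
rebuilt <- labels[codes]
rebuilt     # "White" "Blue" "Black" "Blue"
```

Indexing levels(f) by the integer codes is essentially what as.character(f) does.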

... I need to Write R code to return the levels of the COLOR variable ...

levels(DATA$COLOR)

... then determine the current reference level of this variable,

levels(DATA$COLOR)[1]

... and finally set the reference level of this variable to White.

DATA$COLOR <- relevel(DATA$COLOR,"White")
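
Putting the three steps together, here is a minimal sketch on a made-up data frame (the values in DATA below are an assumption for illustration only). It also shows one way to check that the change took effect: the first level should now be "White", while the data values themselves are unchanged.

```r
# Hypothetical stand-in for the asker's data set
DATA <- data.frame(COLOR = factor(c("Blue", "White", "Black", "White")))

levels(DATA$COLOR)          # "Black" "Blue" "White" (alphabetical by default)
old_ref <- levels(DATA$COLOR)[1]
old_ref                     # "Black": the current reference level

values_before <- as.character(DATA$COLOR)

# relevel() returns a new factor; the result must be assigned back
DATA$COLOR <- relevel(DATA$COLOR, ref = "White")

new_ref <- levels(DATA$COLOR)[1]
new_ref                     # "White": the new reference level

# Only the level order changed, not the values themselves
identical(values_before, as.character(DATA$COLOR))   # TRUE
```

Calling relevel() without assigning the result leaves DATA$COLOR untouched, which is the most likely reason a later levels(DATA$COLOR)[1] would still show the old reference level.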
Ben Bolker
  • This seemed to work, thank you very much. One more request: how do I check to see that the changes were made? If I use levels(DATA$COLOR)[1] again, it'll just print out the original, not the newly re-leveled reference, right? – Mike L Apr 25 '13 at 16:00
  • Are you sure? `levels(relevel(factor(letters), 'z'))[1]` != `levels(factor(letters))[1]` – Rcoster Apr 25 '13 at 16:44
  • Agree with @Rcoster . Perhaps you forgot to assign the result of `relevel()` back to the `DATA$COLOR` variable ... ? – Ben Bolker Apr 25 '13 at 19:11
  • You're right -- I just forgot to nest it. Thank you very much! – Mike L Apr 25 '13 at 21:28

This is a consequence of how R stores factors. The values you see in the console look like characters but are stored internally as numbers (for reasons which are probably beyond the scope here).

If you want to recover the levels, you can type levels(DATA$COLOR). Take a look at ?factor and ?levels in the console to learn more. If you want to re-level a factor, please add a reproducible example so I can walk through the code.
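
Since the question mentions trying factor() directly, here is a hedged sketch of how re-leveling could be done that way, using a made-up vector (color and its values are assumptions, not the asker's data). The idea is to pass the desired level order explicitly, with "White" first.

```r
# Made-up example factor
color <- factor(c("Blue", "White", "Black", "White"))

# Put "White" first and keep the remaining levels in their current order
new_levels <- c("White", setdiff(levels(color), "White"))
color2 <- factor(color, levels = new_levels)

levels(color2)      # "White" "Black" "Blue"
levels(color2)[1]   # "White" is now the reference level
```

For moving a single level to the front, relevel() is shorter; spelling out the levels in factor() is mainly useful when the whole order needs to change.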

Adam Hyland
  • Ah, I see what you mean about the internal storage -- thanks for clarifying that, it was a bit confusing for an R newbie like me. – Mike L Apr 25 '13 at 15:59