
I'm trying to understand how various objects in R are composed of atomic and generic vectors.

One can construct a data.frame out of a list by manually setting the attributes `names`, `row.names`, and `class` (see here).

I wonder how this might work for factors, which are internally represented as integer vectors. The solution I came up with is the following:

> f <- 1:3
> class(f) <- "factor"
> levels(f) <- c("low", "medium", "high")
Warning message:
In str.default(val) : 'object' does not have valid levels()

But for some reason this still looks different from a properly constructed factor:

> str(unclass(f))
 int [1:3] 1 2 3
 - attr(*, "levels")= chr [1:3] "low" "medium" "high"
> str(unclass(factor(c("low", "medium", "high"))))
 int [1:3] 2 3 1
 - attr(*, "levels")= chr [1:3] "high" "low" "medium"

Am I missing something? (I know this probably should not be used in production code; it is for educational purposes only.)

Quasimodo

1 Answer


The order matters: set the levels first, then the class.

f <- 1:3
levels(f) <- c("low", "medium", "high")  ## mark
class(f) <- "factor"
f
# [1] low    medium high  
# Levels: low medium high
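As a sketch of the same idea in one step, base R's `structure()` can attach both attributes in a single call (the name `f2` is only for illustration). Listing `levels` before `class` mirrors the order used above:

```r
# Integer codes 1:3 plus a levels attribute and the class "factor",
# attached in one call; levels is listed before class.
f2 <- structure(1:3, levels = c("low", "medium", "high"), class = "factor")
f2
is.factor(f2)
```

Because both attributes are set together, there is never a user-visible object that has the class but no levels.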

`levels<-` adds a `levels` attribute to the vector. Instead of the line marked `## mark` you could also write

attr(f, 'levels') <- c("low", "medium", "high")
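On a plain integer vector such as `f`, `levels<-` has no class to dispatch on and simply sets the attribute, so the two lines are interchangeable here. Once the object is already a factor they differ, as this sketch suggests (the names `x`, `y`, and `new_lv` are illustrative): `levels<-` dispatches to base R's `levels<-.factor` method, which re-codes the data, while `attr<-` only overwrites the attribute.

```r
x <- factor(c("low", "medium", "high"), levels = c("low", "medium", "high"))
y <- x
new_lv <- c("low", "low", "high")

# levels<- on a factor re-codes the data: duplicate labels are merged
# into a single level and the integer codes are updated accordingly.
levels(x) <- new_lv
levels(x)        # "low" "high"
as.integer(x)    # 1 1 2

# attr<- just replaces the attribute; the codes stay 1 2 3 and the
# result has duplicated levels, i.e. an invalid factor.
attr(y, "levels") <- new_lv
anyDuplicated(levels(y)) > 0
```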

Here is what happens, step by step:

f <- 1:3
attributes(f)
# NULL

levels(f) <- c("low", "medium", "high")
attributes(f)
# $levels
# [1] "low"    "medium" "high"  

class(f) <- "factor"
attributes(f)
# $levels
# [1] "low"    "medium" "high"  
# 
# $class
# [1] "factor"
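For contrast, here is a sketch of the order used in the question (class first, then levels; `fq` is an illustrative name). It passes through an object that already has class "factor" but no levels attribute. Calling `str()` on such an object emits the warning 'object' does not have valid levels(), which is presumably where the warning in the question came from. The final result is still a valid factor, because once `fq` has the class, `levels<-` dispatches to base R's `levels<-.factor` method:

```r
fq <- 1:3
class(fq) <- "factor"     # accepted, because 1:3 has integer storage
attributes(fq)            # only $class so far, no $levels
is.null(levels(fq))       # TRUE: a "factor" without levels
# Calling str(fq) at this point warns: 'object' does not have valid levels()

levels(fq) <- c("low", "medium", "high")  # handled by levels<-.factor
fq
```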

Compare with a factor generated "automatically" by `factor()`:

attributes(factor(1:3, labels=c("low", "medium", "high")))
# $levels
# [1] "low"    "medium" "high"  
# 
# $class
# [1] "factor"

And, importantly, the underlying integer vectors and their levels agree:

stopifnot(all.equal(unclass(f), 
                    unclass(factor(1:3, labels=c("low", "medium", "high")))))
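The difference noticed in the question appears to come from the comparison itself rather than from the hand-built factor. By default `factor()` builds its levels by sorting the unique values, so `factor(c("low", "medium", "high"))` gets the levels "high", "low", "medium" and the codes 2 3 1. A small sketch (the names `lv` and `fx` are illustrative) showing that passing `levels` explicitly keeps the intended order:

```r
lv <- c("low", "medium", "high")

# Default: levels are the sorted unique values, i.e. alphabetical here.
levels(factor(lv))            # "high" "low" "medium"
as.integer(factor(lv))        # 2 3 1, as in the question's str() output

# Explicit levels keep the given order, so the codes are 1 2 3.
fx <- factor(lv, levels = lv)
as.integer(fx)
```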

Note 1: the order of the values in f doesn't matter. Each value is an index into the levels, so element n of the assigned levels vector becomes level n, i.e. `1`='low', `2`='medium', `3`='high' in the following example.

f <- 3:1
levels(f) <- c("low", "medium", "high")
class(f) <- 'factor'
f
# [1] high   medium low   
# Levels: low medium high
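The indexing described in Note 1 can be checked directly. A minimal sketch (using `f3` as an illustrative name) that looks up each integer code in the levels vector and compares the result with the printed labels:

```r
f3 <- 3:1
levels(f3) <- c("low", "medium", "high")
class(f3) <- "factor"

# The integer codes are positions in the levels vector...
as.integer(f3)               # 3 2 1
# ...so indexing the levels by the codes gives the labels that are printed.
levels(f3)[as.integer(f3)]
as.character(f3)
```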

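Note 2: this only works if f is stored as integer and every value lies between 1 and the number of levels, because a factor is an integer vector whose values index into its levels. In g the code 4 has no matching level, so printing fails; h is a double vector (c(1, 3, 4) is not integer), so R refuses to add the class at all.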

g <- 2:4
levels(g) <- c("low", "medium", "high")
class(g) <- 'factor'
g
# Error in as.character.factor(x) : malformed factor

h <- c(1, 3, 4)
levels(h) <- c("low", "medium", "high")
class(h) <- 'factor'
# Error in class(h) <- "factor" : 
#   adding class "factor" to an invalid object
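The conditions from Note 2 can be wrapped into a small validating constructor. The function below, `make_factor()`, is hypothetical and not part of base R. It is a sketch that assumes whole-number doubles should be converted to integer and that NA codes are allowed:

```r
# Hypothetical helper, not part of base R: checks what a hand-built
# factor needs, then attaches the levels and the class.
make_factor <- function(codes, lvls) {
  lvls <- as.character(lvls)
  stopifnot(!anyDuplicated(lvls))          # levels must be unique
  # Whole-number doubles such as c(1, 3, 3) are converted to integer,
  # since R only accepts the class "factor" on integer vectors.
  if (is.double(codes) && all(codes == round(codes), na.rm = TRUE)) {
    codes <- as.integer(codes)
  }
  # Every non-NA code must point at an existing level.
  stopifnot(is.integer(codes),
            all(is.na(codes) | (codes >= 1L & codes <= length(lvls))))
  structure(codes, levels = lvls, class = "factor")
}

lv <- c("low", "medium", "high")
make_factor(c(1, 3, 3), lv)   # valid: low high high
try(make_factor(2:4, lv))     # error: code 4 has no matching level
```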
jay.sf
  • Concerning the possibility to use `attr(obj, "levels") <-` instead of `levels(obj) <-`: The documentation at `?levels` states: “Note that for a factor, replacing the levels via `levels(x) <- value` is not the same as (and is preferred to) `attr(x, "levels") <- value`.” But using my R 4.1.2 I can only find `levels.default` defined as `attr(x, "levels")` and no `levels.factor`. So this is most likely obsolete (?). – Quasimodo Dec 25 '21 at 19:30
  • @Quasimodo Interesting insight, thanks! However, we are not replacing levels here, but creating them, which might be different. – jay.sf Dec 25 '21 at 19:33