2

I want to convert a character vector in R to a factor (let's take the example from the DataCamp Introduction to R course) and would like to label a few of the factor levels. How do I prevent any levels I don't mention from being automatically set to NA?

speed_vector <- c("fast", "slow", "slow", "fast", "insane")

factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "insane"), labels = c("Speed < 30 mph", "Speed > 100 mph"))

results in

> summary(factor_speed_vector)
 Speed < 30 mph Speed > 100 mph            NA's 
              2               1               2 
> factor_speed_vector
[1] <NA>            Speed < 30 mph  Speed < 30 mph  <NA>            Speed > 100 mph
Levels: Speed < 30 mph < Speed > 100 mph

How can I make sure that any undefined factor level (like "fast" in this example) gets carried over with the original value instead of being set to NA?

Edit: My previous comment here was due to confusing the levels and labels options of the factor function. Anyone else unsure of the difference can read up on it here: Confusion between factor levels and factor labels

emes
  • what do you want to do with a value that is not a level of your factor though? Add it as a new factor? – acylam Aug 30 '17 at 14:59
  • @useR I just want to make sure that, when converting vectors to factors in a larger dataset with a larger number of factor levels, data is not automatically set to NA just because I forgot to declare a level. So mt1022, it is actually irrelevant what the labels are; I just added them so that the first comment would not be to ask why I don't use `factor()` without defining levels at all. The reason to define levels is that I want to give labels to them. – emes Aug 30 '17 at 15:03
  • Maybe you want to start with `factor_speed_vector <- factor(c("fast", "slow", "slow", "fast", "insane"), ordered = TRUE)` to better replicate your situation? – lmo Aug 30 '17 at 15:09
  • So are you asking for a way to convert a vector to a factor with levels automatically set to all unique values in the vector? Just use `as.factor()` – acylam Aug 30 '17 at 15:12
  • @useR: that doesn't work, because you can't order and label the levels in `as.factor()` – don joe Aug 30 '17 at 15:17
  • Maybe relevant: https://stackoverflow.com/questions/5869539/confusion-between-factor-levels-and-factor-labels – Aurèle Aug 30 '17 at 15:18
  • @useR: No, I am asking to convert some of the unique values as declared in the `levels` option and not to omit the remaining unique values, but treat them instead as `as.factor()` would do. – emes Aug 30 '17 at 15:19
  • Do you necessarily want your resulting factor to be ordered? If yes, what should be the relative order of known (here `"slow"` and `"insane"`) and "unknown" factors (here `"fast"`)? If no, consider `forcats::fct_recode( as.factor(speed_vector), "Speed < 30 mph" = "slow", "Speed > 100 mph" = "insane" )` (not better than already posted answers, but more readable, at least to me) – Aurèle Aug 30 '17 at 15:28
  • @Aurèle: Thanks for the Link, I think part of my confusion originates from not fully understanding factor levels and labels – emes Aug 30 '17 at 15:47

4 Answers

2

The forcats package has some nice helper functions to deal with factors. The fct_recode() function lets you change factor levels by hand. You can specify a sequence of named character vectors where the name gives the new level, and the value gives the old level. Levels not otherwise mentioned will be left as is. (from ?fct_recode, emphasis mine).

speed_vector <- c("fast", "slow", "slow", "fast", "insane")
speed_vector
[1] "fast"   "slow"   "slow"   "fast"   "insane"
forcats::fct_recode(speed_vector, "Speed < 30 mph" = "slow", "Speed > 100 mph" = "insane")
[1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
Levels: fast Speed > 100 mph Speed < 30 mph
Uwe
  • Thanks for this answer. This function seems really convenient and solves the problem just as fine. I accepted another answer, because it only uses base R. – emes Aug 31 '17 at 14:33
1

Would this suit you ?

speed_vector <- c("fast", "slow", "slow", "fast", "insane")
factor_speed_vector <- factor(speed_vector)
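# Index by comparing against levels(), not the factor's values, so the index has one entry per level
levels(factor_speed_vector)[levels(factor_speed_vector) == "slow"]   <- "Speed < 30 mph"
levels(factor_speed_vector)[levels(factor_speed_vector) == "insane"] <- "Speed > 100 mph"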
factor_speed_vector
# [1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph
moodymudskipper
  • Thanks for this answer. I like the readability of the code. I accepted another answer, because it would require less code typing in case of use with more factor levels. – emes Aug 31 '17 at 14:35
1

Using levels and match, you can do the following.

Start with a factor variable:

factor_speed_vector <- factor(c("fast", "slow", "slow", "fast", "insane"), ordered = TRUE)

Then change the levels of the variable, pulling the proper indices with match:

levels(factor_speed_vector)[match(c("slow", "insane"), levels(factor_speed_vector))] <-
c("Speed < 30 mph", "Speed > 100 mph")

Here, match(c("slow", "insane"), levels(factor_speed_vector)) finds the indices for the factor levels matching "slow" and "insane". These indices are used to subset the levels and then the new labels are fed in.
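
The same match() idea could be wrapped in a small helper so that the old and new names are kept together. This is only a sketch: `relabel_levels` is a hypothetical name, not a base R function, and it assumes that names which are not levels of the factor (here the made-up `"warp"`) should simply be skipped rather than raise an error.

```r
# Hypothetical helper (not part of base R): rename selected levels, keep the rest.
# new_labels is a named character vector: names = old levels, values = new labels.
relabel_levels <- function(f, new_labels) {
  idx <- match(names(new_labels), levels(f))  # position of each old level, NA if absent
  found <- !is.na(idx)                        # assumption: silently skip unknown names
  levels(f)[idx[found]] <- unname(new_labels)[found]
  f
}

speed_vector <- c("fast", "slow", "slow", "fast", "insane")

factor_speed_vector <- relabel_levels(
  factor(speed_vector),
  c(slow = "Speed < 30 mph", insane = "Speed > 100 mph", warp = "Speed > c")
)

levels(factor_speed_vector)
# [1] "fast"            "Speed > 100 mph" "Speed < 30 mph"
```

Because the lookup goes through the level names, the call does not depend on where "slow" and "insane" happen to sit in `levels()`.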

lmo
0
factor_speed_vector = as.factor(speed_vector)

# > levels(factor_speed_vector)
# [1] "fast"   "insane" "slow"

levels(factor_speed_vector)[3:2] = c("Speed < 30 mph", "Speed > 100 mph")

# > factor_speed_vector
# [1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
# Levels: fast Speed > 100 mph Speed < 30 mph
acylam
  • Thanks @useR. I think compared to the other solutions, this one is not so flexible, since it needs explicit addressing of the position in the label vector. – emes Aug 30 '17 at 16:05
  • @m.a.m I agree, but this is more convenient if your factor levels don't change. Feel free to accept the other answers. – acylam Aug 30 '17 at 16:08
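
For readers who would rather stay within a single `factor()` call and avoid addressing levels by position, one possible sketch is to build the `levels` and `labels` arguments from the data. The names `declared` and `undeclared` are illustrative only, and placing the undeclared values after the declared ones is an assumption about the desired level order.

```r
speed_vector <- c("fast", "slow", "slow", "fast", "insane")

# Declared levels (names) and their labels (values), in the desired order.
declared <- c(slow = "Speed < 30 mph", insane = "Speed > 100 mph")

# Values that occur in the data but were not declared; NA is excluded so that
# genuinely missing values stay NA instead of becoming a level.
undeclared <- setdiff(unique(speed_vector), c(names(declared), NA))

# Undeclared values are used as both level and label, so they carry over unchanged.
factor_speed_vector <- factor(
  speed_vector,
  levels = c(names(declared), undeclared),
  labels = c(unname(declared), undeclared)
)

factor_speed_vector
# [1] fast            Speed < 30 mph  Speed < 30 mph  fast            Speed > 100 mph
# Levels: Speed < 30 mph Speed > 100 mph fast
```

Adding `ordered = TRUE` to the call would make the factor ordered in exactly this level sequence, so the position chosen for the undeclared values matters in that case.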