
First of all: I am not sure if this is the correct approach for my scenario.

There is a questionnaire with 4 possible checkmarks ("A", "B", "C", "D"), coded in the original data from 0 to 3. Imagine they are age groups (under 30, 31 to 50, ...). Now I want to have something like value labels (as SPSS offers) to use later in summary tables or plots. In my understanding, R offers levels for this.

The problem is that "A" (coded as 0) does not currently exist in the data. This can change in the future because the data is not fixed yet.

How can I set a level (in SPSS terms: a value label) for a value (0 in this example) that does not currently exist in the data?

> set.seed(100)
> s <- sample(c(seq(1,3), NA), 10, replace=TRUE)
> f <- factor(s)
> f
 [1] 2    2    3    1    2    2    <NA> 2    3    1   
Levels: 1 2 3
> levels(f) <- c("A", # = 0
+                "B", # = 1
+                "C", # = 2
+                "D") # = 3
> f
 [1] B    B    C    A    B    B    <NA> B    C    A   
Levels: A B C D
buhtz
  • See the last example in `?levels` – eddi May 01 '18 at 21:15
  • @eddi I don't understand that example. – buhtz May 01 '18 at 21:23
  • What's there to not understand - it's the exact problem you're trying to solve... `levels(f) = list(A = 0, B = 1, C = 2, D = 3)` – eddi May 01 '18 at 21:24
  • Try `f <- factor(s, levels = 0:3, labels = c("A", "B", "C", "D"))` – zx8754 May 01 '18 at 21:29
  • Relevant reading: [Confusion between factor levels and factor labels](https://stackoverflow.com/questions/5869539/confusion-between-factor-levels-and-factor-labels); [Why is the terminology of labels and levels in factors so weird?](https://stackoverflow.com/questions/7128413/why-is-the-terminology-of-labels-and-levels-in-factors-so-weird) – Henrik May 01 '18 at 21:56

1 Answer


You have two difficulties. First, the numbering of factor values starts at 1, as does almost all indexing in R. Second, assignment to a non-existent level is not accepted. So you could build in a "D" level at the time the factor is created, and then assignment to the NA values can succeed:

 set.seed(100)
 s <- sample(c(seq(1,3), NA), 10, replace=TRUE)
 (f <- factor(s, levels=1:4, labels=LETTERS[1:4]))
 # [1] B    B    C    A    B    B    <NA> B    C    A   
 # Levels: A B C D
 f[ is.na(f) ] <- "D"
 f
 # [1] B B C A B B D B C A
 # Levels: A B C D
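
For the zero-based coding described in the question, the same idea can be sketched with `levels = 0:3`, as one of the comments suggests. This is only a minimal illustration: the fixed vector `s` below is an assumption that stands in for the sampled data, so the result does not depend on the random number generator.

```r
# Fixed example data (an assumption, not the original sample).
# The code 0 is deliberately absent, as in the question.
s <- c(2, 2, 3, 1, 2, 2, NA, 2, 3, 1)

# Declare all four codes up front, including the currently unused 0.
f <- factor(s, levels = 0:3, labels = c("A", "B", "C", "D"))

levels(f)  # "A" "B" "C" "D"
table(f)   # "A" is listed with a count of 0
```

Because the level is declared even though no observation uses it, summary tables should keep a row for "A" if such values show up later.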

I find working with character vectors to be far easier and suggest adopting a policy of using stringsAsFactors=FALSE for all your read.* operations.
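
A rough sketch of what that policy might look like in practice, assuming the codes arrive in a column called `group`. The inline data, the column names, and the `value_labels` lookup are all hypothetical and only serve the illustration.

```r
# Hypothetical inline data standing in for a real file.
raw <- read.csv(text = "id,group\n1,1\n2,2\n3,3\n4,2",
                stringsAsFactors = FALSE)

# Hypothetical lookup table: code (as a string) -> label.
value_labels <- c("0" = "A", "1" = "B", "2" = "C", "3" = "D")

# Keep the labels as a plain character column.
raw$group_label <- unname(value_labels[as.character(raw$group)])

# Convert to a factor only where the full label set matters,
# for example when tabulating, so the unused "A" still appears.
counts <- table(factor(raw$group_label, levels = unname(value_labels)))
counts
```

The labels stay ordinary strings for most of the workflow, and the complete set of levels is supplied only at the point where a table or plot needs it.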

IRTFM