
I have relabeled a factor variable in R. Once I did so, I can no longer see the original values behind each label. An example is given below:

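```r
library(Hmisc)  # loaded in my session; factor() and table() are base R
x <- as.factor(c("", "1", "2", "3", "4", "", "1", "2", "3", "4"))
# note: "NA" below is a literal label string, not a missing value
x1 <- factor(x, levels = c("", "1", "2", "3", "4"), labels = c("NA", "A", "B", "C", "D"))

table(x)
# x
#   1 2 3 4 
# 2 2 2 2 2 

table(x1)
# x1
# NA  A  B  C  D 
#  2  2  2  2  2 
```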

One factor variable, `x`, is created here and labeled in `x1`. From the table of `x1`, I cannot see the original values (0, 1, 2, 3, 4, where 0 is the blank) behind the labels; to see them I have to fall back on `table(x)`. `table(as.integer(x1))` does not help either: `as.integer()` returns the internal level codes 1 to 5, not the original values, and the blank is not treated as 0. Keeping a parallel unlabeled copy like `x` is fine for a few variables, but not when I work with a large number of variables. Is there a function that shows both the original values and the factor labels from the same variable?
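A factor stores integer codes plus a `levels` vector, so `as.integer(x1)` only gives the positions 1 to 5. The mapping back to the original values is positional, which means it can be recovered as long as the original levels vector still exists somewhere. A minimal sketch, assuming the levels are saved in a hypothetical `orig_levels` variable before relabeling:

```r
# Original data, as in the question
x <- as.factor(c("", "1", "2", "3", "4", "", "1", "2", "3", "4"))

# Hypothetical step: save the original levels BEFORE relabeling
orig_levels <- levels(x)   # "" "1" "2" "3" "4"

x1 <- factor(x, levels = orig_levels,
             labels = c("NA", "A", "B", "C", "D"))

# as.integer() returns the internal codes (1..5), not the original values
codes <- as.integer(x1)

# Index the saved levels by those codes to get the original values back
recovered <- orig_levels[codes]

# Cross-tabulate original values against labels
tab <- table(original = recovered, label = x1)
print(tab)
```

Once `orig_levels` is lost, nothing inside `x1` records it, which matches what the comments below say.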

Biswajit Kar
  • I think the answer to this is, it is not possible to get original `levels` after `labels` are assigned. See this other question, which seems to ask the same thing: https://stackoverflow.com/questions/32950899/get-original-association-between-levels-and-labels-in-factor-variables – Ronak Shah Apr 07 '20 at 08:44
  • You could name the elements of your factor vector: `names(x1) <- x`, or vice versa. That way you would keep both pieces of information. – Humpelstielzchen Apr 07 '20 at 08:55
  • I can't understand what you are trying to say. Could you please elaborate? – Biswajit Kar Apr 07 '20 at 09:11
  • It's possible if you assign an attribute to each variable, similar to how the `read.dta` function does this when importing Stata data files. It adds a "label.table" attribute to all factors. But you'll have to do this for each variable before factoring it - maybe using a for loop or `lapply`. – Edward Apr 07 '20 at 09:12
  • @Edward, thanks for the information. All my variables are already factors and masked with factor labels. Could you please elaborate on your suggestion with some sample code? – Biswajit Kar Apr 07 '20 at 09:22
  • Then sadly, the answer is no - it's not possible to identify the previous numeric values corresponding to each factor level, unless the data was imported from Stata, SPSS, or other software that support factor labeling. I can give an example, but it won't solve your dilemma. – Edward Apr 07 '20 at 09:27
  • @Edward and Humpelstielzchen, thank you. – Biswajit Kar Apr 07 '20 at 09:43
  • Out of curiosity, why would you want to know the original values? Also, it may be possible to put probabilities on what they _may_ have been, based on external or seemingly irrelevant information available. – Edward Apr 07 '20 at 11:43
  • @Edward, actually, I need to recode a few categories after labeling. For instance, recoding 'D' into 'E' or collapsing 'A' and 'B' into 'F'. In this example it is easy, since each label is a single letter, but sometimes labels are short descriptions and become problematic for recoding. I also want to check whether the labeling was done correctly. – Biswajit Kar Apr 07 '20 at 12:12
  • I bumped into the same "problem" myself. I guess the only way is to recode everything, if that is possible, and store the "original" levels as an attribute. E.g. `my_vector <- c(0, 1, 1, 0, 1)`, then: `my_factor <- factor(my_vector, levels = c(0, 1), labels = c("No", "Yes"))`, then: `attr(my_factor, "original_levels") <- c(0, 1)`, and then finally access the "original" levels by `attr(my_factor, "original_levels")`. – jaggedjava May 05 '23 at 18:34
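Humpelstielzchen's suggestion of naming the elements can be sketched as follows: the original values are stored as element names of the labeled factor, so one object carries both. This is a sketch on the question's data; it assumes the names are attached while the original values are still available, and `as.character(x)` is used explicitly rather than relying on implicit factor coercion.

```r
x  <- as.factor(c("", "1", "2", "3", "4", "", "1", "2", "3", "4"))
x1 <- factor(x, levels = c("", "1", "2", "3", "4"),
             labels = c("NA", "A", "B", "C", "D"))

# Attach the original values as element names
names(x1) <- as.character(x)

# Both pieces of information now live in the same object
tab <- table(original = names(x1), label = x1)
print(tab)

# Names survive subsetting with `[`, so filtered elements still show their source values
names(x1[x1 == "B"])
```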
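Edward's and jaggedjava's idea of keeping the original codes in an attribute can be applied to many variables with a loop, as Edward suggested. A minimal sketch, assuming the columns are still unlabeled and that a hypothetical `label_map` list holds each variable's levels and labels; `add_labels()` and `show_levels()` are hypothetical helpers, and the attribute name `original_levels` follows jaggedjava's comment. As Edward notes, this only works if it runs before the labels replace the original values.

```r
# Hypothetical example data: two unlabeled variables
df <- data.frame(q1 = c("", "1", "2", "1"),
                 q2 = c("0", "1", "1", "0"),
                 stringsAsFactors = FALSE)

# Hypothetical coding scheme per variable (labels assumed unique)
label_map <- list(
  q1 = list(levels = c("", "1", "2"), labels = c("NA", "A", "B")),
  q2 = list(levels = c("0", "1"),     labels = c("No", "Yes"))
)

# Hypothetical helper: relabel one variable and keep its original levels
add_labels <- function(v, spec) {
  f <- factor(v, levels = spec$levels, labels = spec$labels)
  attr(f, "original_levels") <- spec$levels
  f
}

# Apply to every variable listed in label_map
for (nm in names(label_map)) {
  df[[nm]] <- add_labels(df[[nm]], label_map[[nm]])
}

# Hypothetical helper: original value, label and count side by side
show_levels <- function(f) {
  data.frame(original = attr(f, "original_levels"),
             label    = levels(f),
             n        = as.vector(table(f)),
             stringsAsFactors = FALSE)
}

show_levels(df$q1)
# Caveat: subsetting the factor itself with `[` drops this custom attribute
```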
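For the recoding Biswajit describes (renaming 'D' to 'E', collapsing 'A' and 'B' into 'F'), base R can work directly on the labels by assigning to `levels()`; giving two levels the same new name merges them. A short sketch on the question's `x1`, which does not need the original values at all:

```r
x  <- as.factor(c("", "1", "2", "3", "4", "", "1", "2", "3", "4"))
x1 <- factor(x, levels = c("", "1", "2", "3", "4"),
             labels = c("NA", "A", "B", "C", "D"))

x2 <- x1
# Rename a single level: "D" becomes "E"
levels(x2)[levels(x2) == "D"] <- "E"
# Collapse two levels: giving "A" and "B" the same name merges them into "F"
levels(x2)[levels(x2) %in% c("A", "B")] <- "F"

table(x2)
```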
