Get original association between levels and labels in factor variables

Question

I'm looking for a function to get the original mapping table of a factor variable. I imported an Rdata file that contains a factor variable named "FactVar". I know the mapping table for "FactVar" is as follows:

"010025" -> city1
"015146" -> city2
"048017" -> city3
"082053" -> city4

In my dataframe the "FactVar" data are as follows (first 5 cases):

1: city1
2: city3
3: city4
4: city1
5: city3

So there is no "city2" in my df. Which function can I use to get the original mapping table? Is it available in my Rdata file?

Thank you

EDIT: I try to clarify my question with a better example. I have a survey question with the following possible answers:

1: "Yes"
2: "No"
8: "Don't Know"
9: "Not Applicable"

I create a factor variable "FactVar":

Var <- c(1,2,1,2,2,2,1,8,1,2)
FactVar <- factor(Var, levels=c(1,2,8,9), labels=c("Yes", "No", "Don't Know", "Not Applicable"))

As you see, in my Rdata file I've got a factor variable where no data are linked to the level "Not Applicable". How can I get the original mapping table as in my survey question?

Can you dput your input? What's the name of your dataframe? You say you have a factor, but apparently you defined 4 variables containing strings? Sorry, it's not clear at all! — Colonel Beauvel, Oct 05 '15 at 14:27
@Scido Apologies, I meant `as.numeric` instead of `levels`. I got confused because your factors look like strings rather than numeric variables — what’s going on there? — Konrad Rudolph, Oct 05 '15 at 14:29

score 3 · Answer 1 · answered Oct 05 '15 at 15:23

The answer, I think, is "no." I don't have any explicit information to back this up, but even poring over the documentation for factor and related functions I don't see any way to recover the original levels, unless you store them separately (e.g. as an attribute, or saving the original function call) when the factor is created.

Frankly I think this is somewhat of an oversight in the design of the program, and while it's definitely somewhat of an edge case (I've never thought about it before), I'm going to put a bounty on this question and hope that it gets the attention of Dirk Eddelbuettel or one of the other R gurus.

Edit: I don't see the "add bounty" button. Maybe it'll show up in a few days (and hopefully I remember).
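The "store them separately" idea mentioned above could be sketched roughly as follows. This is only an illustration, not something `factor()` does for you: the helper name `make_coded_factor` and the attribute name `code_map` are made up for this example. Note also that custom attributes are generally lost when the factor is subset, so a separate lookup data frame may be more robust.

```r
# Hypothetical helper (not part of base R): build a factor and attach the
# original code -> label table as a custom attribute.
make_coded_factor <- function(x, codes, labels) {
  f <- factor(x, levels = codes, labels = labels)
  # "code_map" is an arbitrary attribute name chosen for this sketch
  attr(f, "code_map") <- data.frame(code = codes, label = labels,
                                    stringsAsFactors = FALSE)
  f
}

Var <- c(1, 2, 1, 2, 2, 2, 1, 8, 1, 2)
FactVar <- make_coded_factor(
  Var,
  codes  = c(1, 2, 8, 9),
  labels = c("Yes", "No", "Don't Know", "Not Applicable")
)

# The mapping travels with the object (and is kept by save()/load()),
# including the unused "Not Applicable" code.
code_map <- attr(FactVar, "code_map")
print(code_map)
```

This only helps if whoever creates the factor attaches the table at creation time; it cannot recover codes from a factor that was saved without them.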

I'm interested in the answer as well. It seems that, for example, if you want to turn a variable into a labelled factor AND keep the original data, you have to create a new variable. — teppo, Dec 22 '21 at 19:38

score 1 · Answer 2 · edited May 23 '17 at 12:29

I had this question before, which was answered here: How to access actual internal factor lookup hashtable in R

Sorry I don't have enough reputation to put this in comments.

answered Oct 05 '15 at 17:27 · Allen Wang

Sorry @Allen, but it doesn't resolve my question. I tried with my example: `.levels <- levels(FactVar)`, `h <- hash(keys = .levels, values = seq_along(.levels))` and the result is: `containing 4 key-value pair(s). Don't Know : 3 No : 2 Not Applicable : 4 Yes : 1`. I think @ssdecontrol is right: so far there's no way to get the original mapping table... – Scido Oct 06 '15 at 10:23

score 0 · Answer 3 · answered Oct 14 '15 at 18:15

str(FactVar)

will give you back the mapping between the levels and their labels like this:

FactVar <- factor(Var, levels=c(1,2,8,9), labels=c("Yes", "No", "Don't Know", "Not Applicable"))

and will include the labels and levels of unused levels.

answered Oct 14 '15 at 18:15 · Greg Thatcher

Sorry @Greg, but I got something different: `Factor w/ 4 levels "Yes","No","Don't Know",..: 1 2 1 2 2 2 1 3 1 2`. Maybe I should add some attributes? Thanks – Scido Oct 15 '15 at 10:13

score 0 · Answer 4 · answered Dec 22 '21 at 20:24

I have a slightly different problem, but based on, for example, shadowtalker's answer, I think the answer is the same: you cannot get the association.

I'm interested in turning a variable into a factor AND keeping the original data. It seems that I have to create a new variable and keep both.

The Factors help page in R documentation states that

To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

For example:

> v <- c( 0, 0, 3, 0, 6, 6 )
> 
> f1 <- factor( x = v, levels = c( 0, 3, 6, 9 ) )
> 
> as.numeric( levels( f1 ) )[f1]
[1] 0 0 3 0 6 6
>
> as.numeric( as.character( f1 ) )
[1] 0 0 3 0 6 6

However, if the factor is labelled, neither of the above methods works:

> f2 <- factor( x = v, levels = c( 0, 3, 6, 9 ), labels = c( "a", "b", "c", "d" ) )
> 
> as.numeric( levels( f2 ) )[f2]
Warning: NAs introduced by coercion
[1] NA NA NA NA NA NA
>
> as.numeric( as.character( f2 ) )
Warning: NAs introduced by coercion
[1] NA NA NA NA NA NA

This is obvious if we look at what levels() and as.character() give:

> levels( f2 )
[1] "a" "b" "c" "d"
>
> as.numeric( levels( f2 ) )
Warning: NAs introduced by coercion
[1] NA NA NA NA
>
> as.character( f2 )
[1] "a" "a" "b" "a" "c" "c"

If we just use as.numeric(), we get the new level values created by factor():

> as.numeric( f2 )
[1] 1 1 2 1 3 3
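If the original level values are still known, the integer codes from `as.integer()` can be used to index into them. A minimal sketch, assuming you kept the vector passed to `levels =` (here under the made-up name `orig_levels`); the factor itself does not store it:

```r
v <- c(0, 0, 3, 0, 6, 6)

# Assumption: the original level values were kept in a separate vector
orig_levels <- c(0, 3, 6, 9)
f2 <- factor(v, levels = orig_levels, labels = c("a", "b", "c", "d"))

# as.integer(f2) gives positions 1..4, which index into orig_levels
recovered <- orig_levels[as.integer(f2)]
print(recovered)

# The same two vectors give the full mapping table, unused level included
lookup <- data.frame(value = orig_levels, label = levels(f2))
print(lookup)
```

This works only because `orig_levels` is in the same order as the labels given to `factor()`.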

score -1 · Answer 5 · answered Oct 05 '15 at 14:28

Not sure I understand what you mean. You can specify the labels for the levels of your factors.

df$FactVar <- factor(df$FactVar, levels=c(paste0("city", 1:4))) # assuming you go up to 'city4'

The point is that you can specify the levels in any order you want using the levels parameter of the function factor.
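As a rough illustration using the five cases from the question (the data frame below is a stand-in, not the asker's actual data), declaring all four levels keeps "city2" as a level even though no row uses it. The codes such as "015146" are not stored anywhere in the factor, so this shows the labels only:

```r
# Stand-in data: the first five cases from the question
df <- data.frame(FactVar = c("city1", "city3", "city4", "city1", "city3"),
                 stringsAsFactors = FALSE)

df$FactVar <- factor(df$FactVar, levels = paste0("city", 1:4))

# All declared levels are kept, including the unused "city2"
all_levels <- levels(df$FactVar)
print(all_levels)

# table() reports a zero count for the unused level
counts <- table(df$FactVar)
print(counts)

# droplevels() removes levels that do not occur in the data
used_levels <- levels(droplevels(df$FactVar))
print(used_levels)
```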
