3

I'm looking for a function to get the original mapping table of a factor variable. I imported an Rdata file that contains a factor variable named "FactVar". I know the mapping table for "FactVar" is as follows:

"010025" -> city1
"015146" -> city2
"048017" -> city3
"082053" -> city4

In my dataframe the "FactVar" data are as follows (first 5 cases):

1: city1
2: city3
3: city4
4: city1
5: city3

So there is no "city2" in my df. Which function can I use to get the original mapping table? Is it available in my Rdata file?

Thank you

EDIT: I try to clarify my question with a better example. I have a survey question with the following possible answers:

1: "Yes"
2: "No"
8: "Don't Know"
9: "Not Applicable"

I create a factor variable "FactVar":

Var <- c(1,2,1,2,2,2,1,8,1,2)
FactVar <- factor(Var, levels=c(1,2,8,9), labels=c("Yes", "No", "Don't Know", "Not Applicable"))

As you see, in my Rdata file I've got a factor variable where no data are linked to the level "Not Applicable". How can I get the original mapping table as in my survey question?

Ronak Shah
Scido

5 Answers

3

The answer, I think, is "no." I don't have any explicit information to back this up, but even poring over the documentation for factor and related functions I don't see any way to recover the original levels, unless you store them separately (e.g. as an attribute, or saving the original function call) when the factor is created.
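A minimal sketch of the "store them separately as an attribute" idea, using the survey example from the question. The helper `make_coded_factor()` and the attribute name `"codebook"` are hypothetical names chosen for this illustration, not part of base R, and this only helps if you control how the factor is created. Custom attributes can also be lost by some operations (subsetting a factor, for example), so treat this as an illustration rather than a robust solution.

```r
# Hypothetical helper: build a factor and remember the original code -> label map.
make_coded_factor <- function(x, codes, labels) {
  f <- factor(x, levels = codes, labels = labels)
  # "codebook" is an arbitrary attribute name chosen for this sketch
  attr(f, "codebook") <- data.frame(code = codes, label = labels,
                                    stringsAsFactors = FALSE)
  f
}

Var <- c(1, 2, 1, 2, 2, 2, 1, 8, 1, 2)
FactVar <- make_coded_factor(
  Var,
  codes  = c(1, 2, 8, 9),
  labels = c("Yes", "No", "Don't Know", "Not Applicable")
)

# The full mapping table, including the unused level "Not Applicable"
codebook <- attr(FactVar, "codebook")
print(codebook)

# Recover the original codes from the factor's integer positions
original <- codebook$code[as.integer(FactVar)]
```

Attributes are saved along with the object by `save()`, so a factor built this way should carry its codebook in the Rdata file. A factor that was created without it has nothing to recover.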

Frankly, I think this is something of an oversight in the design of the program. While it's definitely an edge case (I had never thought about it before), I'm going to put a bounty on this question and hope that it gets the attention of Dirk Eddelbuettel or one of the other R gurus.

Edit: I don't see the "add bounty" button. Maybe it'll show up in a few days (and hopefully I remember).

shadowtalker
  • I'm interested in the answer as well. It seems that, for example, if you want to turn a variable into a labelled factor AND keep the original data, you have to create a new variable. – teppo Dec 22 '21 at 19:38
1

I had this question before, which was answered here: How to access actual internal factor lookup hashtable in R

Sorry I don't have enough reputation to put this in comments.

Allen Wang
  • Sorry @Allen, but it doesn't resolve my question. I tried it with my example: `.levels <- levels(FactVar)`, `h <- hash(keys = .levels,values = seq_along(.levels))` and the result is: `containing 4 key-value pair(s). Don't Know : 3 No : 2 Not Applicable : 4 Yes : 1` I think @ssdecontrol is right, so far there's no way to get the original mapping table... – Scido Oct 06 '15 at 10:23
0
str(FactVar)

will give you back the mapping between the levels and their labels like this:

FactVar <- factor(Var, levels=c(1,2,8,9), labels=c("Yes", "No", "Don't Know", "Not Applicable"))

and will include the labels and levels of unused factors.

Greg Thatcher
  • 1
  • Sorry @Greg, but I got something different: `Factor w/ 4 levels "Yes","No","Don't Know",..: 1 2 1 2 2 2 1 3 1 2`. Maybe I should add some attributes? Thanks – Scido Oct 15 '15 at 10:13
0

I have a slightly different problem, but based on, for example, shadowtalker's answer, I think the answer is the same: you cannot get the association.

I'm interested in turning a variable into a factor AND keeping the original data. It seems that I have to create a new variable and keep both.

The Factors help page in R documentation states that

To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

For example:

> v <- c( 0, 0, 3, 0, 6, 6 )
> 
> f1 <- factor( x = v, levels = c( 0, 3, 6, 9 ) )
> 
> as.numeric( levels( f1 ) )[f1]
[1] 0 0 3 0 6 6
>
> as.numeric( as.character( f1 ) )
[1] 0 0 3 0 6 6

However, if the factor is labelled, neither of the above methods works:

> f2 <- factor( x = v, levels = c( 0, 3, 6, 9 ), labels = c( "a", "b", "c", "d" ) )
> 
> as.numeric( levels( f2 ) )[f2]
Warning: NAs introduced by coercion
[1] NA NA NA NA NA NA
>
> as.numeric( as.character( f2 ) )
Warning: NAs introduced by coercion
[1] NA NA NA NA NA NA

This is obvious if we look at what levels() and as.character() give:

> levels( f2 )
[1] "a" "b" "c" "d"
>
> as.numeric( levels( f2 ) )
Warning: NAs introduced by coercion
[1] NA NA NA NA
>
> as.character( f2 )
[1] "a" "a" "b" "a" "c" "c"

If we just use as.numeric(), we get the new level values created by factor():

> as.numeric( f2 )
[1] 1 1 2 1 3 3
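If the original level values are kept somewhere outside the factor, the integer codes shown above are enough to get back to them. The following is a small sketch under that assumption: `codes`, `labs`, and `lookup` are names introduced here for illustration, and the approach only works because the script still has the values that were passed to `levels=`.

```r
v     <- c(0, 0, 3, 0, 6, 6)
codes <- c(0, 3, 6, 9)          # original values, kept separately
labs  <- c("a", "b", "c", "d")  # labels shown by the factor

f2 <- factor(x = v, levels = codes, labels = labs)

# Option 1: index the stored codes by the factor's integer positions
recovered_by_position <- codes[as.integer(f2)]

# Option 2: a named lookup vector from label to original value
lookup <- setNames(codes, labs)
recovered_by_label <- unname(lookup[as.character(f2)])

print(recovered_by_position)
print(recovered_by_label)
```

Both options rely on information that the factor itself does not hold, which is why keeping the original variable (or the code list) next to the factor seems unavoidable.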
teppo
-1

Not sure I understand what you mean. You can specify the labels for the levels of your factors.

df$FactVar <- factor(df$FactVar, levels=c(paste0("city", 1:4))) # assuming you go up to 'city4'

The point is that you can specify the levels in any order you want using the levels parameter of the factor function.
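As a small illustration with the first five cases from the question (the vector `x` below is typed in by hand, not read from the Rdata file), declaring all four levels keeps "city2" in the factor even though no observation uses it. Note that this preserves only the labels. The original codes such as "010025" are not stored in the factor.

```r
x <- c("city1", "city3", "city4", "city1", "city3")

# Declare every possible level, including the unused "city2"
f <- factor(x, levels = paste0("city", 1:4))

all_levels <- levels(f)   # the unused level is still listed
counts     <- table(f)    # "city2" appears with a count of zero

print(all_levels)
print(counts)

# droplevels() removes unused levels, so avoid it if they must be kept
used_levels <- levels(droplevels(f))
print(used_levels)
```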

pedrosaurio