Dear Stack Overflow community,

I have looked everywhere but can't find the answer to this question. I am trying to access the factor lookup table that R uses when you change a string vector into a factor vector. I am not trying to convert a string to a factor but rather to get the lookup table underlying the factor variable and store it as a hash table for use elsewhere.

I encountered the problem because I want to use this factor lookup table on a list of different length vectors, to convert them from strings to numbers.

i.e., I have a list of item sets that I want to convert to numeric, but each set in the list has a different number of items.

So far, I have flattened the list of vectors into a single vector and converted it to a factor:

vec <- unlist(list)
vec <- factor(vec)

Now I want to do a lookup on the original list with the factor lookup table which must be underlying vec, but I can't seem to find it.

divibisan
Allen Wang
  • It is very unclear to me what you are asking. A [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) would most certainly help. Are you just asking how to go from the underlying numerical index to the character label for a given factor level? The `levels()` of a factor determine numeric assignment. The first level is assigned 1, the second is assigned 2, etc. – MrFlick Jan 31 '15 at 01:47
  • The "factor lookup table" for a factor vector is returned by `levels(fac.vec)`. – IRTFM Jan 31 '15 at 04:39
  • @MrFlick What I wanted was something like what Jthorpe wrote at the bottom. Basically, if I make a vector like `vec <- c('a','b','c','e','a','b'); vec <- factor(vec); levels(vec)`, I know that there is an internal hash table relating the symbols to integers: `a : 1, b : 2, c : 3, e : 4`. I just wanted to know if there was an easy way to access this internal hash table for use elsewhere, but it looks like you just have to create your own hash table. – Allen Wang Jan 31 '15 at 19:16
  • ++ The question is good now it has been clarified. Jthorpe's answer is fine. – smci Jan 31 '15 at 19:51

1 Answer

I think you either want the indexes which map the elements of the factor to elements of the factor levels, as in:

vec <- c('a','b','c','b','a')
f <- factor(vec)
f
#> [1] a b c b a
#> Levels: a b c

indx <- f
attributes(indx) <- NULL  # drop class and levels, leaving the integer codes (same result as as.integer(f))
indx
#> [1] 1 2 3 2 1

or you want the hash tables used internally to create the factor variable. Unfortunately, any hash tables built while creating a factor are built inside `unique` and `match`, which are internal functions, so you won't have access to anything those functions create (other than the return value, of course). If you want a hash table so you can use it to index a character vector with the same levels as your existing factor, just create one yourself, as in:

library(hash)
.levels <- levels(f)
h <- hash(keys = .levels, values = seq_along(.levels))  # level label -> integer code
newVec <- sample(.levels, 10, replace = TRUE)           # random draw, so your output will differ
newVec
#> [1] "a" "b" "a" "a" "a" "c" "c" "b" "c" "a"
values(h,keys = newVec)
#> a b a a a c c b c a 
#> 1 2 1 1 1 3 3 2 3 1 
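
For the original problem (a list of string vectors of different lengths), a base-R alternative that needs no extra package may be to keep the levels and let `match()` do the lookup, since it builds its hash table internally. This is only a minimal sketch: `item_sets`, `lev`, `lookup`, `coded` and `coded2` are made-up names for illustration and do not come from the question.

```r
# Hypothetical list of item sets with different lengths
item_sets <- list(c("a", "b"), c("c", "a", "b", "e"), "e")

# The levels define the string -> integer mapping that factor() assigns
lev <- levels(factor(unlist(item_sets)))

# Named integer vector that can be stored and reused as a lookup table
lookup <- setNames(seq_along(lev), lev)

# Convert every set; match() is vectorised over each list element
coded <- lapply(item_sets, match, table = lev)

# Same conversion through the named vector (unname() drops the labels)
coded2 <- lapply(item_sets, function(x) unname(lookup[x]))
```

Both `coded` and `coded2` keep the shape of the original list, so each item set stays separate while its strings are replaced by the factor's integer codes.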
Jthorpe
  • Thanks Jthorpe, that example solved my problem. Yes, I guess my question was whether you could somehow return the internal hash table used to generate the factors, and use it in another place, but this works perfectly. – Allen Wang Jan 31 '15 at 19:12
  • It’s worth noting that there’s probably a better way in R than to manually create a hash table for lookup. Those R functions where it makes sense to do so create hash lookup tables on the fly. If you properly vectorise your operations (rather than calling them in a loop, which forces the lookup table to be recreated over and over), this is very efficient. – Konrad Rudolph Jan 31 '15 at 20:01
  • `hash` is neat. I was looking at `bit64` but that's overkill. Although `hash()` seems to just be doing nearly the same as `new.env(hash=TRUE)` – Rich Scriven Jan 31 '15 at 20:03
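
To illustrate the last comment, here is a rough sketch of a hashed lookup table built from a plain base-R environment instead of the `hash` package. The names `lev`, `h` and `codes` are invented for this example, and `lev` is a stand-in for `levels(f)` from the answer.

```r
lev <- c("a", "b", "c")  # stand-in for levels(f)

# An environment created with hash = TRUE is base R's hashed key/value store
h <- new.env(hash = TRUE)
for (i in seq_along(lev)) assign(lev[i], i, envir = h)

# Look up several keys at once; mget() returns a named list
codes <- unlist(mget(c("c", "a", "c"), envir = h))
```

As the comment on vectorising suggests, `match(c("c", "a", "c"), lev)` should give the same codes in a single call, so the explicit environment is mostly useful when keys are added or removed over time.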