Let's say I have a data.frame that looks like this:

df.test <- data.frame(1:26, 1:26)
colnames(df.test) <- c("a","b")

and I apply a factor:

df.test$a <- factor(df.test$a, levels=c(1:26), labels=letters)

Now I would like to convert it back to the integer codes:

as.numeric(df.test[1])  # throws an error

But this works:

as.numeric(df.test$a)

Why is that?

Brandon Bertelsen
  • 3
    You need to level-up your search skills. ;-) Here are two answers I found via searching for "[r] factor". [One](http://stackoverflow.com/questions/4798343/convert-factor-to-integer). [Two](http://stackoverflow.com/questions/3418128/r-how-to-convert-a-factor-to-an-integer-numeric-in-r-without-a-loss-of-informat). – Joshua Ulrich Jan 31 '11 at 18:26
  • 1
    As the links show there are two options. Supposing f contains the factors you want as numeric you could do as.numeric(as.character(f)) but this has flaws. The way that is recommended is as.numeric(levels(f))[f]. – Dason Jan 31 '11 at 19:07
  • As I was typing this out, I must have smacked enter, because I found the solution on my own. But, it still leaves me with a question. Which is why the above is edited. – Brandon Bertelsen Jan 31 '11 at 19:16
  • @Joshua, apparently I was using every combination of convert/integer/code but the right one. – Brandon Bertelsen Jan 31 '11 at 19:17
  • 3
    Firstly, `as.numeric` doesn't convert to integer, but to double. Secondly `as.numeric(df.test[1])` is not working because `df.test[1]` is `data.frame`. Either use `df.test[, 1]`, `df.test[, "a"]` or `df.test$a`. – aL3xa Jan 31 '11 at 19:19

4 Answers

3

Actually, Joshua's links are not applicable here, because the task is not converting from a factor whose levels have a numeric interpretation. Your original effort that produced an error was almost correct. It was missing only a comma before the 1:

df.test <- data.frame(1:26, 1:26)
colnames(df.test) <- c("a","b")
df.test$a <- factor(df.test$a, levels=c(1:26), labels=letters)
as.numeric(df.test[,1])
# [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
# [19] 19 20 21 22 23 24 25 26

Or you could have used "[[":

> as.numeric(df.test[[1]])
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26
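As a rough check of the two forms above, the sketch below rebuilds the question's data frame and compares the extractions. It also shows `drop = FALSE`, which is not mentioned in the answer; I am adding it only to illustrate when the comma form still returns a data frame. The variable names `same_vector`, `still_df` and `codes` are made up for this example.

```r
df.test <- data.frame(a = 1:26, b = 1:26)
df.test$a <- factor(df.test$a, levels = 1:26, labels = letters)

# All three forms extract the column itself, i.e. a factor vector.
same_vector <- identical(df.test[, 1], df.test[[1]]) &&
  identical(df.test[[1]], df.test$a)

# drop = FALSE keeps the data.frame wrapper, just like df.test[1] does.
still_df <- is.data.frame(df.test[, 1, drop = FALSE])

# Once you have the factor vector, the conversion works.
codes <- as.numeric(df.test[[1]])
```

So the comma matters because `[, 1]` drops a single column down to a vector by default, while `[1]` never does.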
IRTFM
2

as.numeric will convert a factor to numeric:

as.numeric(df.test$a)
ncray
  • 2
    That only happens to work in this case. It is not a general solution... see Joshua Ulrich's comment. – John Jan 31 '11 at 18:59
1

To respond to your edit: Keep in mind that a factor has two parts: 1) the labels, and 2) the underlying integer codes. The two answers I linked to in my comment were to convert the labels to numeric. If you just want to get the internal codes, use as.integer(df.test$a) as demonstrated in the examples section of ?factor. aL3xa answered your question about why as.numeric(df.test[1]) throws an error.
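To make the labels-versus-codes distinction concrete, here is a small sketch using a hypothetical factor `f` whose labels are numbers that differ from the internal codes. It is not from the question's data; it just shows why the two conversions give different results, and that `as.numeric` returns doubles while `as.integer` returns integers.

```r
# Hypothetical factor: labels "10", "20", "30" get internal codes 1, 2, 3.
f <- factor(c("10", "20", "10", "30"))

codes  <- as.integer(f)              # internal codes: 1 2 1 3
values <- as.numeric(levels(f))[f]   # labels as numbers: 10 20 10 30

typeof(codes)          # "integer"
typeof(as.numeric(f))  # "double" (still the codes, not the labels)
```

In the question's `df.test$a` the labels are letters, so only the codes have a numeric meaning and `as.integer(df.test$a)` is the conversion that applies.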

Joshua Ulrich
1

Accessing a column by name gives you a factor vector, which can be converted to numeric. However, a data frame is a list (of columns), and when you use the single bracket operator with a single number on a list, you get a list of length one. The same applies to data frames, so df.test[1] gets you column one as a new data frame, which as.numeric() cannot coerce. I did not know this!

> str(df.test$a)
 Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
> str(df.test[1])
'data.frame':   26 obs. of  1 variable:
 $ a: Factor w/ 26 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
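A minimal sketch of the same rule on a plain list, next to the data frame. The list `lst` and the variable `result` are made up for illustration, and the data frame is rebuilt in a shorter way than in the question. The `tryCatch` call is only there so the failing conversion can be captured without stopping the script.

```r
df.test <- data.frame(a = factor(letters), b = 1:26)

is.list(df.test)      # a data frame is a list of columns
class(df.test[1])     # "data.frame": single bracket keeps the container
class(df.test[[1]])   # "factor": double bracket returns the element

# Same rule on a plain list (lst is a made-up example).
lst <- list(x = 1:3, y = letters)
class(lst[1])    # "list"
class(lst[[1]])  # "integer"

# Capture the error from coercing a one-column data frame.
result <- tryCatch(as.numeric(df.test[1]), error = function(e) "error")
```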
J. Win.
  • Thanks Jon, that one was the best explanation of my question. Appreciated. – Brandon Bertelsen Feb 01 '11 at 03:18
  • This seems similar to your question of July 30 last year. My sympathies - for me some of these things are only beginning to make sense even after two years using R... :) – J. Win. Feb 01 '11 at 03:25