I am getting very tired of writing as.numeric(as.character(my.factor)) if I want to get the numeric value of a factor in R. Although it works, it is not self-evident what the code does and it just feels plain wrong to convert numbers to character strings and back again to do anything with them. Is there a simpler and more self-explanatory way like factor.values(my.factor)?

It has been suggested to pack it away in a custom function like

factor.values = function(x) as.numeric(levels(x))[x]  # get the actual values of a factor with numeric labels

The problem with this solution is that it must be copy-pasted between scripts if it is to be reproducible by collaborators. I'm asking whether there is a short built-in method to do it. I know this is a VERY small problem, but since it's a frequent one and many find the commonly presented solution counterintuitive, I raise it anyway.

The problem

For the uninitiated, if you have a factor and want to do numeric operations on it, you run into a number of problems:

    > my.factor = factor(c(1, 1, 2, 5, 8, 13, 21))
    > sum(my.factor)  # let's try a numeric operation
    Error in Summary.factor(1:6, na.rm = FALSE) : 
      sum not meaningful for factors
    > as.numeric(my.factor)  # oh, let's make it numeric then.
    [1] 1 1 2 3 4 5 6  # argh! level codes, not values
    > as.character(my.factor)  # because the web told me so.
    [1] "1"  "1"  "2"  "5"  "8"  "13" "21"  # closer...
    > as.numeric(as.character(my.factor))  # NOT short or self-explanatory!
    [1]  1  1  2  5  8 13 21  # finally we can sum ...
    > sum(as.numeric(as.character(my.factor)))
    [1] 51
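
The surprising result of `as.numeric(my.factor)` follows from how a factor is stored: a vector of integer codes that index into a character vector of levels. A minimal sketch, reusing the `my.factor` from above (the names `lv` and `codes` are only for illustration), that looks at both parts:

```r
my.factor <- factor(c(1, 1, 2, 5, 8, 13, 21))

# The distinct labels are stored once, as character strings
lv <- levels(my.factor)

# The data themselves are integer codes pointing into the levels
codes <- as.integer(my.factor)

# as.numeric() hands back the codes; the labels need an explicit lookup
lv[codes]
```

Here `lv[codes]` gives the same strings as `as.character(my.factor)`, which is why a second conversion to numeric is still needed.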
Jonas Lindeløv
  • In my experience you need this only if something went wrong with data import. The solution is usually to fix the import step. Numeric information should never be a factor to begin with. – Roland Dec 13 '14 at 14:02
  • True, but handy functions such as `mapvalues` make factors out of numeric data for no reason. So I often find myself using it anyway. – Jonas Lindeløv Dec 13 '14 at 15:15
  • Just write a simple wrapper function around the "ugly" code and be done with it. It's really not that big a deal to do yourself. – Joshua Ulrich Dec 13 '14 at 15:44
  • Can you provide an example of `mapvalues` "making factors out of numeric data for no reason"? – nicola Dec 13 '14 at 15:56
  • Sure, @nicola, I've updated the question with that example. Joshua, see my reply to the answer below. I still think that solution is messy. I may just have unrealistically high aesthetic goals for my R code. – Jonas Lindeløv Dec 13 '14 at 20:37
  • It's simpler (for me) to not use factors to begin with, but if you already have them, here's an alternative to plyr: `df.target$x <- with(df.source, setNames(x, id))[as.character(df.target$id)]`. Here's another: `m <- merge(df.source, df.target, by="id", sort=FALSE); m[order(m$id),]` – GSee Dec 14 '14 at 18:43
  • @JonasLindeløv I don't really think that the example makes your point. You gave `mapvalues` a `factor` and you got a `factor` back. It's totally expected. The "ugliness" comes from starting with the wrong form of data. Is there a reason why you allowed the `id` column to be a `factor`? – nicola Dec 15 '14 at 08:49
  • @nicola `id` is a factor because it is a grouping which should not be subject to numerical operations. So it's just a way to make sure to get a warning/error if that somehow happens unintentionally, but I guess it's not necessary if you write bullet-proof code right out of the box :-) But yes, I realize now that mapvalues of course should maintain `x` as a factor - obvious in the cases where there is a value in `x` which is not replaced. I've removed that example from the question. Thanks for pointing it out. – Jonas Lindeløv Dec 15 '14 at 10:49

1 Answer

From ?factor

To transform a factor ‘f’ to approximately its original numeric values, ‘as.numeric(levels(f))[f]’ is recommended and slightly more efficient than ‘as.numeric(as.character(f))’.
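
As a rough illustration of the quoted recommendation, the sketch below applies both idioms to the factor from the question (the variable names `via_levels` and `via_character` are made up for this example). The levels-based form converts each distinct level once and then indexes by the factor's codes, which is presumably where the small efficiency gain comes from. The round trip is only "approximate" because the levels are stored as text.

```r
f <- factor(c(1, 1, 2, 5, 8, 13, 21))

# Convert each distinct level once, then index by the factor's integer codes
via_levels <- as.numeric(levels(f))[f]

# Convert every element to character and back to numeric
via_character <- as.numeric(as.character(f))

# Both routes recover 1 1 2 5 8 13 21
identical(via_levels, via_character)
sum(via_levels)
```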

GSee
  • Thanks for the info. But syntax-wise, it just becomes even more complex, especially when dealing with data.frames. Compare `as.numeric(as.character(mydf$column))` to `as.numeric(levels(mydf$column))[mydf$column]` – Jonas Lindeløv Dec 13 '14 at 15:34
  • Then put the complex syntax in a function. That's what functions are for. – Joshua Ulrich Dec 13 '14 at 15:46
  • It may be that I'm too used to Python, i.e. that the code is actually beautiful. But I was hoping for something prettier than having a function which is copied/sourced in every script I write. Well, maybe my expectations were too high. – Jonas Lindeløv Dec 13 '14 at 20:09
  • @JonasLindeløv: You wouldn't have to copy/source that function in every script you write. That's what packages are for... or you could source it into an environment attached to your search path in your .Rprofile. Saying "the code is actually beautiful" is subjective. Code you find beautiful might be unappealing to others. I considered voting to close this as "primarily opinion based" for that very reason (i.e. which "pretty" version is "the best pretty version"?). – Joshua Ulrich Dec 14 '14 at 00:02
  • Point taken. I've updated the question to whether there is a shorter and more intuitive (less roundabout) solution rather than a "pretty" one, since that is what I meant. The answer seems to be "no" and I'd accept that answer. The problem with putting it in my own environment is that it raises problems when you share the script with others. I've updated the question with that request as well. – Jonas Lindeløv Dec 14 '14 at 08:34
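
For reference, the `.Rprofile` route mentioned in the comments could look roughly like the sketch below. It assumes the helper is the `factor.values` one-liner from the question; the environment name `my_helpers` is hypothetical. As noted above, this only helps on machines that have the same `.Rprofile`, so shared scripts would still need a package or an explicit definition.

```r
# Hypothetical contents of ~/.Rprofile
local({
  helpers <- new.env()
  # Same one-liner as in the question
  helpers$factor.values <- function(x) as.numeric(levels(x))[x]
  # attach() copies the environment's objects onto the search path,
  # so factor.values() is found without cluttering the global workspace
  attach(helpers, name = "my_helpers")
})
```

After this runs at startup, `factor.values(my.factor)` is available in every session on that machine.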