I'm using the following procedure to convert categorical values to numeric values, using a lookup table of the unique values and merge() (only two columns are shown for the sake of brevity).

data

    printerM  user

    RICOH     Pam
    CANON     Clara
    TOSHIBA   Joe
    RICOH     Fred
    CANON     Clark

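printers.df <- data.frame(printers=unique(data$printerM))
numbers.df <- data.frame(numbers=seq_along(unique(data$printerM)))
printers.table <- cbind(printers.df, numbers.df)
library(reshape2)  # not needed for merge(), which is base R
# The key columns have different names, so they must be given explicitly;
# merge() keeps only the by.x key, so there is no printers column to remove
new.data <- merge(data, printers.table, by.x="printerM", by.y="printers")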

new.data

    printerM  user   numbers

    RICOH     Pam      1
    CANON     Clara    2
    TOSHIBA   Joe      3
    RICOH     Fred     1
    CANON     Clark    2

The issue is that I have 34 columns and I'd rather not write the same code 34 times, so I suppose this can be handled by:

1.- converting my code into a function
2.- using an existing R function

I'm not very experienced at converting my R code into a function, and I don't know whether this kind of transformation is available in any library.

Anyway, any hint will be much appreciated.

useRj
  • Are you just trying to create group ids per `printer`? Something like [this](http://stackoverflow.com/questions/13566562/creating-a-unique-id-in-r) maybe? – David Arenburg May 10 '16 at 13:12
  • Not really grouping, @David, just translating a categorical value to numbers, and avoiding doing it 34 times. – useRj May 10 '16 at 13:25
  • Did you see the question I linked? – David Arenburg May 10 '16 at 13:25
  • Yes, my bad. I thought that a two-dimensional interaction was needed. I tested it with just one column and it worked OK. It's shorter to write this 34 times than my code. Thanks – useRj May 10 '16 at 13:45

1 Answer

If you are applying this conversion to the columns of a data frame, you can make use of the fact that a data frame is really a list underneath. If I understand correctly, for each column (list component) you want to convert it to numeric if it is a factor and leave it unchanged otherwise. Here is a dummy example which does this:

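# Dummy data: three categorical columns and one numeric column.
# stringsAsFactors=TRUE is needed in R >= 4.0.0, where strings are no longer
# converted to factors by default.
df = data.frame(sample(letters[1:5],10,replace=TRUE),
                runif(10),
                sample(LETTERS[1:5],10,replace=TRUE),
                sample(letters[11:15],10,replace=TRUE),
                stringsAsFactors=TRUE)
colnames(df) = paste0("X",1:4)
# Replace each factor column with its integer codes; leave other columns alone
data.frame(lapply(df, function(x) if(is.factor(x)) as.numeric(x) else x))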
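
For comparison, here is a minimal sketch of the same idea applied to the data from the question. The data frame is rebuilt by hand, and `to_codes` is a hypothetical helper name, not something from a library. One thing to be aware of: as.numeric() on a factor returns codes in level order (alphabetical by default), whereas the lookup table in the question numbers the values in order of first appearance, which match(x, unique(x)) reproduces.

```r
# Data from the question, rebuilt by hand (strings kept as character)
data <- data.frame(printerM = c("RICOH", "CANON", "TOSHIBA", "RICOH", "CANON"),
                   user = c("Pam", "Clara", "Joe", "Fred", "Clark"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: number the values in order of first appearance
to_codes <- function(x) match(x, unique(x))

# Codes by order of appearance, as in the question's lookup table
by_appearance <- to_codes(data$printerM)

# Codes by factor level order (alphabetical by default)
by_level <- as.numeric(factor(data$printerM))

# Apply the helper to every column at once instead of repeating it per column
coded <- data.frame(lapply(data, to_codes))
```

Here `by_appearance` gives RICOH = 1, CANON = 2, TOSHIBA = 3, while `by_level` gives CANON = 1, RICOH = 2, TOSHIBA = 3. Either is a valid encoding; which one you want depends on whether the numbering itself matters.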

Edit:

Note that this changes every column that is a factor: it checks each column, returns the factor cast to numeric if it is a factor, and returns the original column otherwise. It is also possible to keep the original factor alongside the new numeric encoding: you could use list(x,as.numeric(x)) in place of as.numeric(x), but by default the column names will become a bit funny.
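
One possible way to keep the original factors and get readable names for the new columns is sketched below. It builds the numeric columns separately and appends them with cbind(). The "_num" suffix and the variable names are just assumptions for illustration, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only to make the dummy data reproducible
df <- data.frame(X1 = sample(letters[1:5], 10, replace = TRUE),
                 X2 = runif(10),
                 stringsAsFactors = TRUE)

# One logical flag per column: TRUE where the column is a factor
is_fac <- vapply(df, is.factor, logical(1))

# Integer codes for each factor column
codes <- lapply(df[is_fac], as.integer)
# Hypothetical naming scheme: original name plus a "_num" suffix
names(codes) <- paste0(names(codes), "_num")

# Original columns first, numeric encodings appended at the end
result <- cbind(df, as.data.frame(codes))
```

The result has the columns X1, X2 and X1_num, so the factor and its encoding sit side by side.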

jamieRowen