My question seems to be a recurrent one in R (converting factors to numeric in a data frame). Still, the solutions are not straightforward.

What I want is to systematically recode ordered factor variables to ordered numeric variables.

I would like your insight on a potential (simple) solution.

My data look like this

data = rbind(
  c('a1', 'strongly favor', 'bad', 'low'), 
  c('b1', 'neither nor', 'good', 'middle'), 
  c('c1', 'favor', 'good', 'low'), 
  c('d1', 'strongly oppose', 'good', 'high'), 
  c('e1', 'oppose', 'average', 'high') 
  )

data = as.data.frame(data)
data$V2 = factor(data$V2, levels = c('strongly favor', 'favor', 'neither nor', 'oppose', 'strongly oppose')) 
data$V3 = factor(data$V3, levels = c('good', 'average', 'bad')) 
data$V4 = factor(data$V4, levels = c('high', 'middle', 'low')) 

  V1              V2      V3     V4
1 a1  strongly favor     bad    low
2 b1     neither nor    good middle
3 c1           favor    good    low
4 d1 strongly oppose    good   high
5 e1          oppose average   high

I was thinking of a simple solution like this one:

levels(data$V2) <- 1:length(data$V2)

To avoid handling every variable one by one, I was thinking of a little loop:

# First column is the identifier 
for(i in 2:ncol(data)){
  levels(data[,i]) <- 1:length(data[,i])
}

Could this solution introduce errors?
How could I avoid looping?
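
For reference, one loop-free possibility is sketched below. It assumes that the position of each level in `levels()` is exactly the numeric code wanted (1 for the first level, 2 for the second, and so on). The names `is_fac` and `recoded` are only illustrative, and `stringsAsFactors = FALSE` is added so that the identifier column stays character in any R version.

```r
# Rebuild the example data from the question.
data <- as.data.frame(rbind(
  c('a1', 'strongly favor', 'bad', 'low'),
  c('b1', 'neither nor', 'good', 'middle'),
  c('c1', 'favor', 'good', 'low'),
  c('d1', 'strongly oppose', 'good', 'high'),
  c('e1', 'oppose', 'average', 'high')
), stringsAsFactors = FALSE)

data$V2 <- factor(data$V2, levels = c('strongly favor', 'favor', 'neither nor',
                                      'oppose', 'strongly oppose'))
data$V3 <- factor(data$V3, levels = c('good', 'average', 'bad'))
data$V4 <- factor(data$V4, levels = c('high', 'middle', 'low'))

# Flag the factor columns (hypothetical helper name), so the identifier
# column V1 is left untouched without hard-coding its position.
is_fac <- vapply(data, is.factor, logical(1))

# as.integer() on a factor returns the internal codes, i.e. the position
# of each value in levels(), which follows the order given above.
recoded <- data
recoded[is_fac] <- lapply(data[is_fac], as.integer)

str(recoded)
```

With this approach the result is a set of plain integer columns rather than relabelled factors, and the original `data` is kept unchanged for comparison.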

giac
  • `lapply(data[-1], function(x) as.numeric(x))` would coerce the factors to numeric. But I didn't understand why you changed the levels to 1:length – akrun May 26 '15 at 11:08
  • After looking at your code and thinking a bit, wouldn't this approach give extra numeric levels, especially for V3 and V4? Also, based on the code, if nrow is 1000 or so, this will give 1000 levels even though only a few levels actually exist for each column – akrun May 26 '15 at 11:24
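
The concern about extra levels can be checked on a single column. The sketch below is only an illustration: it rebuilds V3 from the question as a standalone factor, and the names `f`, `g`, `h`, `k` and `err` are made up for the example.

```r
# V3 from the question: three levels, five observations.
f <- factor(c('bad', 'good', 'good', 'good', 'average'),
            levels = c('good', 'average', 'bad'))

# Relabel with one label per ROW, as in the question's loop.
g <- f
levels(g) <- 1:length(g)
nlevels(g)    # number of levels after relabelling
table(g)      # counts per level, including any unused ones
is.factor(g)  # the column is still a factor, not numeric

# Relabel with one label per LEVEL instead.
h <- f
levels(h) <- seq_len(nlevels(h))
nlevels(h)

# With fewer rows than levels, the per-row relabelling is not even valid.
k <- factor(c('bad', 'good'), levels = c('good', 'average', 'bad'))
err <- tryCatch({
  levels(k) <- 1:length(k)
  NULL
}, error = function(e) e)
inherits(err, "error")
```

In both relabelling variants the result remains a factor whose labels are the strings "1", "2", and so on, so a further conversion step would still be needed to obtain a numeric column.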

0 Answers