43

I have a data.frame whose class column is a factor. I'd like to convert it to numeric so that I can compute a correlation matrix.

> str(breast)
'data.frame':   699 obs. of  10 variables:
 ....
 $ class                   : Factor w/ 2 levels "2","4": 1 1 1 1 1 2 1 1 1 1 ...
> table(breast$class)
  2   4 
458 241
> cor(breast)
Error in cor(breast) : 'x' must be numeric

How can I convert a Factor column to a numeric column?

birdy

4 Answers

112
breast$class <- as.numeric(as.character(breast$class))

If you have many columns to convert to numeric

indx <- sapply(breast, is.factor)
breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x)))
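A minimal sketch of the multi-column conversion on a toy data frame. The data frame `df` and its values are made up for illustration and are not the asker's `breast` data:

```r
# Toy data frame (hypothetical values): two factor columns, one numeric column
df <- data.frame(
  a = factor(c("10", "20", "10")),
  b = factor(c("1.5", "2.5", "3.5")),
  n = c(1, 2, 3)
)

# Logical vector marking which columns are factors
indx <- sapply(df, is.factor)

# Convert only the factor columns; numeric columns are left untouched
df[indx] <- lapply(df[indx], function(x) as.numeric(as.character(x)))

str(df)   # all three columns are now numeric
cor(df)   # no longer fails with "'x' must be numeric"
```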

Another option is to use stringsAsFactors=FALSE while reading the file using read.table or read.csv

Just in case, other options to create/change columns

 breast[,'class'] <- as.numeric(as.character(breast[,'class']))

or

 breast <- transform(breast, class=as.numeric(as.character(class)))
akrun
  • If the case includes multiple columns, what does "function(x)" in breast[indx] <- lapply(breast[indx], function(x) as.numeric(as.character(x))) do? – Couch Tomato Aug 10 '21 at 16:17
  • 1
    @CouchTomato It is a lambda (anonymous) function, i.e. a function created on the fly. Here, 'x' is each column from the subset of columns `breast[indx]` that `lapply` loops over. `as.character` and `as.numeric` require a vector as input, which is why we loop. – akrun Aug 10 '21 at 17:14
15

From ?factor:

To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
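As a small illustration of the idiom quoted from the help page, the sketch below applies both forms to a made-up factor `f` and checks that they agree:

```r
# Hypothetical factor whose levels look like numbers
f <- factor(c("2", "4", "4", "2"))

# Convert the (few) levels once, then index by the factor
via_levels <- as.numeric(levels(f))[f]

# Convert every element to character first, then to numeric
via_char <- as.numeric(as.character(f))

identical(via_levels, via_char)
```

The first form converts each distinct level only once, which is where the efficiency gain mentioned in the help page comes from.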

BrodieG
8

This is FAQ 7.10. Others have shown how to apply this to a single column in a data frame, or to multiple columns in a data frame. But this is really treating the symptom, not curing the cause.

A better approach is to use the colClasses argument of read.table and related functions to tell R that the column should be numeric, so that it creates a numeric column and never a factor. This will put in NA for any values that do not convert to numeric.

Another, better option is to figure out why R does not recognize the column as numeric (usually a non-numeric character somewhere in that column) and fix the original data so that it is read in properly without needing to create NAs.

Best is a combination of the last two: make sure the data is correct before reading it in, and specify colClasses so R does not need to guess (this can speed up reading as well).
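A minimal sketch of specifying colClasses at read time. It assumes clean data and uses inline text in place of a real file; the column names `id` and `class` and the values are hypothetical:

```r
# Inline CSV standing in for a file on disk (hypothetical contents)
csv_text <- "id,class\n1,2\n2,4\n3,2\n"

# One entry per column, in column order, so R does not have to guess the types
dat <- read.csv(text = csv_text, colClasses = c("integer", "numeric"))

str(dat)   # class is read as numeric, never as a factor
```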

Greg Snow
2

As an alternative to `$` notation, use a within block:

breast <- within(breast, {
  class <- as.numeric(as.character(class))
})

Note that you want to convert your vector to a character before converting it to a numeric. Simply calling as.numeric(class) will return the ids corresponding to each factor level (1, 2) rather than the levels themselves.
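The difference can be seen on a small made-up factor with the same two levels as in the question (`class_f` is a hypothetical name):

```r
# Hypothetical factor with levels "2" and "4"
class_f <- factor(c("2", "2", "4", "2"))

codes  <- as.numeric(class_f)                 # internal level ids: 1 1 2 1
values <- as.numeric(as.character(class_f))   # the level values:   2 2 4 2

codes
values
```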

Joe