4

I am working with a big dataset that is causing trouble because some of its columns are being treated as factors. How can I convert all of the columns from factor to numeric without doing it column by column?

I tried a small loop, but it returns NA values. Here is some sample data that reproduces the case:

data <- structure(list(v1 = c(22.394, 43.72, 58.544, 56.877, 1.659, 29.142, 
67.836, 68.851), v2 = c(144.373, 72.3, 119.418, 112.429, 35.779, 
41.661, 166.941, 126.548), v3 = structure(c(33L, 29L, 33L, 5L, 
13L, 31L, 5L, 8L), .Label = c("", "#VALUE!", "0", "1", "10", 
"11", "12", "13", "14", "15", "16", "17", "18", "19", "2", "20", 
"21", "22", "23", "24", "25", "26", "28", "29", "3", "30", "32", 
"33", "4", "48", "5", "6", "7", "8", "9"), class = "factor"), 
    v4 = structure(c(24L, 6L, 22L, 23L, 16L, 22L, 23L, 26L), .Label = c("", 
    "-1", "-2", "-4", "#VALUE!", "0", "1", "10", "11", "12", 
    "13", "14", "15", "16", "17", "18", "19", "2", "24", "28", 
    "29", "3", "4", "5", "6", "7", "8", "9"), class = "factor")), .Names = c("v1", 
"v2", "v3", "v4"), row.names = c("4", "5", "6", "7", "8", "9", 
"10", "11"), class = "data.frame")

for (i in 1:ncol(data)){
data[,i] <- as.numeric(as.character(data[i]))
} ## returns NAs

Is there some command that I can apply to turn all these columns into a numeric class?

Brian Tompsett - 汤莱恩
Error404
  • 1
    Your loop doesn't work because you have `data[i]` instead of `data[,i]` at the end. – ping May 28 '14 at 15:05
  • Or use "[[", which would return the factor vector rather than the factor inside a list. – IRTFM May 28 '14 at 15:07
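
To illustrate the two comments above, here is a minimal sketch of the loop rewritten with `[[`. The data frame `df` and its values are made up for the example; they are not the asker's data. The point is that `df[i]` is still a one-column data frame, while `df[[i]]` is the column vector itself.

```r
# Hypothetical toy data: one numeric column and one factor column holding numbers
df <- data.frame(a = c(1.5, 2.5, 3.5),
                 b = factor(c("10", "20", "#VALUE!")))

class(df[2])    # single brackets keep the data.frame wrapper
class(df[[2]])  # double brackets return the factor itself

# Convert every column by working on the column vector, not on a one-column data frame.
# suppressWarnings() hides the "NAs introduced by coercion" warning for "#VALUE!".
for (i in seq_along(df)) {
  df[[i]] <- suppressWarnings(as.numeric(as.character(df[[i]])))
}
str(df)
```

Entries that are not numbers, such as "#VALUE!", end up as NA.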

4 Answers

17

This works, but I suspect your data contains an odd character or space that makes those columns read in as factors. You can try reading the data in with the argument stringsAsFactors = FALSE, but the affected columns would then come in as character rather than numeric, so that alone doesn't solve it. Here's a fix:

data[] <- lapply(data, function(x) as.numeric(as.character(x)))

## > str(data)
## 'data.frame':   8 obs. of  4 variables:
##  $ v1: num  22.39 43.72 58.54 56.88 1.66 ...
##  $ v2: num  144.4 72.3 119.4 112.4 35.8 ...
##  $ v3: num  7 4 7 10 18 5 10 13
##  $ v4: num  5 0 3 4 18 3 4 7
Tyler Rinker
  • 1
    note that if you want your output to be a dataframe and not a list, just do `do.call(cbind, data)` (or `rbind` depending on your data structure) – ale19 Aug 24 '17 at 14:20
  • @ale19 my response does give a data.frame...try the actual code and see – Tyler Rinker Aug 24 '17 at 20:06
  • 1
    @TylerRinker No it gave a list. I can independently verify it. – ABCD Sep 20 '18 at 10:06
  • @SmallChess Can you run str on your data and verify that you had a data.frame to start with, and what the types of each column are? It should return a data.frame. – Tyler Rinker Sep 20 '18 at 14:35
  • I ran str(df) on my data and confirmed it is a data.frame. Running the lapply(...) above converts it to a list. @ale19 and SmallChess are correct. However, ale19's suggestion doesn't work for me (perhaps I'm implementing it incorrectly). – Bradford Nov 29 '21 at 13:43
  • Using as.data.frame() on the output of lapply() sorts this out, I think. @SmallChess – Bradford Nov 29 '21 at 13:49
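
The disagreement in these comments most likely comes down to the empty brackets on the left-hand side. A small sketch, using a made-up data frame `df` and a hypothetical helper `to_num` (neither is from the answer), shows the difference between assigning into `df[]` and assigning to a new name:

```r
# Hypothetical helper and toy data, for illustration only
to_num <- function(x) as.numeric(as.character(x))
df <- data.frame(a = factor(c("1", "2")), b = factor(c("3.5", "4.5")))

# lapply() on its own returns a plain list
as_list <- lapply(df, to_num)
class(as_list)

# Assigning into df[] replaces the columns but keeps the data.frame structure
kept <- df
kept[] <- lapply(kept, to_num)
class(kept)

# Wrapping the list in as.data.frame() also gives a data frame back
rebuilt <- as.data.frame(lapply(df, to_num))
class(rebuilt)
```

Note that `do.call(cbind, ...)` on a list of numeric vectors returns a matrix rather than a data frame, which may be why that suggestion did not behave as expected.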
5

You may be trying to solve the wrong problem, or solve the problem at the wrong place. Often the reason that a column that you think is numeric is read in as a factor is because there are characters where numbers should be in the original data. Converting these to numbers will result in a missing value instead of the intended number (which is better than the wrong number). It may be best to fix the original source of the data so that it is read in correctly.

The next option is to use the colClasses argument to read.table and related functions to specify that the columns should be numeric and the conversion will take place automatically. This can even be used (with a couple more steps) to convert "numbers" with "$", "%", or "," in them somewhere.
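
As a rough sketch of the colClasses idea, assuming a small inline CSV (the text and column names here are invented for the example), the columns can be declared numeric at read time. For values containing "$", "%", or ",", the second part swaps in a simpler technique than a colClasses-based conversion: strip the symbols with gsub() and then convert.

```r
# Hypothetical inline CSV standing in for the real file
csv_text <- paste("v1,v3", "22.394,7", "43.72,4", sep = "\n")

# Declare both columns numeric up front instead of converting afterwards
df <- read.csv(text = csv_text, colClasses = c("numeric", "numeric"))
str(df)

# Simpler alternative for decorated numbers: remove the symbols, then convert
raw <- c("$1,200.50", "35%", "7")
clean <- as.numeric(gsub("[$,%]", "", raw))
clean
```

If a column declared numeric contains text that is not a number and is not listed in na.strings, the read will stop with an error, which at least makes the bad entries visible early.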

If these don't work for you and you want to convert the existing data frame then here is one approach:

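# Flag the factor columns (is.factor also catches ordered factors)
w <- sapply( mydf, is.factor )
# Convert via character so the labels, not the internal integer codes, become the numbers
mydf[w] <- lapply( mydf[w], function(x) as.numeric(as.character(x)) )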
Greg Snow
2

I accomplish this by simply writing the data frame out and reading it back in, specifying that all columns are numeric. I use the data.table package, but the same applies to the base read/write functions.

library(data.table)
fwrite(dfm,"some.name.temp")
dfm <- fread("some.name.temp",colClasses="numeric")
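
For the base R route mentioned above, a minimal sketch might look like the following. The data frame `df` is made up, and a temporary file is used instead of a fixed file name. One assumption worth flagging: `quote = FALSE` is set on writing because, with colClasses specified, quoted values in a numeric column are not unquoted on reading.

```r
# Hypothetical toy data with numbers stored as factors
df <- data.frame(a = factor(c("1", "2")), b = factor(c("3.5", "4.5")))

# Write without quotes, then read back declaring every column numeric
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE, quote = FALSE)
df2 <- read.csv(tmp, colClasses = "numeric")
unlink(tmp)  # clean up the temporary file

str(df2)
```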
Veera
1

#VALUE! seems to be the odd entry; if so, telling R to treat it as missing by using the na.strings argument is probably the way to go.

read.table(..., na.strings="#VALUE!")
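
A small sketch of this, assuming an inline CSV invented for the example: once "#VALUE!" is declared as a missing-value string, the column is read as a number without any later conversion. read.csv() is used here because it does not treat "#" as a comment character by default; with read.table() you would probably also need `comment.char = ""`.

```r
# Hypothetical inline CSV with an Excel-style error value
csv_text <- paste("v1,v3", "22.394,7", "43.72,#VALUE!", "58.544,10", sep = "\n")

# Treat both "NA" and "#VALUE!" as missing while reading
df <- read.csv(text = csv_text, na.strings = c("NA", "#VALUE!"))
str(df)
```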
Aaron left Stack Overflow