
I want to convert an entire data.frame containing more than 130 columns to numeric.

I know that I need to use as.numeric, but the problem is that I have to apply this function separately to each one of the 130 columns. I tried to apply it to the entire data.frame, but I got the following error message:

Error: (list) object cannot be coerced to type 'double'

How can I do this with relatively short code?

Stedy
MoMo

4 Answers


In base R we can do:

df[] <- lapply(df, as.numeric)

or

df[cols_to_convert]  <- lapply(df[cols_to_convert], as.numeric)
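
As a minimal sketch of why the `df[] <-` form matters (the toy data frame `df` and the selection `cols_to_convert` below are made up for illustration): a data.frame is a list of columns, so `as.numeric()` on the whole object fails, and a plain `lapply()` returns a list. Assigning into `df[]` replaces the columns while keeping the data.frame.

```r
# Toy data, assumed for illustration: numbers stored as character strings
df <- data.frame(x = c("1", "2", "3"),
                 y = c("4.5", "5.5", "6.5"),
                 z = c("7", "8", "9"),
                 stringsAsFactors = FALSE)

# as.numeric() on the whole data.frame fails, because a data.frame is a list
res <- tryCatch(as.numeric(df), error = function(e) "error")

# Plain lapply() returns a list and drops the data.frame class
as_list <- lapply(df, as.numeric)

# Assigning into df[] replaces the columns but keeps the data.frame
df_all <- df
df_all[] <- lapply(df_all, as.numeric)

# Converting only a subset of columns
cols_to_convert <- c("x", "y")   # hypothetical column selection
df_some <- df
df_some[cols_to_convert] <- lapply(df_some[cols_to_convert], as.numeric)

str(df_all)
str(df_some)
```

In `df_some`, column `z` is left as character because it is not listed in `cols_to_convert`.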

Here's a benchmark of the solutions (ignoring the considerations about factors):

DF <- data.frame(a = 1:10000, b = letters[1:10000],
                 c = seq(as.Date("2004-01-01"), by = "week", len = 10000),
                 stringsAsFactors = TRUE)
DF <- setNames(do.call(cbind, replicate(50, DF, simplify = FALSE)), paste0("V", 1:150))

dim(DF)
# [1] 10000   150

library(dplyr)
n1tk  <- function(x) data.frame(data.matrix(x))
mm    <- function(x) {x[] <- lapply(x,as.numeric); x}
akrun <- function(x) mutate_all(x, as.numeric)
mo    <- function(x) {for (i in 1:150) x[, i] <- as.numeric(x[, i]); x}  # return x, like the other functions

microbenchmark::microbenchmark(
  akrun = akrun(DF),
  n1tk  = n1tk(DF),
  mo    = mo(DF),
  mm    = mm(DF)
)

# Unit: milliseconds
#   expr      min        lq       mean    median        uq      max neval
#  akrun 152.9837 177.48150 198.292412 190.38610 206.56800 432.2679   100
#   n1tk  10.8700  14.48015  22.632782  17.43660  21.68520  89.4694   100
#     mo   9.3512  11.41880  15.313889  14.71970  17.66530  37.6390   100
#     mm   4.8294   5.91975   8.906348   7.80095  10.11335  71.2647   100
moodymudskipper

An option with dplyr

library(dplyr)
df1 %>%
   mutate_all(as.numeric)

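If the columns are factors, convert them to character first and then to numeric; calling as.numeric directly on a factor returns its internal integer codes, not the values shown in the labels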

df1 %>%
    mutate_all(~ as.numeric(as.character(.)))  # lambda form; funs() is deprecated since dplyr 0.8.0
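
To illustrate the factor pitfall in base R, here is a small sketch on a made-up factor `f`. It compares direct conversion, the route through character, and the `levels` route mentioned in the comments on this answer.

```r
# A factor whose labels look like numbers (toy data, assumed for illustration)
f <- factor(c("10", "20", "10", "5"))

# as.numeric() on a factor returns the internal integer codes, not the labels
codes <- as.numeric(f)

# Going through character recovers the values shown in the labels
via_character <- as.numeric(as.character(f))

# Equivalent route: convert the levels once, then index by the factor
via_levels <- as.numeric(levels(f))[f]

print(codes)
print(via_character)
print(via_levels)
```

The last two vectors hold the original values, while `codes` only holds the position of each value among the factor levels.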

Also, note that if none of the cells contain genuinely non-numeric text, type.convert can be applied to the character version of each column and will pick a suitable type for it

df1 %>%
    mutate_all(~ type.convert(as.character(.), as.is = TRUE))  # as.is = TRUE: do not turn strings back into factors
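
A base R sketch of the same idea, on a made-up data frame `df`, may help show what `type.convert` does differently from `as.numeric`: it chooses a type per column (integer, double, and so on), and with `as.is = TRUE` it leaves non-numeric columns as character instead of producing `NA`s.

```r
# Toy data, assumed for illustration: everything stored as factors
df <- data.frame(n = c("1", "2", "3"),
                 d = c("1.5", "2.5", "3.5"),
                 s = c("a", "b", "c"),
                 stringsAsFactors = TRUE)

# Convert each column from its character representation;
# as.is = TRUE keeps non-numeric columns as character instead of factor
converted <- df
converted[] <- lapply(converted,
                      function(x) type.convert(as.character(x), as.is = TRUE))

print(sapply(converted, class))
```

Note that the result is not guaranteed to be of type double: whole-number columns come back as integer, so follow up with `as.numeric` if double is specifically required.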

If efficiency matters, one option is data.table

library(data.table)
DF1 <- copy(DF) # from other post
system.time({setDT(DF1)
    for(j in seq_along(DF1)) set(DF1, i = NULL, j=j, value = as.numeric(DF1[[j]]))
  })
#   user  system elapsed 
#  0.032   0.005   0.037 
akrun
  • Why `type.convert`? Is it faster? I routinely use `as.numeric(as.character(x))` as well, but it seems the recommended way (for best efficiency) is `as.numeric(levels(f))[f]`; see https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information – moodymudskipper Oct 21 '18 at 00:47
  • I am aware of the `levels` route. But I think `as.numeric(as.character(x))` is easier to understand. – akrun Oct 21 '18 at 04:33

Convert a Data Frame to a Numeric Matrix

For example, suppose we have this data frame:

DF <- data.frame(a = 1:3, b = letters[10:12],
                  c = seq(as.Date("2004-01-01"), by = "week", len = 3),
                  stringsAsFactors = TRUE)
> DF
  a b          c
1 1 j 2004-01-01
2 2 k 2004-01-08
3 3 l 2004-01-15

To convert every column to numeric and keep the result as a data frame, you can use:

DF2 <- data.frame(data.matrix(DF))
> DF2
  a b     c
1 1 1 12418
2 2 2 12425
3 3 3 12432

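Note: to convert only specific columns, subset the data frame first, for example with DF[1:3]. Also note that factor columns (b above) are replaced by their integer codes and dates (c above) by the number of days since 1970-01-01.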
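
As a hedged sketch of the column-subsetting note, the example below rebuilds the same `DF` and passes only some columns to `data.matrix`. The selection `cols` is a hypothetical choice for illustration; it leaves out the factor column, whose integer codes would not be meaningful numbers.

```r
DF <- data.frame(a = 1:3, b = letters[10:12],
                 c = seq(as.Date("2004-01-01"), by = "week", length.out = 3),
                 stringsAsFactors = TRUE)

# Convert only the columns where a numeric reading is meaningful
cols <- c("a", "c")              # hypothetical selection
DF2 <- data.frame(data.matrix(DF[cols]))

# A Date converts to the number of days since 1970-01-01
days <- as.numeric(as.Date("2004-01-01"))

str(DF2)
print(days)
```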

n1tk

Thank you n1tk, your solution works. I first tried to use this code:

for(i in 1:140){
  mydata[, i] <- as.numeric(mydata[, i])
}

But I think your solution is easier.

akrun, yes I am aware that we need to convert factors to character first and then to numeric.
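
For reference, the loop above can be written without hard-coding the number of columns. This is only a sketch on a made-up `mydata`; it uses `seq_along()` for the column indices and `[[i]]` to extract each column as a vector, which should also work on tibbles, where `mydata[, i]` returns a one-column tibble rather than a vector.

```r
# Toy data, assumed for illustration
mydata <- data.frame(p = c("1", "2"), q = c("3", "4"),
                     stringsAsFactors = FALSE)

# seq_along() covers every column, so no column count is hard-coded;
# [[i]] extracts the column as a plain vector
for (i in seq_along(mydata)) {
  mydata[[i]] <- as.numeric(mydata[[i]])
}

str(mydata)
```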

MoMo