66

What is the most efficient way to convert multiple columns in a data frame from character to numeric format?

I have a dataframe called DF with all character variables.

I would like to do something like

for (i in names(DF)) {
    DF$i <- as.numeric(DF$i)
}

Thank you

ec0n0micus

17 Answers

113

You could try

DF <- data.frame("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6],
                 stringsAsFactors = FALSE)

# Check columns classes
sapply(DF, class)

#           a           b           c 
# "character" "character" "character" 

cols.num <- c("a","b")
DF[cols.num] <- sapply(DF[cols.num],as.numeric)
sapply(DF, class)

#          a           b           c 
#  "numeric"   "numeric" "character"
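A small variation on the approach above, offered as a sketch rather than a correction: sapply() first simplifies its result to a matrix, while lapply() skips that step and returns one numeric vector per column. The example rebuilds the same DF and cols.num so it runs on its own.

```r
DF <- data.frame(a = as.character(0:5),
                 b = paste0(0:5, ".1"),
                 c = letters[1:6],
                 stringsAsFactors = FALSE)

cols.num <- c("a", "b")

# lapply() returns a list with one element per selected column,
# so each column is replaced by its own numeric vector
DF[cols.num] <- lapply(DF[cols.num], as.numeric)

# check the resulting column classes
sapply(DF, class)
```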
Luca Braglia
  • 5
    Error in `[.data.table`(data, nums) : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM. – zsad512 Jan 28 '18 at 22:58
  • https://stackoverflow.com/questions/48448293/converting-different-columns-to-different-formats?noredirect=1#comment83933268_48448293 – zsad512 Jan 28 '18 at 22:58
68

If you're already using the tidyverse, there are a few solutions depending on the exact situation.

The basic case, if you know every value is a number and there are no NAs:

library(dplyr)

# solution
dataset %>% mutate_if(is.character,as.numeric)

Test cases

df <- data.frame(
  x1 = c('1','2','3'),
  x2 = c('4','5','6'),
  x3 = c('1','a','x'), # vector with alpha characters
  x4 = c('1',NA,'6'), # numeric and NA
  x5 = c('1',NA,'x'), # alpha and NA
  stringsAsFactors = F)

# display starting structure
df %>% str()

Convert all character vectors to numeric (could fail if not numeric)

df %>%
  select(-x3) %>% # this removes the alpha column, so all remaining character columns can be converted to numeric
  mutate_if(is.character,as.numeric) %>%
  str()

Check if each column can be converted. This can be an anonymous function. It returns FALSE if there is a non-numeric or non-NA character somewhere. It also checks if it's a character vector to ignore factors. na.omit removes original NAs before creating "bad" NAs.

is_all_numeric <- function(x) {
  !any(is.na(suppressWarnings(as.numeric(na.omit(x))))) & is.character(x)
}
df %>% 
  mutate_if(is_all_numeric,as.numeric) %>%
  str()

If you want to convert specific named columns, then mutate_at is better.

df %>% mutate_at('x1', as.numeric) %>% str()
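In dplyr 1.0 and later, mutate_if() and mutate_at() are superseded by across(). A minimal sketch of the equivalent calls, assuming dplyr >= 1.0 is installed; the small df below is a cut-down version of the test data above, and the names all_chr and named are only illustrative.

```r
library(dplyr)

df <- data.frame(
  x1 = c("1", "2", "3"),
  x4 = c("1", NA, "6"),   # numeric and NA
  x3 = c("1", "a", "x"),  # contains alpha characters
  stringsAsFactors = FALSE
)

# counterpart of mutate_if(is.character, as.numeric):
# drop the alpha column, then convert every remaining character column
all_chr <- df %>%
  select(-x3) %>%
  mutate(across(where(is.character), as.numeric))

# counterpart of mutate_at(): convert only the named columns
named <- df %>%
  mutate(across(c(x1, x4), as.numeric))

str(named)
```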
ARobertson
  • I'm not sure how you can directly convert from character to numeric. You'd have to work with factors and then to numeric. Unless I'm missing something. – FilipeTeixeira Mar 26 '18 at 12:11
  • 3
    @FilipeTeixeira I believe you're thinking about converting from factors to numeric. You have to convert factors to characters to numeric, unless you truly want the numeric factor level, which in my experience has been rare. If you run the test code above, you can see that it works. It will fail if they aren't actually numbers though, but then you'll have to deal with that anyway. – ARobertson Mar 27 '18 at 15:34
  • this approach keeps your data.frame as a data.frame, while lapply converts your data frame into a list – coding_is_fun Apr 22 '22 at 18:47
  • 1
    mutate_if() has been deprecated. Use across() now, eg mutate( across( where(is.character), ~ as.numeric(.x) ) ) – Ian May 06 '22 at 09:16
27

You can index the columns by position: dataset[,1:9] <- sapply(dataset[,1:9],as.numeric)

Masimi
23

I used this code to convert all columns to numeric except the first one:

    library(dplyr)
    # check structure, row and column number with: glimpse(df)
    # convert to numeric e.g. from 2nd column to 10th column
    df <- df %>% 
     mutate_at(c(2:10), as.numeric)
YodaM
17

Using the across() function from dplyr 1.0

   df <- df %>% mutate(across(everything(), ~as.numeric(.)))
etrowbridge
8

You could use convert from the hablar package:

library(dplyr)
library(hablar)

# Sample df (stolen from the solution by Luca Braglia)
df <- tibble("a" = as.character(0:5),
                 "b" = paste(0:5, ".1", sep = ""),
                 "c" = letters[1:6])

# insert variable names in num()
df %>% convert(num(a, b))

Which gives you:

# A tibble: 6 x 3
      a     b c    
  <dbl> <dbl> <chr>
1    0. 0.100 a    
2    1. 1.10  b    
3    2. 2.10  c    
4    3. 3.10  d    
5    4. 4.10  e    
6    5. 5.10  f   

Or if you are lazy, let retype() from hablar guess the right data type:

df %>% retype()

which gives you:

# A tibble: 6 x 3
      a     b c    
  <int> <dbl> <chr>
1     0 0.100 a    
2     1 1.10  b    
3     2 2.10  c    
4     3 3.10  d    
5     4 4.10  e    
6     5 5.10  f   
davsjob
  • 1
    use of retype() from hablar library was cleaner and easier than all the other solutions here, and worked for my use case. Thanks! – Idiot Tom Oct 15 '20 at 09:42
  • 1
    retype() worked great! Used it on 3000+ columns that had NAs splattered about. – Falnésio Jan 09 '21 at 17:51
7

type.convert()

Convert a data object to logical, integer, numeric, complex, character or factor as appropriate.

Add the as.is argument type.convert(df,as.is = T) to prevent character vectors from becoming factors when there is a non-numeric in the data set.

See ?type.convert.
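A short base R sketch of what this looks like in practice. The data frame and its column names are made up for illustration; type.convert() picks the narrowest type that fits each column.

```r
df <- data.frame(a = c("1", "2", "3"),
                 b = c("1.5", NA, "3"),
                 c = c("x", "y", "z"),
                 stringsAsFactors = FALSE)

# as.is = TRUE leaves non-numeric strings as character instead of factor
converted <- type.convert(df, as.is = TRUE)

# a becomes integer, b becomes numeric (double), c stays character
sapply(converted, class)
```

If doubles are needed everywhere, the integer columns can be passed through as.numeric() afterwards.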

ARobertson
Zuooo
6

Slight adjustment to answers from ARobertson and Kenneth Wilson that worked for me.

Running R 3.6.0, with library(tidyverse) and library(dplyr) in my environment:

library(tidyverse)
library(dplyr)
> df %<>% mutate_if(is.character, as.numeric)
Error in df %<>% mutate_if(is.character, as.numeric) : 
  could not find function "%<>%"

I did some quick research and found this note in Hadley's "The tidyverse style guide".

The magrittr package provides the %<>% operator as a shortcut for modifying an object in place. Avoid this operator.

# Good
x <- x %>%
  abs() %>%
  sort()

# Bad
x %<>%
  abs() %>%
  sort()

Solution

Based on that style guide:

df_clean <- df %>% mutate_if(is.character, as.numeric)

Working example

> df_clean <- df %>% mutate_if(is.character, as.numeric)
Warning messages:
1: NAs introduced by coercion 
2: NAs introduced by coercion 
3: NAs introduced by coercion 
4: NAs introduced by coercion 
5: NAs introduced by coercion 
6: NAs introduced by coercion 
7: NAs introduced by coercion 
8: NAs introduced by coercion 
9: NAs introduced by coercion 
10: NAs introduced by coercion 
> df_clean
# A tibble: 3,599 x 17
   stack datetime            volume BQT90 DBT90 DRT90 DLT90 FBT90  RT90 HTML90 RFT90 RLPP90 RAT90 SRVR90 SSL90 TCP90 group
   <dbl> <dttm>               <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
PvR
4

I think I figured it out. Here's what I did (perhaps not the most elegant solution; suggestions on how to improve this are very much welcome):

#names of columns in data frame
cols <- names(DF)
# character variables
cols.char <- c("fx_code","date")
#numeric variables
cols.num <- cols[!cols %in% cols.char]

DF.char <- DF[cols.char]
DF.num <- as.data.frame(lapply(DF[cols.num],as.numeric))
DF2 <- cbind(DF.char, DF.num)
ec0n0micus
3

I realize this is an old thread but wanted to post a solution similar to your request for a function (just ran into the similar issue myself trying to format an entire table to percentage labels).

Assume you have a df with 5 character columns you want to convert. First, I create a table containing the names of the columns I want to manipulate:

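col_to_convert <- data.frame(row = 1:5
                            ,col = c("col1","col2","col3","col4","col5")
                            ,stringsAsFactors = FALSE)

for (i in 1:max(col_to_convert$row))
  {
    colname <- col_to_convert$col[i]
    colnum <- which(colnames(df) == colname)
    # collect the converted values in a numeric vector first; writing them
    # back into the character column one cell at a time would coerce them
    # to character again
    converted <- numeric(nrow(df))
        for (j in 1:nrow(df))
          {
           converted[j] <- as.numeric(df[[colnum]][j])
          }
    df[[colnum]] <- converted
  }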

This is not ideal for large tables as it goes cell by cell, but it would get the job done.

Mark Wagner
2

like this?

DF <- data.frame("a" = as.character(0:5),
             "b" = paste(0:5, ".1", sep = ""),
             "c" = paste(10:15),
             stringsAsFactors = FALSE)

# apply() returns a matrix, so wrap it to get a data frame back
DF <- as.data.frame(apply(DF, 2, as.numeric))

If there are "real" characters in the data frame like 'a', 'b', 'c', I would recommend the answer from davsjob.

1

Use data.table set function

library(data.table)

setDT(DF)
for (j in YourColumns)
     set(DF, j = j, value = as.numeric(DF[[j]]))

If you need to keep it as a data.frame, just call setDF(DF) afterwards.
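For comparison, the same conversion can be written without an explicit loop using := together with .SDcols. This is only a sketch, assuming the data.table package is installed; DT and cols are illustrative names, not taken from the question.

```r
library(data.table)

DT <- data.table(a = c("1", "2"),
                 b = c("1.5", "2.5"),
                 c = c("x", "y"))

cols <- c("a", "b")

# .SD holds only the columns listed in .SDcols;
# (cols) := ... overwrites those columns by reference
DT[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]

sapply(DT, class)
```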

yuskam
1

Try this to change numeric columns to character:

df[,1:11] <- sapply(df[,1:11],as.character)
j__carlson
Rupesh Kumar
1
DF[,6:11] <- sapply(DF[,6:11], as.numeric)

or

DF[,6:11] <- sapply(DF[,6:11], as.character)
heilala
0
for (i in names(DF)) {
    DF[[i]] <- as.numeric(DF[[i]])
}

I solved this using double brackets [[]]
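A small self-contained sketch of why the double brackets matter, using a made-up two-column data frame: DF$i looks for a column literally named "i", whereas DF[[i]] uses the value stored in i as the column name.

```r
DF <- data.frame(a = c("1", "2"),
                 b = c("3.5", "4.5"),
                 stringsAsFactors = FALSE)

# DF[[i]] evaluates i first, so each named column is converted in turn
for (i in names(DF)) {
  DF[[i]] <- as.numeric(DF[[i]])
}

# the same conversion without an explicit loop;
# assigning into DF2[] replaces every column but keeps the data.frame
DF2 <- data.frame(a = c("1", "2"),
                  b = c("3.5", "4.5"),
                  stringsAsFactors = FALSE)
DF2[] <- lapply(DF2, as.numeric)

sapply(DF, class)
```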

M_lix
-1

Since we can index a data frame column by its name, a simple change can be made:

for (i in names(DF)){ DF[i] <- as.data.frame(as.numeric(as.matrix(DF[i]))) }

-2
A<- read.csv("Environment_Temperature_change_E_All_Data_NOFLAG.csv",header = F)

Now let type.convert() guess each column's type, keeping strings as character rather than factors

A<- type.convert(A,as.is=T)

Convert some columns to numeric from character

A[,c(1,3,5,c(8:66))]<- as.numeric(as.character(unlist(A[,c(1,3,5,c(8:66))])))