1

I have a dataframe with 51 observations and 43 variables, all of which are stored as character. I want to coerce columns 3:43 to numeric. How do I do this without touching the first two columns?

I attached an example dataframe that resembles mine, but it's NOT my true dataframe (I'm not the only owner of the data so I can't legally share it). Imagine the first two columns are actually characters, not doubles, and this will give you a good picture. I apologize for any inconvenience.

df <- structure(list(`Analyte  Sample` = c(1, 2, 3, 4, 5, 6), A = c("4190", "6665", "7435", "2052", "783", "322"), B = c("11569", "6677", 
"3852", "983.88", "589", "359"), C = c("20453", "7699", "2499", "707.98", "412", "328"), D = c("7893", NA, "1623", "685.64", 
"321", "644"), E = c("320", "15444", "2049", "1065", "389", "365"), F = c("7438", NA, "3472", "1057", "563", "401"), G = c(7345, 
9001, 2473, 1138, 516, 403), H = c("9004", "3998", "2299", "964.88", "499", "341"), I = c("8434", "8700", "2217", "1263", "567", "352"
), J = c("7734", "6733", "2092", "1115", "637", "332"), K = c(NA, NA, "2118", "862.13", "426", "355"), L = c(6345, 7688, 2311, 
1195, 647, 366), M = c("4222", NA, "1846", "814.61", "422", "314"), N = c("6773", "8934", "2381", "1221", "677", "356"), O = c(NA, 
NA, NA, "564.5", "226", "476")), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
Darren Tsai
Ree Nadeau

4 Answers

3

It's popular to use lapply() to convert column types.

df[3:43] <- lapply(df[3:43], as.numeric)
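As a minimal sketch of the same idea on a toy data frame (the column names `id`, `group`, `A` and `B` are made up for illustration and are not from the real data), the first two columns are left alone and everything from column 3 onward is converted:

```r
# Toy data: every column is stored as character, as in the question
df <- data.frame(
  id    = c("s1", "s2", "s3"),
  group = c("a", "b", "a"),
  A     = c("4190", "6665", NA),
  B     = c("983.88", "589", "359"),
  stringsAsFactors = FALSE
)

# Convert column 3 through the last column; columns 1:2 are not touched
num_cols <- 3:ncol(df)
df[num_cols] <- lapply(df[num_cols], as.numeric)

# id and group stay character; A and B are now numeric, and the NA in A stays NA
sapply(df, class)
```

Using `3:ncol(df)` instead of a hard-coded `3:43` is only a convenience here, so the snippet keeps working if the number of columns changes.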

A dplyr alternative:

library(dplyr)

df %>% mutate(across(3:43, as.numeric))

Note: don't use apply() to convert the column types of a data.frame. apply() first converts the data.frame into a matrix, so all columns are coerced to a single type. For example:

df <- data.frame(x = as.character(1:3), y = c(T, T, F))

Column x in df is character and y is logical. Each column can be converted to numeric on its own:

as.numeric(df$x)
# [1] 1 2 3
as.numeric(df$y)
# [1] 1 1 0

If you convert both at the same time with apply(), it does not fail, but y turns into NA and R issues a warning:

df[] <- apply(df, 2, as.numeric)
df

#   x  y
# 1 1 NA
# 2 2 NA
# 3 3 NA
#
# Warning message:
# In apply(df, 2, as.numeric) : NAs introduced by coercion

That's because apply() first coerces the data.frame to a matrix, so all values become character according to the hierarchy of types (character > logical):

as.matrix(df)

#      x   y      
# [1,] "1" "TRUE" 
# [2,] "2" "TRUE" 
# [3,] "3" "FALSE"

Applying as.numeric() to the second column, which now holds the strings "TRUE" and "FALSE", produces NA. In your case apply() works only because every selected column can be turned into numeric. In general, though, it's not a standard way to treat a data.frame. In contrast, lapply() works column by column and gives the expected result:

df[] <- lapply(df, as.numeric)
df

#   x y
# 1 1 1
# 2 2 1
# 3 3 0
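The "NAs introduced by coercion" warning can also appear with lapply() when a column holds strings that are not valid numbers. The following is a hedged sketch of one way to find such values before converting; the vector `x` and its contents are hypothetical and not taken from the question's data:

```r
# Hypothetical character column with two entries that are not valid numbers
x <- c("4190", "983.88", NA, "n/a", "1,204")

# Convert quietly, then compare against the original values
converted <- suppressWarnings(as.numeric(x))

# TRUE where a non-missing string turned into NA during conversion
lost <- !is.na(x) & is.na(converted)

x[lost]
# [1] "n/a"   "1,204"
```

Entries that were already NA are not flagged, so only values that would be lost by the conversion show up.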
Darren Tsai
2

Use type.convert():

df <- type.convert(df)

Depending on your R version, columns that still hold non-numeric characters may be converted to factors. If you want to keep them as characters, pass as.is = TRUE explicitly:

df <- type.convert(df, as.is = TRUE)

If you only want to convert part of the dataframe, e.g. because some columns contain numbers that should remain characters, apply it to just those columns:

df[,my_columns]<- type.convert(df[, my_columns])
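A small sketch of the partial conversion, assuming a toy data frame in which `id` holds zero-padded codes that must stay character (all names and values here are illustrative only). Note that type.convert() picks the most specific type it can, so a column of whole numbers becomes integer rather than double, unlike as.numeric():

```r
# Toy data: all columns are character; id must keep its leading zeros
df <- data.frame(
  id   = c("001", "002", "003"),
  site = c("a", "b", "c"),
  x    = c("1.5", "2", NA),
  y    = c("10", "20", "30"),
  stringsAsFactors = FALSE
)

# Only columns 3 and 4 are converted; as.is = TRUE avoids creating factors
my_columns <- 3:4
df[my_columns] <- type.convert(df[my_columns], as.is = TRUE)

# id and site are still character, x is double, y is integer
sapply(df, class)
```

If every converted column has to be double, wrapping the result in as.numeric() per column (or using the lapply() approach) is one option.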
Onyambu
0

Try this base R solution:

df[,3:43] <- apply(df[,3:43],2,as.numeric)
Duck
  • 1
    please read [David Arenburg's info](https://stackoverflow.com/users/3001626/david-arenburg) to know something about apply – Onyambu Jul 31 '20 at 16:34
  • Agree with @Onyambu's link! In this case `apply()` works just because all columns are able to be turned into numeric. But in general it's not the best way to treat a `data.frame`. Please check my answer. Welcome to give me some feedback. – Darren Tsai Jul 31 '20 at 17:48
  • yeah I just noticed that using this command turned my df into a matrix (I'm still new at R) which totally defeated the purpose of some of my previous work :( thanks for the tips, though! – Ree Nadeau Jul 31 '20 at 20:05
  • @ReeNadeau for clarification, `apply()` returns a matrix, but after assigning it back to `df`, the result should be a `data.frame`. I think Duck's answer works for your case, but it might not be a standard way in this issue. – Darren Tsai Jul 31 '20 at 20:20
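To illustrate the last comment, here is a sketch on a toy data frame (names and values are invented). It shows that apply() itself returns a matrix, that assigning the result into a column subset keeps the data.frame, and that overwriting the whole object does not. Whether the latter is what happened in the asker's session is only a guess:

```r
# Toy data with two identifier columns and two numeric-looking columns
df <- data.frame(
  id    = c("s1", "s2", "s3"),
  group = c("a", "b", "a"),
  A     = c("1", "2", "3"),
  B     = c("4.5", "5.5", "6.5"),
  stringsAsFactors = FALSE
)

# apply() on its own returns a numeric matrix
m <- apply(df[, 3:4], 2, as.numeric)
is.matrix(m)           # TRUE

# Assigning into a column subset keeps the data.frame and its other columns
df_ok <- df
df_ok[, 3:4] <- m
is.data.frame(df_ok)   # TRUE

# Overwriting the whole object replaces the data.frame with the matrix
df_bad <- apply(df[, 3:4], 2, as.numeric)
is.data.frame(df_bad)  # FALSE
```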
0

You can try this simple approach using tidyverse

library(tidyverse)


df <- data.frame(OBS = c("1", "2", "3"), COL_A = c("6", "7", "8"), COL_B = c("11", "12", "13"), COL_D = c("21", "22", "23"))
str(df)
# 'data.frame': 3 obs. of  4 variables:
#  $ OBS  : chr  "1" "2" "3"
#  $ COL_A: chr  "6" "7" "8"
#  $ COL_B: chr  "11" "12" "13"
#  $ COL_D: chr  "21" "22" "23"

df2 <- df %>% 
  mutate_at(vars(COL_B:COL_D), as.numeric)
str(df2)
# 'data.frame': 3 obs. of  4 variables:
#  $ OBS  : chr  "1" "2" "3"
#  $ COL_A: chr  "6" "7" "8"
#  $ COL_B: num  11 12 13
#  $ COL_D: num  21 22 23
Tho Vu
  • 1
    Good idea! But `mutate_at()` has been superseded by `across()` since `dplyr 1.0.0`. You can check my answer for the use of `across()`. By the way, `mutate_at()` comes from `dplyr`, so there's no need to load the whole `tidyverse` package. Using `library(dplyr)` is enough and more lightweight. – Darren Tsai Jul 31 '20 at 19:45
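Following that comment, the `mutate_at()` call above could be written with `across()` roughly as follows. This is a sketch that assumes dplyr 1.0.0 or later is installed and reuses the toy data frame from this answer:

```r
library(dplyr)

# Same toy data as in the answer above
df <- data.frame(
  OBS   = c("1", "2", "3"),
  COL_A = c("6", "7", "8"),
  COL_B = c("11", "12", "13"),
  COL_D = c("21", "22", "23"),
  stringsAsFactors = FALSE
)

# across() selects the column range COL_B:COL_D and applies as.numeric to each
df2 <- df %>%
  mutate(across(COL_B:COL_D, as.numeric))

# OBS and COL_A remain character; COL_B and COL_D are now numeric
str(df2)
```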