I have imported an Excel spreadsheet into R, and the data frame has numerous columns that should be numeric. I can convert a single named column to numeric as follows:

df$quantity <- as.numeric(df$quantity)

How would I do this for several named columns at once? Here's an example data frame, though it has far fewer columns than the real one. Ideally the answer would use dplyr.

cols.to.format <- c("quantity", "li_hep", "edta")

df <- structure(list(source = c("Biobank", "Biobank", "Biobank", "Biobank", 
"Biobank"), sample_type = c("EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot", 
"EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot", "EDTA Plasma Large Aliquot"
), quantity = c("10", "3", "8", "0", "7"), li_hep = c("0", "0", 
"0", "0", "0"), edta = c("2", "2", "0", "0", "0")), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))
Mike
  • Does this answer your question? [Change multiple columns to lowercase with dplyr. Difficulty with mutate across everything minus](https://stackoverflow.com/questions/69661679/change-multiple-columns-to-lowercase-with-dplyr-difficulty-with-mutate-across-e) – user438383 Aug 06 '22 at 14:59

4 Answers

Using across() and all_of() you could do:

library(dplyr, warn.conflicts = FALSE)

cols.to.format <- c("quantity", "li_hep", "edta")

df %>%
  mutate(across(all_of(cols.to.format), as.numeric))
#> # A tibble: 5 × 5
#>   source  sample_type               quantity li_hep  edta
#>   <chr>   <chr>                        <dbl>  <dbl> <dbl>
#> 1 Biobank EDTA Plasma Large Aliquot       10      0     2
#> 2 Biobank EDTA Plasma Large Aliquot        3      0     2
#> 3 Biobank EDTA Plasma Large Aliquot        8      0     0
#> 4 Biobank EDTA Plasma Large Aliquot        0      0     0
#> 5 Biobank EDTA Plasma Large Aliquot        7      0     0
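As a small sketch of what all_of() buys you beyond silencing warnings: it is strict, so a name in the vector that is missing from the data frame raises an error, whereas any_of() quietly skips it. The toy tibble below is made up for illustration and deliberately lacks the li_hep column.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical toy data: note that "li_hep" is NOT present
toy <- tibble(quantity = c("10", "3"), edta = c("2", "0"))
cols.to.format <- c("quantity", "li_hep", "edta")

# any_of() converts the columns that exist and ignores the missing one
lenient <- toy %>%
  mutate(across(any_of(cols.to.format), as.numeric))

# all_of() refuses to run when a requested column is missing
strict <- tryCatch(
  toy %>% mutate(across(all_of(cols.to.format), as.numeric)),
  error = function(e) "error: missing column"
)
```

Which one to prefer depends on whether a missing column should be treated as a mistake (all_of) or as something to tolerate (any_of).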
stefan
  • Why is the all_of() statement required? Could you not just put mutate(across(cols.to.format, as.numeric))? – Mike Aug 06 '22 at 14:55
  • Yeah. In general you could go without. But it's recommended to use it with character vectors; especially in a package, I quite often get warnings when not doing so. – stefan Aug 06 '22 at 14:58

I would use a loop for this:

for (col in cols.to.format) {
  df[[col]] <- as.numeric(df[[col]])
}
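The same conversion can also be written without an explicit loop. This is a minimal base R sketch on a made-up data frame (toy is hypothetical, not the data from the question); it assigns a list of converted columns back into the selected columns in one step.

```r
# Hypothetical toy data with character columns that hold numbers
toy <- data.frame(
  source   = c("Biobank", "Biobank"),
  quantity = c("10", "3"),
  edta     = c("2", "0"),
  stringsAsFactors = FALSE
)
cols.to.format <- c("quantity", "edta")

# lapply() returns one converted vector per selected column;
# assigning the list back replaces those columns in place
toy[cols.to.format] <- lapply(toy[cols.to.format], as.numeric)
```

Columns not named in cols.to.format are left untouched.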
Robert Hacken

Here's a solution if you don't know in advance which columns should be formatted as numeric (so it spares you the effort of sifting through your data frame and jotting down all the relevant column names):

library(dplyr)
library(stringr)
df %>%
  mutate(across(where(~any(str_detect(., "^\\d+$"))), as.numeric)) 
#> # A tibble: 5 × 5
#>   source  sample_type               quantity li_hep  edta
#>   <chr>   <chr>                        <dbl>  <dbl> <dbl>
#> 1 Biobank EDTA Plasma Large Aliquot       10      0     2
#> 2 Biobank EDTA Plasma Large Aliquot        3      0     2
#> 3 Biobank EDTA Plasma Large Aliquot        8      0     0
#> 4 Biobank EDTA Plasma Large Aliquot        0      0     0
#> 5 Biobank EDTA Plasma Large Aliquot        7      0     0

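The regex ^\\d+$ in str_detect matches values that consist solely of digits from string start ^ to string end $. Because the test is wrapped in any(), a column is selected as soon as at least one of its values matches; note that the pattern does not match decimals or negative numbers.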
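One caveat with the any() condition: a mixed column, such as an ID field where only some values are all digits, would also be selected, and its non-numeric entries would become NA. A stricter sketch follows, assuming you only want character columns in which every non-missing value is all digits. The helper is_all_digits and the toy tibble are hypothetical names introduced for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(stringr)

# Hypothetical toy data: "id" mixes digit-only and other strings
toy <- tibble(
  id       = c("a1", "7", "b3"),
  quantity = c("10", "3", NA)
)

# Hypothetical helper: TRUE only for character columns where every
# non-missing value is made up entirely of digits
is_all_digits <- function(x) {
  is.character(x) &&
    !all(is.na(x)) &&
    all(str_detect(x, "^\\d+$"), na.rm = TRUE)
}

# Only "quantity" passes the check; "id" stays character
converted <- toy %>%
  mutate(across(where(is_all_digits), as.numeric))
```

The na.rm = TRUE keeps missing values from blocking an otherwise numeric column, and the !all(is.na(x)) guard avoids selecting columns that are entirely missing.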

Chris Ruehlemann

You can format the named columns with mutate_at (superseded by across() since dplyr 1.0.0, but still available):

library(dplyr)
df %>%
  mutate_at(
    .vars = cols.to.format,
    .funs = as.numeric
  )
fl8243806