I'm trying to learn how to use the across() function in R, and I want to do a simple rowSums() with it. However, I keep getting this error:

Error: Problem with mutate() input ..2. x 'x' must be numeric ℹ Input ..2 is rowSums(., na.rm = TRUE).

Yet all my relevant columns are numeric. Any help or explanation of why I'm getting this error would be greatly appreciated!

Here's a reproducible example:

library(dplyr)
test <- tibble(resource_name = c("Justin", "Corey", "Justin"),
       project = c("P1", "P2", "P3"),
       sep_2021 = c(1, 2, NA),
       oct_2021 = c(5, 2, 1))


test %>%
  select(resource_name, project, sep_2021, oct_2021) %>%
  mutate(total = across(contains("_20")), rowSums(., na.rm = TRUE))

And here's what I'm going for:

answer <-  tibble(resource_name = c("Justin", "Corey", "Justin"),
                  project = c("P1", "P2", "P3"),
                  sep_2021 = c(1, 2, NA),
                  oct_2021 = c(5, 2, 1),
                  total = c(6, 4, 1))

Note: my real dataset has many columns, and the order is variable. Because of that, I really want to use the contains("_20") portion of my code and not the indices.

Vinícius Félix
J.Sabree

2 Answers

We may use adorn_totals from the janitor package:

library(dplyr)
library(janitor)
test %>%
     adorn_totals("col", name = "total")

-output

  resource_name project sep_2021 oct_2021 total
        Justin      P1        1        5     6
         Corey      P2        2        2     4
        Justin      P3       NA        1     1

With rowSums and across, the syntax would be

test %>% 
   mutate(total = rowSums(across(contains("_20")), na.rm = TRUE))

-output

# A tibble: 3 x 5
  resource_name project sep_2021 oct_2021 total
  <chr>         <chr>      <dbl>    <dbl> <dbl>
1 Justin        P1             1        5     6
2 Corey         P2             2        2     4
3 Justin        P3            NA        1     1

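In the OP's code, across(contains("_20")) and rowSums(., na.rm = TRUE) are passed to mutate() as two separate arguments. The across() call selects the columns, but rowSums() is applied to the entire data (.), which still includes the character columns resource_name and project, hence the 'x' must be numeric error. Wrapping across() inside rowSums() restricts the sum to the selected columns.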
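
To see the difference directly, here is a minimal sketch using the test tibble from the question. The tryCatch() wrapper and the names whole_data_msg and selected_sums are only for illustration; they are not part of the OP's code.

```r
library(dplyr)

test <- tibble(resource_name = c("Justin", "Corey", "Justin"),
               project = c("P1", "P2", "P3"),
               sep_2021 = c(1, 2, NA),
               oct_2021 = c(5, 2, 1))

# rowSums() on the whole tibble fails because of the character columns;
# capture the error message instead of stopping
whole_data_msg <- tryCatch(
  rowSums(test, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
whole_data_msg

# rowSums() on only the "_20" columns works
selected_sums <- rowSums(select(test, contains("_20")), na.rm = TRUE)
selected_sums
```
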

akrun

Update: As commented by akrun (see comment), we may use c_across:

test %>%
    rowwise() %>% 
    mutate(total = sum(c_across(contains("_20")), na.rm = TRUE))

Here is another dplyr option to calculate the row sums (with rowwise and sum):

test %>%
    rowwise() %>% 
    mutate(total = sum(across(contains("_20")), na.rm = TRUE))

  resource_name project sep_2021 oct_2021 total
  <chr>         <chr>      <dbl>    <dbl> <dbl>
1 Justin        P1             1        5     6
2 Corey         P2             2        2     4
3 Justin        P3            NA        1     1
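
One thing to keep in mind with this approach: rowwise() leaves the result grouped by row, so later verbs keep operating row by row until ungroup() is called. A minimal sketch, assuming the test tibble from the question (the name result is only for illustration):

```r
library(dplyr)

test <- tibble(resource_name = c("Justin", "Corey", "Justin"),
               project = c("P1", "P2", "P3"),
               sep_2021 = c(1, 2, NA),
               oct_2021 = c(5, 2, 1))

result <- test %>%
  rowwise() %>%
  mutate(total = sum(c_across(contains("_20")), na.rm = TRUE)) %>%
  ungroup()  # drop the rowwise grouping so later steps behave as usual

result$total
```
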
TarJae