I'm trying to count, across specific columns of my data, the values that meet a condition, as in the example below:

library(dplyr)
library(readr)
set.seed(123)
data=data.frame(id=1:4,
                v1=sample(c("a","b"),4,TRUE),
                v2=sample(c("a","b"),4,TRUE),
                v3=sample(c("a","b"),4,TRUE),
                v4=sample(c("a","b"),4,TRUE),
                v5=sample(c("a","b"),4,TRUE),
                v5=sample(c("a","b"),4,TRUE)
                )
data%>%
  rowwise()%>%
  mutate(across(v1:v4,~sum(.x=="a")))%>%
  mutate(n_a=sum(c(v1,v2,v3,v4)))
#> # A tibble: 4 × 8
#> # Rowwise: 
#>      id    v1    v2    v3    v4 v5    v5.1    n_a
#>   <int> <int> <int> <int> <int> <chr> <chr> <int>
#> 1     1     1     1     1     0 b     a         3
#> 2     2     1     0     1     1 a     b         3
#> 3     3     1     0     0     0 a     a         1
#> 4     4     0     0     0     1 a     a         1

Here n_a is the number of variables from v1 to v4 that have the value "a". Is there a better way to implement this?

  • Can it be done in a single mutate() call, without transforming the other variables?
  • Can I use sum() with something like v1:v4?

Created on 2023-07-28 with reprex v2.0.2

Wael

3 Answers

You can use rowSums() with pick() (pick() requires dplyr 1.1.0 or later):

library(dplyr)

data %>%
  mutate(n_a = rowSums(pick(v1:v4) == "a", na.rm = TRUE))

#  id v1 v2 v3 v4 v5 v5.1 n_a
#1  1  a  a  a  b  b    a   3
#2  2  a  b  a  a  a    b   3
#3  3  a  b  b  b  a    a   1
#4  4  b  b  b  a  a    a   1
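
To show what na.rm = TRUE changes here, the following is a minimal sketch on a hypothetical toy data frame (toy and its values are made up for illustration and are not the question's data). It assumes dplyr 1.1.0 or later for pick().

```r
library(dplyr)

# Hypothetical toy data with one missing value (not the question's data)
toy <- data.frame(
  id = 1:3,
  v1 = c("a", "b", NA),
  v2 = c("a", "a", "b"),
  v3 = c("b", "b", "b")
)

res <- toy %>%
  mutate(
    # NA == "a" is NA; na.rm = TRUE drops it from the row total
    n_a = rowSums(pick(v1:v3) == "a", na.rm = TRUE),
    # without na.rm, a single NA makes the whole row total NA
    n_a_keep = rowSums(pick(v1:v3) == "a")
  )
```

With na.rm = TRUE the third row counts as 0; without it, the third row's total is NA.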
Ronak Shah

A similar question was asked a few days ago here.

You can use rowSums with ==.

data$n_a <- rowSums(data[, 2:6] == "a")

data
  id v1 v2 v3 v4 v5 n_a
1  1  a  a  a  b  b   3
2  2  a  b  a  a  a   4
3  3  a  b  b  b  a   2
4  4  b  b  b  a  a   2

The columns to count over can be customised as required too.

data$n_a <- rowSums(data[, -1] == "a")
data$n_a <- rowSums(data[, c("v1", "v2", "v3", "v4", "v5")] == "a")
data$n_a <- rowSums(data[, startsWith(colnames(data), "v")] == "a")
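
As a rough sketch of why this works: the comparison returns a logical matrix, and rowSums() counts each TRUE as 1. The data frame below is copied by hand from the printed output above (v1 to v5), and the names is_a, n_a_v1_v5 and n_a_v1_v4 are only illustrative. Selecting columns by name rather than by position also makes it easy to restrict the count to v1 through v4, as the question asked.

```r
# Data copied from the printed output above (columns v1 to v5)
data <- data.frame(
  id = 1:4,
  v1 = c("a", "a", "a", "b"),
  v2 = c("a", "b", "b", "b"),
  v3 = c("a", "a", "b", "b"),
  v4 = c("b", "a", "b", "a"),
  v5 = c("b", "a", "a", "a")
)

# The comparison yields a logical matrix with one column per variable
is_a <- data[, paste0("v", 1:5)] == "a"

# rowSums() treats TRUE as 1 and FALSE as 0
n_a_v1_v5 <- rowSums(is_a)                      # counts over v1 to v5
n_a_v1_v4 <- rowSums(is_a[, paste0("v", 1:4)])  # counts over v1 to v4 only
```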
danishzone

Your code is on the right track. To shorten it, you can do it the other way round: put across() inside sum() in your second line, and pass across() a function that tests for the "a" values:

library(tidyverse) 
data %>% rowwise() %>% mutate(n_a=sum(across(v1:v4, \(x) x=="a")))

     id v1    v2    v3    v4    v5    v5.1    n_a
  <int> <chr> <chr> <chr> <chr> <chr> <chr> <int>
1     1 a     a     a     b     b     a         3
2     2 a     b     a     a     a     b         3
3     3 a     b     b     b     a     a         1
4     4 b     b     b     a     a     a         1
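
For readers unfamiliar with the \(x) notation used above: it is base R's shorthand for function(x), available from R 4.1.0, so it is not specific to dplyr. The sketch below uses only base R; the names is_a_named, is_a_short and vals are made up for illustration. Inside across(), the formula form ~ .x == "a" is another way to write the same test.

```r
# The \(x) form needs R >= 4.1.0
is_a_named <- function(x) x == "a"  # ordinary function definition
is_a_short <- \(x) x == "a"         # shorthand for the line above

vals <- c("a", "b", "a")

# Both definitions give the same count of "a" values
count_named <- sum(is_a_named(vals))
count_short <- sum(is_a_short(vals))

# The shorthand can also be defined and called in place
count_inline <- sum((\(x) x == "a")(vals))
```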
S-SHAAF
So, to write it the same way as the dplyr reference page for across() (https://dplyr.tidyverse.org/reference/across.html): my_fun = function(x) x=="a" ; data %>% rowwise() %>% mutate(n_a=sum(across(v1:v4, ~my_fun(.x)))). Is there a dplyr reference page for the \(x) style of notation? – Wael Jul 30 '23 at 14:59