1

I would like to perform rowwise operations on an arbitrary set of columns using column names instead of index numbers. I know this is possible using base r and columns indexed by numbers, but it would be less error prone for me if I could do this using column names within a tidyverse pipe.

Here's the basis for what I am trying to do.

library(tidyverse)

df <- tibble(name = c("Sam", "Jane", "Erin", "Bert", "Lola"),
             age     = c(11, 10, 11, 12, 10),
             score_a = c(19, 16, 16, 5, 10),
             score_b = c(10, 10, 10, 10, 10))
df %>% 
  rowwise() %>% 
  mutate_at(vars(score_a:score_b),funs(total = sum))
#> Source: local data frame [5 x 6]
#> Groups: <by row>
#> 
#> # A tibble: 5 x 6
#>   name    age score_a score_b score_a_total score_b_total
#>   <chr> <dbl>   <dbl>   <dbl>         <dbl>         <dbl>
#> 1 Sam      11      19      10            19            10
#> 2 Jane     10      16      10            16            10
#> 3 Erin     11      16      10            16            10
#> 4 Bert     12       5      10             5            10
#> 5 Lola     10      10      10            10            10

Created on 2019-01-10 by the reprex package (v0.2.1)

I would actually like a table that looks like this

#> # A tibble: 5 x 6
#>   name    age score_a score_b  total
#>   <chr> <dbl>   <dbl>   <dbl>  <dbl>
#> 1 Sam      11      19      10    29
#> 2 Jane     10      16      10    27
#> 3 Erin     11      16      10    26
#> 4 Bert     12       5      10    15
#> 5 Lola     10      10      10    20

Created on 2019-01-10 by the reprex package (v0.2.1)

pogibas
  • 27,303
  • 19
  • 84
  • 117
TDP
  • 335
  • 1
  • 3
  • 9
  • Do all the columns you want to use have a common name? Like "score_a" to "score_xy"? – tmfmnk Jan 10 '19 at 21:48
  • They do not. I think @MrFlick found a similar question. It's not super pretty but this looks to be what I want. df %>% mutate(total = select(., score_a:score_b) %>% rowSums) – TDP Jan 10 '19 at 21:51
  • MrFlick's response is the type of thing you're looking for. `mutate_at` performs the same function on each column. What you were/are looking for are rowwise functions such as `rowSums`. – Adam Sampson Jan 10 '19 at 21:53
  • @TDP: I posted an alternative approach in @MrFlick's original question. Please take a look. It's the cleanest solution I know, and it doesn't require a `select` inside a `mutate` or the self-referrential `.` throughout the call. – Artem Sokolov Jan 11 '19 at 00:01

1 Answers1

0

I think that using grep() inside rowwise functions could be a possibility:

library(tidyverse)

df %>%
 mutate(total_score = rowSums(.[,grep("score_a|score_b", names(.))])) 

  name    age score_a score_b total_score
  <chr> <dbl>   <dbl>   <dbl>       <dbl>
1 Sam     11.     19.     10.         29.
2 Jane    10.     16.     10.         26.
3 Erin    11.     16.     10.         26.
4 Bert    12.      5.     10.         15.
5 Lola    10.     10.     10.         20.

df %>%
 mutate(total_score = rowSums(.[,grep("age|score_b", names(.))])) 

  name    age score_a score_b total_score
  <chr> <dbl>   <dbl>   <dbl>       <dbl>
1 Sam     11.     19.     10.         21.
2 Jane    10.     16.     10.         20.
3 Erin    11.     16.     10.         21.
4 Bert    12.      5.     10.         22.
5 Lola    10.     10.     10.         20.

This way you can you select an arbitrary set of columns and refer to the columns by their names.

tmfmnk
  • 38,881
  • 4
  • 47
  • 67