4

For example, I have a data frame where I want to get the sum of all columns beginning with "Q3", I could type out each one, but it would be great to be able to do something like this:

#Example data
test <- structure(list(UserID = c("53017366", "53017366"), Q3_1 = c(3L, 
3L), Q3_2 = c(2L, 2L), Q3_3 = c(3L, 3L), Q3_4 = c(NA, 5L)), class = "data.frame", row.names = c(NA, -2L), .Names = c("UserID", "Q3_1", "Q3_2", "Q3_3", "Q3_4"))

#what I'd like to see, but doesn't work
test %>% mutate(total = sum(starts_with("Q3"), na.rm = TRUE))

#What I'd like to end up with:
    UserID Q3_1 Q3_2 Q3_3 Q3_4 total
1 53017366    3    2    3   NA     8
2 53017366    3    2    3    5    13

One option suggested here and here requires putting an entire new select column in the data and using a rowwise function (like rowSums).

Community
  • 1
  • 1
Andrew Jackson
  • 823
  • 1
  • 11
  • 23

2 Answers2

4

We can use rowSums

test %>% 
     mutate(total = rowSums(.[grep("Q3", names(.))], na.rm =TRUE))
#    UserID Q3_1 Q3_2 Q3_3 Q3_4 total
#1 53017366    3    2    3   NA     8
#2 53017366    3    2    3    5    13
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 2
    Something like this, derived from your answer, works too. Thanks: `test %>% mutate(total = rowSums(.[starts_with("Q3", vars = names(.))], na.rm = TRUE))` – Andrew Jackson Feb 03 '17 at 20:06
  • And in this case I am thinking about overall scores and some subscales (though I'm not sure what I'll use). I can add a `[1:3]` to the selection to subset things easily. It seems like a pretty easy way to do it, but if there's a better way I'd be interested in learning. – Andrew Jackson Feb 03 '17 at 20:12
2

For this case you can use janitor::add_totals_col:

library(janitor)
test %>%
  add_totals_col()
#>     UserID Q3_1 Q3_2 Q3_3 Q3_4 Total
#> 1 53017366    3    2    3   NA     8
#> 2 53017366    3    2    3    5    13

To your deeper question about needing to use select helpers in mutate here, this need can indicate an underlying lack of tidy data structure. In this case you have values in column headers. In a longer tidy format, say with a variable quarter, you could do this elegantly with group_by and summarise.

Sam Firke
  • 21,571
  • 9
  • 87
  • 105
  • Didn't know that function. Thanks for providing it – akrun Feb 03 '17 at 04:25
  • Thanks for the suggestion. This is a small part of survey data, so the columns are different questions. It keeps going with Q3_5, and so on, but eventually hits Q4_1, Q4_2, ... Having a long data set like that is unintuitive to me, but I guess it could include a column for the question set and one for what question it was, then the value. – Andrew Jackson Feb 03 '17 at 19:57