
Example data:

(tmp_df <-
    expand.grid(id = letters[1:3], y = 1:3))
#    id y
# 1  a 1
# 2  b 1
# 3  c 1
# 4  a 2
# 5  b 2
# 6  c 2
# 7  a 3
# 8  b 3
# 9  c 3

The following works:

tmp_df %>%
    group_by(id) %>%
    mutate_at(which(colnames(.) %in% c("y")),
              sum)
#   id        y
#   <fct> <int>
# 1 a         6
# 2 b         6
# 3 c         6
# 4 a         6
# 5 b         6
# 6 c         6
# 7 a         6
# 8 b         6
# 9 c         6

but the following throws `Error: Only strings can be converted to symbols`:

tmp_df %>%
    group_by(id) %>%
    summarise_at(which(colnames(.) %in% c("y")),
                 sum)

The following alternatives, on the other hand, produce the expected result:

tmp_df %>%
    group_by(id) %>%
    summarise_at(vars(y),
                 sum)


tmp_df %>%
    group_by(id) %>%
    summarise_at("y",
                 sum)

EDIT: Following akrun's answer, I should note that I am using dplyr_0.8.4.

Alex
  • Is this a bug? The help file for the `.vars` parameter explicitly says: `..., a numeric vector of column positions, ...` (I assume that this is what throws the error) – Alex Feb 02 '21 at 05:06
  • The accepted answer of the purported duplicate claims this was fixed, but if it was, the problem was reintroduced in some later version, so it continues to be a problem. – G. Grothendieck Feb 02 '21 at 06:02

3 Answers


It seems that mutate_at counts the grouping variables when interpreting column numbers, whereas summarize_at does not, since both of the lines of code below work. You could report this as a bug, although given that the _at functions have been superseded by across, I don't know whether it would be fixed.

tmp_df %>% group_by(id) %>% mutate_at(2, sum)

tmp_df %>% group_by(id) %>% summarize_at(1, sum)

This is further reinforced by the fact that if we swap the columns then they both work consistently since the grouping variable no longer affects the position of the y column.

tmp_df[2:1] %>% group_by(id) %>% mutate_at(1, sum)

tmp_df[2:1] %>% group_by(id) %>% summarize_at(1, sum)
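The two counting conventions described above can be reproduced without calling either `_at` verb. The sketch below, assuming only `group_by()`, `group_vars()` and base R, computes where `y` sits among all columns versus among the non-grouping columns; it illustrates the diagnosis and is not dplyr's internal code.

```r
library(dplyr)

tmp_df <- expand.grid(id = letters[1:3], y = 1:3)
grouped <- group_by(tmp_df, id)

# Position of "y" counting every column, grouping columns included
# (the convention mutate_at followed in dplyr 0.8.4)
pos_all <- match("y", colnames(grouped))

# Position of "y" counting only the non-grouping columns
# (the convention summarise_at followed in dplyr 0.8.4)
nongroup_cols <- setdiff(colnames(grouped), group_vars(grouped))
pos_nongroup <- match("y", nongroup_cols)

pos_all       # 2
pos_nongroup  # 1
```

Swapping the columns, as in the trick above, makes both counts equal to 1, which is why both verbs then agree.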
G. Grothendieck
  • That is a very clever diagnostic trick, swapping the column order. I think I will file a bug, even though I don't think it will be fixed either. – Alex Feb 02 '21 at 05:42

We can use across() together with contains():

library(dplyr)
tmp_df %>% 
    group_by(id) %>% 
    summarise(across(contains('y'), sum), .groups = 'drop')

The _at and _all suffixed functions have been superseded, and across is now the recommended replacement.
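Note that across() and the .groups argument were both introduced in dplyr 1.0.0, so the code above will not run on the dplyr_0.8.4 mentioned in the question. A minimal sketch that picks a form based on the installed version, assuming that selecting `y` by name is acceptable:

```r
library(dplyr)

tmp_df <- expand.grid(id = letters[1:3], y = 1:3)

if (packageVersion("dplyr") >= "1.0.0") {
  # dplyr >= 1.0.0: across() selects among the non-grouping columns
  res <- tmp_df %>%
    group_by(id) %>%
    summarise(across(y, sum), .groups = "drop")
} else {
  # Older dplyr: select by name with vars() so the grouping column
  # cannot shift the position of y
  res <- tmp_df %>%
    group_by(id) %>%
    summarise_at(vars(y), sum)
}

res
```

Selecting by name in both branches sidesteps the position discrepancy entirely.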

akrun

which(colnames(.) %in% c("y")) returns the index 2:

which(colnames(tmp_df) %in% c("y"))
#[1] 2

This works as expected with mutate_at:

library(dplyr)
tmp_df %>% group_by(id) %>% mutate_at(2,sum)

#   id        y
#  <fct> <int>
#1 a         6
#2 b         6
#3 c         6
#4 a         6
#5 b         6
#6 c         6
#7 a         6
#8 b         6
#9 c         6

However, summarise_at does not count the grouping column, so there is no column at position 2 and you get an error when you do:

tmp_df %>% group_by(id) %>% summarise_at(2,sum)

Error: Only strings can be converted to symbols

What you actually need here is:

tmp_df %>% group_by(id) %>% summarise_at(1,sum)

#   id        y
#* <fct> <int>
#1 a         6
#2 b         6
#3 c         6

However, the position you need in summarise_at depends on how many grouping columns precede the target column, so it is awkward to keep column numbers in sync with group_by. A better option is to pass column names in vars instead of column numbers:

tmp_df %>% group_by(id) %>% mutate_at(vars('y'),sum)

#  id        y
#  <fct> <int>
#1 a         6
#2 b         6
#3 c         6
#4 a         6
#5 b         6
#6 c         6
#7 a         6
#8 b         6
#9 c         6

tmp_df %>% group_by(id) %>% summarise_at(vars('y'),sum)

#  id        y
#* <fct> <int>
#1 a         6
#2 b         6
#3 c         6

One advantage of across is that it behaves consistently in both mutate and summarise:

tmp_df %>% group_by(id) %>% mutate(across(2,sum))

x Can't subset columns that don't exist.
x Location 2 doesn't exist.

tmp_df %>% group_by(id) %>% summarise(across(2,sum))

x Can't subset columns that don't exist.
x Location 2 doesn't exist.
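Both calls fail for the same reason: inside a grouped verb, across() resolves positions against the non-grouping columns only, and here that is just `y`. A small check of that consistency, assuming dplyr 1.0.0 or later, shows that position 1 refers to `y` in both verbs:

```r
library(dplyr)

tmp_df <- expand.grid(id = letters[1:3], y = 1:3)

# across() sees only the non-grouping column y, so position 1 selects y
# in mutate() (one row per input row) and in summarise() (one row per group)
mutated <- tmp_df %>%
  group_by(id) %>%
  mutate(across(1, sum))

summarised <- tmp_df %>%
  group_by(id) %>%
  summarise(across(1, sum), .groups = "drop")
```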

Even with across, it is better to refer to columns by name rather than by position.

tmp_df %>% group_by(id) %>% mutate(across(y,sum))
tmp_df %>% group_by(id) %>% summarise(across(y,sum))
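When the column names are stored in a character vector (like the `c("y")` in the question), they can still be selected by name inside across() by wrapping them in `all_of()`, which dplyr 1.0.0 and later re-export from tidyselect. A minimal sketch; `cols_to_sum` is a hypothetical variable name:

```r
library(dplyr)

tmp_df <- expand.grid(id = letters[1:3], y = 1:3)

# Hypothetical vector of column names chosen at run time
cols_to_sum <- c("y")

# all_of() raises an error if any listed name is missing,
# rather than silently selecting fewer columns
res <- tmp_df %>%
  group_by(id) %>%
  summarise(across(all_of(cols_to_sum), sum), .groups = "drop")
```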
Ronak Shah
  • Thank you for the detailed explanation of the error message. Just as a note, the following modification is more robust for my purposes: `summarise_at(colnames(.)[colnames(.) %in% 'y'], ...)` – Alex Feb 02 '21 at 05:41