dplyr: how to reference columns by column index rather than column name using mutate?

Question

Using dplyr, you can do something like this:

iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width) 
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

But above, I referenced the columns by name. How can I use the column indices 1 and 2 to achieve the same result?

I have the following, but it feels less elegant:

iris %>% head %>% mutate(sum=apply(select(.,1,2),1,sum))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

score 97 · Accepted Answer · answered Sep 16 '15 at 21:11

You can try:

iris %>% head %>% mutate(sum = .[[1]] + .[[2]])

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

jeremycg


Note that this won't combine well with `group_by`: `iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]])` fails, whereas `iris %>% group_by(Species) %>% mutate(sum=Sepal.Length + Sepal.Width)` works. – MrFlick Sep 16 '15 at 21:17

@MrFlick - Maybe I'm missing something. Why would grouping matter when you're calculating row-wise? They could probably throw an `ungroup()` in there then regroup if they're doing other operations. I've found that necessary before. – Rich Scriven Sep 16 '15 at 21:29

@RichardScriven It's more of a warning that this method is really by-passing much of the dplyr infrastructure so it can break things like grouping that should otherwise work. You are essentially skipping over the `data=` parameter of mutate. You are right that this doesn't really matter for a row-wise `mutate()`, but consider: `iris %>% group_by(Species) %>% summarize(x=mean(.[[1]] + .[[2]]))` This is not a good "general" method to specify columns by index. – MrFlick Sep 16 '15 at 21:37

How does this by-column referencing work when you are setting the mutate column? `iris %>% head %>% mutate(.[[1]] = .[[1]] + .[[2]])` gives: `Error: unexpected '=' in "iris %>% head %>% mutate(.[[1]] ="` – pluke Mar 31 '17 at 09:56

As of `dplyr` 1.0.0, there's this workaround: `df %>% group_by(eval(names(.)[1])) %>% ...` – Jorge Esteban Mendoza Aug 04 '20 at 01:09

Another caveat of this solution is that the native pipe operator `|>` does not support the `.` notation. – cbrnr Mar 22 '23 at 08:29
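
To make the grouping caveat from the comments concrete, here is a minimal sketch. It assumes `dplyr` is installed; `cols`, `by_dot` and `by_group` are illustrative names, not part of the original answer. Since `.` is the whole piped data frame, the `summarize()` call from the comment gives every group the same overall mean. Looking the names up by position and going through the `.data` pronoun is one possible way to keep the calculation per group.

```r
library(dplyr)

# `.` is the entire data frame, so each group gets the mean over all 150 rows
by_dot <- iris %>%
  group_by(Species) %>%
  summarise(x = mean(.[[1]] + .[[2]]))

# Translate positions to names once, then use `.data`, which is group-aware
cols <- names(iris)[1:2]
by_group <- iris %>%
  group_by(Species) %>%
  summarise(x = mean(.data[[cols[1]]] + .data[[cols[2]]]))
```

`by_dot$x` holds one repeated value, while `by_group$x` differs per species.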

score 5 · Answer 2 · answered Nov 06 '18 at 05:33

I'm a bit late to the game, but my personal strategy in cases like this is to write my own tidyverse-compliant function that will do exactly what I want. By tidyverse-compliant, I mean that the first argument of the function is a data frame and that the output is a vector that can be added to the data frame.

# Add two columns of data frame x, selected by position (or by name)
sum_cols <- function(x, col1, col2){
   x[[col1]] + x[[col2]]
}

iris %>%
  head %>%
  mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
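
Because the helper takes the data frame as an explicit argument, it can also be used without the magrittr `.` placeholder. The following is only a sketch, assuming R >= 4.1 (for `|>` and the `\(d)` lambda syntax); the wrapper and the name `d` are illustrative and not part of the answer above.

```r
library(dplyr)

sum_cols <- function(x, col1, col2) {
  x[[col1]] + x[[col2]]
}

# The native pipe has no `.`, so wrap the step in an anonymous function
# that gives the incoming data frame a name (`d`)
res <- iris |>
  head() |>
  (\(d) mutate(d, sum = sum_cols(d, 1, 2)))()
```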

score 4 · Answer 3 · answered Feb 15 '22 at 20:31

An alternative to reusing . in mutate that respects grouping is dplyr::cur_data_all(). From help(cur_data_all):

cur_data_all() gives the current data for the current group (including grouping variables)

Consider the following:

iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]]) %>% head
#Error: Problem with `mutate()` column `sum`.
#ℹ `sum = .[[1]] + .[[2]]`.
#ℹ `sum` must be size 50 or 1, not 150.
#ℹ The error occurred in group 1: Species = setosa.

If instead you use cur_data_all(), it works without issue:

iris %>% mutate(sum = select(cur_data_all(),1) + select(cur_data_all(),2)) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
#1          5.1         3.5          1.4         0.2  setosa          8.6
#2          4.9         3.0          1.4         0.2  setosa          7.9
#3          4.7         3.2          1.3         0.2  setosa          7.9
#4          4.6         3.1          1.5         0.2  setosa          7.7
#5          5.0         3.6          1.4         0.2  setosa          8.6
#6          5.4         3.9          1.7         0.4  setosa          9.3

The same approach works with the extract operator ([[).

iris %>% mutate(sum = cur_data()[[1]] + cur_data()[[2]]) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
#1          5.1         3.5          1.4         0.2  setosa 8.6
#2          4.9         3.0          1.4         0.2  setosa 7.9
#3          4.7         3.2          1.3         0.2  setosa 7.9
#4          4.6         3.1          1.5         0.2  setosa 7.7
#5          5.0         3.6          1.4         0.2  setosa 8.6
#6          5.4         3.9          1.7         0.4  setosa 9.3
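
For newer installations, here is a hedged sketch of the same idea on grouped data. It assumes dplyr >= 1.1.0, where `cur_data()` and `cur_data_all()` are deprecated in favor of `pick()`; on older versions the calls above are the ones to use.

```r
library(dplyr)

# Assumption: dplyr >= 1.1.0, which provides pick().
# pick(1) returns a one-column tibble for the current group; [[1]] extracts
# the vector. Species is the last column, so positions 1 and 2 refer to
# Sepal.Length and Sepal.Width here.
res <- iris %>%
  group_by(Species) %>%
  mutate(sum = pick(1)[[1]] + pick(2)[[1]]) %>%
  ungroup()
```

Unlike the `.[[1]] + .[[2]]` version, this runs on the grouped data without the size error shown above.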

score 1 · Answer 4 · answered Dec 11 '20 at 16:06

What do you think about this version?
Inspired by @SavedByJesus's answer.

applySum <- function(df, ...) {
  assertthat::assert_that(...length() > 0, msg = "one or more column indexes are required")
  mutate(df, Sum = apply(as.data.frame(df[, c(...)]), 1, sum))
}

iris %>%
  head(2) %>%
  applySum(1, 2)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
#
### you can select and sum more than two columns with the same function
#
iris %>%
  head(2) %>%
  applySum(1, 2, 3, 4)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Sum
1          5.1         3.5          1.4         0.2  setosa 10.2
2          4.9         3.0          1.4         0.2  setosa  9.5

Dan Adams · Answer 5 · 2022-11-02T00:02:30.580

This can now (packageVersion("dplyr") >= 1.0.0) be done very nicely with the combination of dplyr::rowwise() and dplyr::c_across().

library(dplyr)

packageVersion("dplyr")
#> [1] '1.0.10'

iris %>% 
  head %>% 
  rowwise() %>% 
  mutate(sum = sum(c_across(c(1, 2))))
#> # A tibble: 6 × 6
#> # Rowwise: 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   sum
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#> 1          5.1         3.5          1.4         0.2 setosa    8.6
#> 2          4.9         3            1.4         0.2 setosa    7.9
#> 3          4.7         3.2          1.3         0.2 setosa    7.9
#> 4          4.6         3.1          1.5         0.2 setosa    7.7
#> 5          5           3.6          1.4         0.2 setosa    8.6
#> 6          5.4         3.9          1.7         0.4 setosa    9.3

Created on 2022-11-01 with reprex v2.0.2
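
One thing to keep in mind: the result above is still row-wise (note the `# Rowwise:` line in the output), so later verbs would keep operating one row at a time. A small sketch, assuming dplyr >= 1.0.0, that drops the row-wise grouping once the sum is computed:

```r
library(dplyr)

res <- iris %>%
  head() %>%
  rowwise() %>%
  mutate(sum = sum(c_across(c(1, 2)))) %>%
  ungroup()  # back to an ordinary tibble, so later verbs see whole columns
```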

score 0 · Answer 6 · answered Jun 29 '18 at 19:30

To address the issue that @pluke raises in the comments: dplyr doesn't really support referring to columns by index.

It's not a perfect solution, but you can use base R to get around this: iris[1] <- iris[1] + iris[2]

Nina Sonneborn

Linked comment about dplyr doesn't support column index ... what is the loop solution I wonder? – Markm0705 Jul 24 '21 at 08:54
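
If you would rather stay inside `mutate()` when overwriting a column chosen by position, one possible sketch is to look up the column name first and inject it on the left-hand side with `!!` and `:=`. The names `d` and `first_col` are illustrative, and this is only one way to do it, not something the answers above propose.

```r
library(dplyr)

d <- head(iris)
first_col <- names(d)[1]  # name of column 1, i.e. "Sepal.Length"

# `!!first_col :=` uses the looked-up name as the column to overwrite
res <- d %>% mutate(!!first_col := .[[1]] + .[[2]])
```

This avoids the `unexpected '='` parse error, because the left-hand side is now a name rather than a `.[[1]]` expression.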
