
Using dplyr, you can do something like this:

iris %>% head %>% mutate(sum=Sepal.Length + Sepal.Width) 
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3

Above, I referenced the columns by name. How can I achieve the same result using the column indices 1 and 2 instead?

Here I have the following, but I feel it's not as elegant.

iris %>% head %>% mutate(sum=apply(select(.,1,2),1,sum))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3
Alby

6 Answers


You can try:

iris %>% head %>% mutate(sum = .[[1]] + .[[2]])

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
3          4.7         3.2          1.3         0.2  setosa 7.9
4          4.6         3.1          1.5         0.2  setosa 7.7
5          5.0         3.6          1.4         0.2  setosa 8.6
6          5.4         3.9          1.7         0.4  setosa 9.3
jeremycg
  • Note this won't combine well with `group_by`: `iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]])` does not work, whereas `iris %>% group_by(Species) %>% mutate(sum=Sepal.Length + Sepal.Width)` does. – MrFlick Sep 16 '15 at 21:17
  • @MrFlick - Maybe I'm missing something. Why would grouping matter when you're calculating row-wise? They could probably throw an `ungroup()` in there then regroup if they're doing other operations. I've found that necessary before. – Rich Scriven Sep 16 '15 at 21:29
  • @RichardScriven It's more of a warning that this method is really by-passing much of the dplyr infrastructure so it can break things like grouping that should otherwise work. You are essentially skipping over the `data=` parameter of mutate. You are right that this doesn't really matter for a row-wise `mutate()`, but consider: `iris %>% group_by(Species) %>% summarize(x=mean(.[[1]] + .[[2]]))` This is not a good "general" method to specify columns by index. – MrFlick Sep 16 '15 at 21:37
  • How does referencing by column index work when you are setting the mutated column? `iris %>% head %>% mutate(.[[1]] = .[[1]] + .[[2]])` gives: `Error: unexpected '=' in "iris %>% head %>% mutate(.[[1]] ="` – pluke Mar 31 '17 at 09:56
  • As of `dplyr` 1.0.0, there's this workaround: `df %>% group_by(eval(names(.)[1])) %>% ...` – Jorge Esteban Mendoza Aug 04 '20 at 01:09
  • Another caveat of this solution is that the native pipe operator `|>` does not support the `.` notation. – cbrnr Mar 22 '23 at 08:29
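
To make the grouping caveat from the comments concrete, here is a minimal sketch, assuming dplyr is installed (the names `by_index` and `by_name` are mine, for illustration only). Inside a `%>%` pipe, `.` is the whole piped data frame rather than the current group, so the index-based summary should return the same overall mean for every species:

```r
library(dplyr)

# `.` is the full 150-row data frame in every group, so each group
# gets the mean over all rows.
by_index <- iris %>%
  group_by(Species) %>%
  summarize(x = mean(.[[1]] + .[[2]]))

# Column names are resolved per group, so each group gets its own mean.
by_name <- iris %>%
  group_by(Species) %>%
  summarize(x = mean(Sepal.Length + Sepal.Width))

by_index
by_name
```

The index-based version runs without error here, which is what makes it easy to miss.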

I'm a bit late to the game, but my personal strategy in cases like this is to write my own tidyverse-compliant function that will do exactly what I want. By tidyverse-compliant, I mean that the first argument of the function is a data frame and that the output is a vector that can be added to the data frame.

sum_cols <- function(x, col1, col2){
   x[[col1]] + x[[col2]]
}

iris %>%
  head %>%
  mutate(sum = sum_cols(x = ., col1 = 1, col2 = 2))
SavedByJESUS

An alternative to reusing . in mutate that respects grouping is dplyr::cur_data_all(). From help(cur_data_all):

cur_data_all() gives the current data for the current group (including grouping variables)

Consider the following:

iris %>% group_by(Species) %>% mutate(sum = .[[1]] + .[[2]]) %>% head
#Error: Problem with `mutate()` column `sum`.
#ℹ `sum = .[[1]] + .[[2]]`.
#ℹ `sum` must be size 50 or 1, not 150.
#ℹ The error occurred in group 1: Species = setosa.

If instead you use cur_data_all(), the columns are taken from the current group's data, so it works without issue (the example below is shown ungrouped):

iris %>% mutate(sum = select(cur_data_all(),1) + select(cur_data_all(),2)) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length
#1          5.1         3.5          1.4         0.2  setosa          8.6
#2          4.9         3.0          1.4         0.2  setosa          7.9
#3          4.7         3.2          1.3         0.2  setosa          7.9
#4          4.6         3.1          1.5         0.2  setosa          7.7
#5          5.0         3.6          1.4         0.2  setosa          8.6
#6          5.4         3.9          1.7         0.4  setosa          9.3

The same approach works with the extract operator ([[), shown here with cur_data(), which omits the grouping variables:

iris %>% mutate(sum = cur_data()[[1]] + cur_data()[[2]]) %>% head()
#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species sum
#1          5.1         3.5          1.4         0.2  setosa 8.6
#2          4.9         3.0          1.4         0.2  setosa 7.9
#3          4.7         3.2          1.3         0.2  setosa 7.9
#4          4.6         3.1          1.5         0.2  setosa 7.7
#5          5.0         3.6          1.4         0.2  setosa 8.6
#6          5.4         3.9          1.7         0.4  setosa 9.3
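
For newer installations, a hedged note: as far as I know, cur_data() and cur_data_all() are deprecated as of dplyr 1.1.0 in favour of pick(). A minimal sketch of the same idea, assuming dplyr >= 1.1.0 (the name `res` is mine). Like cur_data(), pick() leaves out the grouping columns, so the positions count only the non-grouping columns:

```r
library(dplyr)

# pick() is evaluated per group, so the selected columns have the
# same number of rows as the group being mutated.
res <- iris %>%
  group_by(Species) %>%
  mutate(sum = pick(everything())[[1]] + pick(everything())[[2]]) %>%
  ungroup()

head(res)
```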
Ian Campbell

What do you think about this version?
Inspired by @SavedByJesus's answer.

applySum <- function(df, ...) {
  assertthat::assert_that(...length() > 0, msg = "one or more column indexes are required")
  mutate(df, Sum = apply(as.data.frame(df[, c(...)]), 1, sum))
}

iris %>%
  head(2) %>%
  applySum(1, 2)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sum
1          5.1         3.5          1.4         0.2  setosa 8.6
2          4.9         3.0          1.4         0.2  setosa 7.9
#
### you can select and sum more than two columns with the same function
#
iris %>%
  head(2) %>%
  applySum(1, 2, 3, 4)
#
### output
#
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species  Sum
1          5.1         3.5          1.4         0.2  setosa 10.2
2          4.9         3.0          1.4         0.2  setosa  9.5
benaja

This can now (packageVersion("dplyr") >= 1.0.0) be done very nicely with the combination of dplyr::rowwise() and dplyr::c_across().

library(dplyr)

packageVersion("dplyr")
#> [1] '1.0.10'

iris %>% 
  head %>% 
  rowwise() %>% 
  mutate(sum = sum(c_across(c(1, 2))))
#> # A tibble: 6 × 6
#> # Rowwise: 
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species   sum
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#> 1          5.1         3.5          1.4         0.2 setosa    8.6
#> 2          4.9         3            1.4         0.2 setosa    7.9
#> 3          4.7         3.2          1.3         0.2 setosa    7.9
#> 4          4.6         3.1          1.5         0.2 setosa    7.7
#> 5          5           3.6          1.4         0.2 setosa    8.6
#> 6          5.4         3.9          1.7         0.4 setosa    9.3

Created on 2022-11-01 with reprex v2.0.2
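
If row-by-row evaluation turns out to be slow on larger data, one possible vectorized variant is to hand the selected columns to base R's rowSums(). This is only a sketch, assuming dplyr >= 1.1.0 for pick() (the name `res` is mine):

```r
library(dplyr)

# pick(1, 2) selects columns by position and returns them as a tibble;
# rowSums() then adds them across each row in a single vectorized call.
res <- iris %>%
  head() %>%
  mutate(sum = rowSums(pick(1, 2)))

res
```

Unlike the rowwise() version, the result is a plain data frame and needs no ungroup() afterwards.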

Dan Adams

To address the issue that @pluke raises in the comments: mutate() does not accept a column index on the left-hand side of =, so dplyr doesn't really support assigning to a column by index.

Not a perfect solution, but you can use base R to get around this:

iris[1] <- iris[1] + iris[2]
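
If you would rather stay inside a dplyr pipeline than use the base R assignment, one possible approach is to look up the column name by position and inject it on the left-hand side with `!!` and `:=`. This is a sketch, not an official index-based interface; `df`, `target`, and `res` are names I made up for illustration:

```r
library(dplyr)

df <- head(iris)

# Look up the name of the first column by position.
target <- names(df)[1]

# `!!target :=` uses the string in `target` as the name of the column to set;
# `.` is the piped data frame, so .[[1]] and .[[2]] are columns 1 and 2.
res <- df %>%
  mutate(!!target := .[[1]] + .[[2]])

res
```

The caveats from the comments on `.` still apply: this ignores grouping and does not work with the native `|>` pipe.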