6

In dplyr, how do you perform rowwise summation over selected columns (using column index)?

This doesn't work

> iris  %>% mutate(sum=sum(.[1:4])) %>% head
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species    sum
1          5.1         3.5          1.4         0.2  setosa 2078.7
2          4.9         3.0          1.4         0.2  setosa 2078.7
3          4.7         3.2          1.3         0.2  setosa 2078.7
4          4.6         3.1          1.5         0.2  setosa 2078.7
5          5.0         3.6          1.4         0.2  setosa 2078.7
6          5.4         3.9          1.7         0.4  setosa 2078.7

I can do the following, but it's not beautiful

> iris %>% mutate(index=1:n()) %>%  
                gather("param", "value", 1:4)  %>% 
                group_by(index) %>% 
                mutate(sum=sum(value)) %>% 
                spread(param, value) %>% select(-index)
Source: local data frame [150 x 6]

   Species  sum Sepal.Length Sepal.Width Petal.Length Petal.Width
1   setosa 10.2          5.1         3.5          1.4         0.2
2   setosa  9.5          4.9         3.0          1.4         0.2
3   setosa  9.4          4.7         3.2          1.3         0.2
4   setosa  9.4          4.6         3.1          1.5         0.2
5   setosa 10.2          5.0         3.6          1.4         0.2
6   setosa 11.4          5.4         3.9          1.7         0.4
7   setosa  9.7          4.6         3.4          1.4         0.3
8   setosa 10.1          5.0         3.4          1.5         0.2
9   setosa  8.9          4.4         2.9          1.4         0.2
10  setosa  9.6          4.9         3.1          1.5         0.1
..     ...  ...          ...         ...          ...         ...

Is there more syntactically nicer way to achieve this?

EDIT: It's different from other questions, because I want to do rowwise operation on the columns selected by using column indices"

Alby
  • 5,522
  • 7
  • 41
  • 51
  • 7
    Try `iris %>% mutate(sum=Reduce("+",.[1:4]))`. Alternatively, the base `rowSums` function does what you are asking for. – nicola Jul 02 '15 at 19:37
  • 2
    possible duplicate of [Applying a function to every row of a table using dplyr?](http://stackoverflow.com/questions/21818181/applying-a-function-to-every-row-of-a-table-using-dplyr) – jeremycg Jul 02 '15 at 19:38
  • `dplyr` offers the `rowwise` function that maybe might be helpful. – SabDeM Jul 02 '15 at 19:50
  • 1
    As showed in the link, an option would be `rowwise` with `do` i.e. `iris %>% rowwise() %>% do(data.frame(., sum=sum(unlist(.[1:4]))))` – akrun Jul 02 '15 at 19:54
  • @nicola I like your answer the best. Can you make that into an answer so that I can pick it as a final answer? – Alby Jul 04 '15 at 03:21
  • @Alby Glad you appreciated it. I'll add an answer. – nicola Jul 04 '15 at 10:23

4 Answers4

12

As already said in the comment, you can accomplish your task with:

iris %>% mutate(sum=Reduce("+",.[1:4]))

In this case also the base rowSums works:

iris$sum<-rowSums(iris[,1:4])
nicola
  • 24,005
  • 3
  • 35
  • 56
  • 2
    I realize the OP wanted to use the column index, but is there a way to use the column names? I'm thinking of something like the select function select(df, V4:V13). Not sure how to put this in here. – emudrak May 09 '18 at 20:30
1

You can (ab)use base R's subset, which allows selection of columns by number:

iris %>% subset(select=1:4) %>% mutate(sum=rowSums(.))
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
  • 3
    The `dplyr` `select` function seems to work fine: `iris %>% select(1:4) %>% mutate(sum=rowSums(.))`..? – Frank Jul 02 '15 at 21:01
1

Not sure this is correct etiquette but I prefer to recycle this thread instead of starting a new one given, I am pretty sure, I am doing just a rookie mistake.

Why is this working alright:

test$sum <- Reduce("+", test[,3:ncol(test)])

while this (to me identical) doesn't?

test %>%
  mutate(sum = Reduce("+",.[3:ncol(.)]))

The error given is

Error in mutate_impl(.data, dots) : 
  Column `sum` must be length 1 (the group size), not 915

I am banging my head on the desk since 30' or so!

I wish I could give you the underlying dataset but I really can't.

Column 1:2 are text fields while 3:ncol(.) are TRUE/FALSE (logical) ones. ncol(.) = 33.

Matteo Castagna
  • 472
  • 3
  • 13
0

I think the ability to do row-wise operations is a weak point of the tidyverse syntax, but purrr:pmap_* is useful to run this, although it is not that obvious:

iris %>% 
  mutate(total = pmap_dbl(select(., -Species), sum))

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  total
1            5.1         3.5          1.4         0.2     setosa  10.2
2            4.9         3.0          1.4         0.2     setosa   9.5
3            4.7         3.2          1.3         0.2     setosa   9.4
4            4.6         3.1          1.5         0.2     setosa   9.4
5            5.0         3.6          1.4         0.2     setosa  10.2

Alternatively, you can use select_if(., is.numeric) instead of select(., -Species) to be more generalized (but it will not work if there are some numeric variables you don't want included in the calculation).

Phil
  • 7,287
  • 3
  • 36
  • 66