How to get rowSums for selected columns in R

Question

I am new to R and need help calculating the sums of selected columns for each row. My simple data frame is below.

data = data.frame(location = c("a","b","c","d"),
            v1 = c(3,4,3,3), v2 = c(4,56,3,88), v3 =c(7,6,2,9), v4=c(7,6,1,9),
            v5 =c(4,4,7,9), v6 = c(2,8,4,6))

I want the sums of columns v1 to v3 and of columns v4 to v6 for each row, in a new data frame.

I did something like below.

rowSums(data[,2:4][,5:7])

But something must be wrong in my code. Thanks in advance for any help.

Technophobe01 · Answer 1 · 2016-05-07T07:29:12.317

9

My suggestion would be to use dplyr:

library(dplyr)
data %>% mutate(v2v4 = rowSums(.[2:4])) %>% mutate(v4v6 = rowSums(.[5:7])) %>% select(-(location:v6))

result:

> newDf <- data %>% mutate(v2v4 = rowSums(.[2:4])) %>% mutate(v4v6 = rowSums(.[5:7])) %>% select(-(location:v6))
> newDf
  v2v4 v4v6
1   14   13
2   66   18
3    8   12
4  100   24

edited May 07 '16 at 07:29

answered May 07 '16 at 07:18

Technophobe01


rafa.pereira · Accepted Answer · 2016-05-07T15:20:22.467

3

Here is a quite simple solution using apply.

output <- data.frame( x1 = apply(data[2:4], 1, sum) ,
                      x2 = apply(data[5:7], 1, sum) )

result:

output
>    x1 x2
> 1  14 13
> 2  66 18
> 3   8 12
> 4 100 24
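
As a side note, for plain sums `apply(x, 1, sum)` and `rowSums(x)` should give the same values, and `rowSums` is generally the faster of the two. A minimal check on the question's data (the names `via_apply`, `via_rowsums` and `same_values` are only illustrative):

```r
# Data frame from the question
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

via_apply   <- apply(data[2:4], 1, sum)  # row-wise sum through apply
via_rowsums <- rowSums(data[2:4])        # same sums through rowSums

# Compare values only; names are dropped so labelling differences do not matter
same_values <- isTRUE(all.equal(unname(via_apply), unname(via_rowsums)))
print(same_values)
```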

edited May 07 '16 at 15:20

answered May 07 '16 at 13:26

rafa.pereira


score 3 · Answer 3 · answered Aug 14 '18 at 19:56

3

rowSums(cbind(mydata$variable1, mydata$variable2, mydata$variable3), na.rm = TRUE)
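
To illustrate what `na.rm` changes, here is a small sketch. The data frame `mydata` and its values are made up for this example; only the column names follow the line above.

```r
# Hypothetical data with missing values (not from the original post)
mydata <- data.frame(variable1 = c(1, NA, 3),
                     variable2 = c(4, 5, NA),
                     variable3 = c(7, 8, 9))

cols <- c("variable1", "variable2", "variable3")

# Without na.rm, any row containing an NA sums to NA
sums_with_na <- rowSums(mydata[cols])

# With na.rm = TRUE, the NAs are skipped and the remaining values are summed
sums_na_removed <- rowSums(mydata[cols], na.rm = TRUE)

print(sums_with_na)
print(sums_na_removed)
```

Note that with `na.rm = TRUE` a row consisting only of NAs sums to 0, which may or may not be what you want.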

answered Aug 14 '18 at 19:56

useR


score 2 · Answer 4 · edited May 08 '16 at 11:22

2

OK, if you want a separate dataframe:

> data.frame(X1=rowSums(data[,2:4]), X2=rowSums(data[,5:7]))
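
For context on why the question's `rowSums(data[,2:4][,5:7])` fails: the second pair of brackets is applied to the result of the first, which has only three columns, so columns 5 to 7 do not exist there. A minimal sketch, where `first_block` and `chained` are illustrative names:

```r
# Data frame from the question
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

first_block <- data[, 2:4]
print(ncol(first_block))  # only 3 columns are left after the first subset

# Asking that 3-column result for columns 5:7 raises an error;
# tryCatch captures the message instead of stopping the script
chained <- tryCatch(rowSums(data[, 2:4][, 5:7]),
                    error = function(e) conditionMessage(e))
print(chained)

# Summing each block separately works
separate <- data.frame(X1 = rowSums(data[, 2:4]), X2 = rowSums(data[, 5:7]))
print(separate)
```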

edited May 08 '16 at 11:22

David Arenburg


answered May 07 '16 at 11:42

Juanjo


this answer does not return the desired output in `data.frame` – rafa.pereira May 07 '16 at 13:28

I've never said it does. I've said you need to use rowSums(data[,c(2:4,5:7)]) instead of rowSums(data[,2:4][,5:7]). If you want a dataframe you just need to combine it. – Juanjo May 07 '16 at 15:12
I still think your answer does not return the result asked for in the question, which would be a `data.frame` like the ones you can see in the other answers – rafa.pereira May 07 '16 at 15:18
Are you sure? Check the result; it's a data.frame – skan May 07 '16 at 15:23
Yes, it's now returning a data.frame, but it's still not returning the desired output indicated in the question. – rafa.pereira May 08 '16 at 09:07

score 1 · Answer 5 · answered May 07 '16 at 06:49

1

Specifying the two summations explicitly:

cbind(x1=rowSums(data[,c('v1','v2','v3')]),x2=rowSums(data[,c('v4','v5','v6')]));
##       x1 x2
## [1,]  14 13
## [2,]  66 18
## [3,]   8 12
## [4,] 100 24

answered May 07 '16 at 06:49

bgoldst


My actual data set has a large number of variables. – sriya May 07 '16 at 12:05
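
If typing every column name is impractical, one possible sketch is to describe the column groups once in a named list and loop over it. The names `groups` and `output` are illustrative, and building the names with `paste0` assumes the columns follow a regular naming pattern as in the question:

```r
# Data frame from the question
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# One entry per output column; names are generated rather than typed out
groups <- list(x1 = paste0("v", 1:3),
               x2 = paste0("v", 4:6))

# rowSums is applied to each group of columns, and the results become columns
output <- as.data.frame(lapply(groups, function(cols) rowSums(data[cols])))
print(output)
```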

score 0 · Answer 6 · answered May 07 '16 at 06:52

0

We can split the dataset into a list and then use Reduce with f="+".

sapply(split.default(data[-1], rep(paste0("x", 1:2), each=3)), Reduce, f=`+`)
#     x1 x2
#[1,]  14 13
#[2,]  66 18
#[3,]   8 12
#[4,] 100 24

answered May 07 '16 at 06:52

akrun


All answers work nicely. Thanks. @Akrun, what does Reduce, f='+' do here, please? – sriya May 07 '16 at 12:05
@Lio `f` is the function that `Reduce` applies. With `+`, it adds the columns of each group element by element, which gives the sum for each row of the data.frame. – akrun May 07 '16 at 12:45
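
A small sketch of what `Reduce` with `+` does, using the first three value columns from the question. The names `cols`, `total` and `pieces` are illustrative:

```r
# Data frame from the question
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# A data frame is a list of columns, so Reduce folds `+` over the columns:
# (v1 + v2) + v3, computed element by element
cols  <- data[c("v1", "v2", "v3")]
total <- Reduce(`+`, cols)
print(total)

# split.default cuts the columns into named groups of three,
# and sapply runs the same Reduce on each group
pieces <- split.default(data[-1], rep(paste0("x", 1:2), each = 3))
print(names(pieces))
print(sapply(pieces, Reduce, f = `+`))
```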

score 0 · Answer 7 · answered Mar 04 '21 at 22:48

So, I came across a similar problem.

I have the same survey of 20 questions given 2 different times, so there are 2 different survey scores, for a total of 40 columns. Each survey question ends with an identifier. So for example, the first question of the survey is distinguished by adding .a or .c:

Survey1Question1.a
Survey1Question1.c

Say your data is in df1 and you want to sum all of the columns within each survey so that you have 2 survey scores:

df1 %>% mutate(Survey.A = rowSums(.[grepl('\\.a$',colnames(.))]),
        Survey.C = rowSums(.[grepl('\\.c$',colnames(.))]))

# A tibble: 9 x 2
  Survey.A Survey.C
     <dbl>   <dbl>
1       64      51
2       89      91
3       62      60
4       80      80
5       66      69
6       60      61
7       71      74
8       52      50
9       79      69

I'm just learning how to use the '.' dot notation, but I believe this works because rowSums accepts a dataframe, which means you can follow Technophobe01's answer above. The trick then becomes how to do that programmatically.

Well, the first '.' in rowSums is the full set of columns/variables in the data set passed by the pipe (df1). But you want to subset that.

So, here is where grepl works well. You can subset a dataframe with grepl using the following syntax: dataframe[,grepl("pattern",colnames(dataframe))]

So, in my code above rowSums(.[grepl('\\.a$',colnames(.))]) the trick is substituting 'dataframe' with the '.' dot notation.
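
The same idea can be tried without the pipe. The sketch below uses base R only; the data frame `df1` and its values are invented for illustration (two questions per survey instead of 20), and `is_a`, `is_c` and `scores` are illustrative names:

```r
# Hypothetical survey data: columns ending in .a or .c
df1 <- data.frame(Survey1Question1.a = c(1, 2),
                  Survey1Question2.a = c(3, 4),
                  Survey1Question1.c = c(5, 6),
                  Survey1Question2.c = c(7, 8))

# Logical vectors marking which columns belong to each survey
is_a <- grepl("\\.a$", colnames(df1))
is_c <- grepl("\\.c$", colnames(df1))

# Subset the columns by pattern, then sum across each row
scores <- data.frame(Survey.A = rowSums(df1[is_a]),
                     Survey.C = rowSums(df1[is_c]))
print(scores)
```

The `\\.` matches a literal dot and `$` anchors the match to the end of the column name, so a column such as `Survey1Question1.abc` would not be picked up.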
