5

I am a newbie to R and need help calculating sums of selected columns for each row. My simple data frame is below.

data = data.frame(location = c("a","b","c","d"),
            v1 = c(3,4,3,3), v2 = c(4,56,3,88), v3 =c(7,6,2,9), v4=c(7,6,1,9),
            v5 =c(4,4,7,9), v6 = c(2,8,4,6))

I want the sum of columns v1 to v3 and of columns v4 to v6 for each row, in a new data frame:

   x1   x2
a  14   13
b  66   18
c   8   12
d 100   24

I tried the following:

rowSums(data[,2:4][,5:7])

But something is wrong with my code. Thanks in advance for any help.
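
A minimal base-R sketch of why the attempt fails, reusing the question's data: chained `[` calls are applied left to right, so `[,5:7]` indexes the 3-column result of `data[,2:4]`, not the original data frame. The names `first` and `msg` are only for illustration.

```r
# The question's data
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# Step 1: data[, 2:4] keeps only v1, v2 and v3, so the result has 3 columns
first <- data[, 2:4]
ncol(first)  # 3

# Step 2: asking that 3-column data frame for its columns 5 to 7 is an error
msg <- tryCatch(first[, 5:7], error = function(e) conditionMessage(e))
msg  # "undefined columns selected"
```

So each block of columns has to be selected from `data` itself and summed in its own `rowSums()` call.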

sriya

7 Answers

9

My sense would be to use dplyr:

library(dplyr)
# .[2:4] is v1:v3 and .[5:7] is v4:v6 (positions count the location column)
data %>% mutate(v2v4 = rowSums(.[2:4])) %>% mutate(v4v6 = rowSums(.[5:7])) %>%
  select(-(location:v6))

result:

> newDf <- data %>% mutate(v2v4 = rowSums(.[2:4])) %>% mutate(v4v6 = rowSums(.[5:7])) %>% select(-(location:v6))
> newDf
  v2v4 v4v6
1   14   13
2   66   18
3    8   12
4  100   24
Technophobe01
3

Here is a simple solution using apply:

output <- data.frame( x1 = apply(data[2:4], 1, sum) ,
                      x2 = apply(data[5:7], 1, sum) )

result:

> output
   x1 x2
1  14 13
2  66 18
3   8 12
4 100 24
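
A short sketch (base R, the question's data) of one detail worth knowing about this approach: `apply()` first converts the data frame to a matrix with `as.matrix()`, so it only works when every selected column is numeric. The names `m` and `err` are illustrative.

```r
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# Numeric columns only: as.matrix() gives a numeric matrix and sum() works
x1 <- apply(data[2:4], 1, sum)

# Including the character column 'location' makes the whole matrix character
m <- as.matrix(data[1:4])
typeof(m)  # "character"

# ... and sum() then refuses to add the rows
err <- tryCatch(apply(data[1:4], 1, sum), error = function(e) conditionMessage(e))

# On numeric columns, rowSums() gives the same numbers as apply(..., 1, sum)
all.equal(unname(x1), unname(rowSums(data[2:4])))  # TRUE
```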
rafa.pereira
3
rowSums(cbind(mydata$variable1, mydata$variable2, mydata$variable3), na.rm = TRUE)
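
A sketch applying this pattern to the question's columns v1 to v3. The `NA` in `data_na` is a hypothetical change made only to show what `na.rm = TRUE` does; the question's data has no missing values.

```r
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# Hypothetical: copy the data and blank out one value
data_na <- data
data_na$v2[1] <- NA

# Without na.rm, any NA in a row makes that row's sum NA
with_na <- rowSums(cbind(data_na$v1, data_na$v2, data_na$v3))
# With na.rm = TRUE the NA is skipped: row a becomes 3 + 7 = 10
skip_na <- rowSums(cbind(data_na$v1, data_na$v2, data_na$v3), na.rm = TRUE)

with_na  # NA 66 8 100
skip_na  # 10 66 8 100
```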
useR
2

OK, if you want a separate data frame:

> data.frame(X1=rowSums(data[,2:4]), X2=rowSums(data[,5:7]))
David Arenburg
Juanjo
  • this answer does not return the desired output in `data.frame` – rafa.pereira May 07 '16 at 13:28
    I've never said it does. I've said you need to use rowSums(data[,c(2:4,5:7)]) instead of rowSums(data[,2:4][,5:7]). If you want a dataframe you just need to combine it. – Juanjo May 07 '16 at 15:12
  • I still think your answer does not return the result asked for in the questions, which would be a `data.frame` like the ones you can see in the other answers – rafa.pereira May 07 '16 at 15:18
  • Are you sure? Check the result, it's a data.frame – skan May 07 '16 at 15:23
  • Yes, it's now returning a data.frame, but it's still not returning the desired output indicated in the question. – rafa.pereira May 08 '16 at 09:07
1

Specifying the two summations explicitly:

cbind(x1=rowSums(data[,c('v1','v2','v3')]),x2=rowSums(data[,c('v4','v5','v6')]));
##       x1 x2
## [1,]  14 13
## [2,]  66 18
## [3,]   8 12
## [4,] 100 24
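
Note that `cbind()` here returns a matrix, not a data frame. If you want the layout sketched in the question (a data frame with the locations as row names), a variant along these lines should work; `out` is just an illustrative name.

```r
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# Same two explicit sums, but collected in a data frame whose
# row names come from the location column
out <- data.frame(x1 = rowSums(data[, c("v1", "v2", "v3")]),
                  x2 = rowSums(data[, c("v4", "v5", "v6")]),
                  row.names = data$location)
out
# x1: 14, 66, 8, 100; x2: 13, 18, 12, 24 (rows a, b, c, d)
```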
bgoldst
0

We can split the columns of the dataset into a list of groups and then apply Reduce with f=`+` to each group.

sapply(split.default(data[-1], rep(paste0("x", 1:2), each=3)), Reduce, f=`+`)
#     x1 x2
#[1,]  14 13
#[2,]  66 18
#[3,]   8 12
#[4,] 100 24
akrun
  • All answers work nicely. Thanks. @Akrun, what does Reduce, f='+' do here, please? – sriya May 07 '16 at 12:05
  • @Lio `f` is the function call inside the `Reduce`. It does the sum by each element in a particular row of the data.frame. – akrun May 07 '16 at 12:45
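
To expand on the comment above, here is a small sketch (the question's data) of the two pieces separately: split.default() treats the data frame as a list of columns and groups them by the labels x1/x2, and Reduce with `+` folds the addition over the columns of one group, which is the same as (v1 + v2) + v3, element by element. The names `groups`, `by_reduce` and `by_hand` are illustrative.

```r
data <- data.frame(location = c("a", "b", "c", "d"),
                   v1 = c(3, 4, 3, 3), v2 = c(4, 56, 3, 88), v3 = c(7, 6, 2, 9),
                   v4 = c(7, 6, 1, 9), v5 = c(4, 4, 7, 9), v6 = c(2, 8, 4, 6))

# A data frame is a list of columns, so split.default() groups columns:
# the first three get label "x1", the last three get label "x2"
groups <- split.default(data[-1], rep(paste0("x", 1:2), each = 3))
names(groups)     # "x1" "x2"
names(groups$x1)  # "v1" "v2" "v3"

# Reduce(`+`, ...) adds the columns pairwise: (v1 + v2) + v3
by_reduce <- Reduce(`+`, groups$x1)
by_hand   <- (data$v1 + data$v2) + data$v3
identical(by_reduce, by_hand)  # TRUE
```

sapply() then runs this Reduce step on each group and binds the two resulting vectors into the x1/x2 matrix.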
0

So, I came across a similar problem.

I have the same 20-question survey given at 2 different times, so there are 2 survey scores and a total of 40 columns. Each survey question ends with an identifier; for example, the first question of the survey is distinguished by adding .a or .c:

Survey1Question1.a
Survey1Question1.c

Say your data is in df1 and you want to sum all of the columns within each survey so that you have 2 survey scores:

library(dplyr)
df1 %>% mutate(Survey.A = rowSums(.[grepl('\\.a$', colnames(.))]),
               Survey.C = rowSums(.[grepl('\\.c$', colnames(.))])) %>%
  select(Survey.A, Survey.C)

# A tibble: 9 x 2
  Survey.A Survey.C
     <dbl>   <dbl>
1       64      51
2       89      91
3       62      60
4       80      80
5       66      69
6       60      61
7       71      74
8       52      50
9       79      69

I'm just learning how to use the '.' dot notation, but I believe this works because rowSums expects a data frame (or matrix). That means you can follow Technophobe01's answer above; the trick then becomes how to select the columns programmatically.

Well, the first '.' in rowSums is the full set of columns/variables in the data set passed by the pipe (df1). But you want to subset that.

This is where grepl works well. You can subset a data frame with grepl using the following syntax: dataframe[,grepl("pattern",colnames(dataframe))]

So, in my code above, rowSums(.[grepl('\\.a$',colnames(.))]), the trick is substituting 'dataframe' with the '.' dot notation.
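
The same grepl idea also works without the pipe. Here is a base-R sketch on a small hypothetical df1 (two questions, each asked at two times), invented only for illustration; `keep_a` and `scores` are illustrative names too.

```r
# Hypothetical toy data: questions 1 and 2, administered as ".a" and ".c"
df1 <- data.frame(Survey1Question1.a = c(3, 4),
                  Survey1Question2.a = c(5, 2),
                  Survey1Question1.c = c(1, 4),
                  Survey1Question2.c = c(2, 2))

# grepl() gives one TRUE/FALSE per column name; "\\.a$" means "ends in .a"
keep_a <- grepl("\\.a$", colnames(df1))
keep_a  # TRUE TRUE FALSE FALSE

# List-style indexing df1[keep_a] always returns a data frame
scores <- data.frame(Survey.A = rowSums(df1[keep_a]),
                     Survey.C = rowSums(df1[grepl("\\.c$", colnames(df1))]))
scores  # Survey.A: 8, 6; Survey.C: 3, 6
```

One reason to prefer df1[keep_a] over df1[, keep_a]: with the comma form, a pattern that matches exactly one column drops the result to a plain vector, and rowSums() then fails because it needs at least two dimensions.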

Brian Holt