1

I have multiple variables to sum by group. The variable names all share the same prefix and end in a number from 1 to n. All the variables to sum sit side by side in the data frame. All I could come up with is:

id <- 1:nrow(df)
n <- length(id)
data2 <- aggregate(cbind(vol_1, vol_2, vol_3, vol_4, vol_5, vol_6, vol_7, vol_8, vol_9, vol_10) ~ group,
                   data = data1, sum, na.rm = TRUE)

How can I do this without typing every column name, given that n can change next time?

  • Hi, welcome to SO. Please make sure to post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including your data. It's much easier for us if we can run your code, but we don't know what `df` or `vol_1` is – Conor Neilson Mar 19 '20 at 14:04
  • In this example, n=6 and data1 is: `d1<-cbind(letters[1:4],1:4,9:6,10:7,10:7,1:4,5:8,1:4,10:7,9:6,2:5); d2<-cbind(letters[1:4],5:8,4:1,1:4,9:6,10:7,1:4,5:8,1:4,9:6,10:7); d3<-cbind(letters[1:4],9:6,3:6,10:7,1:4,9:6,8:5,1:4,5:8,9:6,5:8); data1<-as.data.frame(rbind(d1,d2,d3)); colnames(data1)<-c("group","vol_1","vol_2","vol_3","vol_4","vol_5","vol_6","vol_7","vol_8","vol_9","vol_10")` – Hubert Chicoine Mar 19 '20 at 14:47
  • Does this answer your question? [How to sum a variable by group](https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group) – MatthewR Mar 19 '20 at 14:52

3 Answers

1

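You can also use data.table:

library(data.table)
# Convert the data frame to a data.table
dt <- as.data.table(df)
# Sum each column within each group, keeping the original column names
dt[, .(vol_1 = sum(vol_1), vol_2 = sum(vol_2)), by = .(group)]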
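
Since the question asks about a changing n, the call above could be generalized so that no column is named by hand. The following is only a sketch on made-up toy data (the table `dt`, its values, and the helper name `vol_cols` are assumptions for illustration); it selects every column whose name starts with `vol_` and passes them through `.SDcols`.

```r
library(data.table)

# Hypothetical toy data; any number of vol_* columns would be handled the same way
dt <- data.table(
  group = c("a", "a", "b", "b"),
  vol_1 = c(1, 2, 3, 4),
  vol_2 = c(10, 20, 30, 40),
  vol_3 = c(100, 200, 300, 400)
)

# Pick the columns by name pattern so the call does not change with n
vol_cols <- grep("^vol_", names(dt), value = TRUE)

# Sum every selected column within each group
sums_dt <- dt[, lapply(.SD, sum, na.rm = TRUE), by = group, .SDcols = vol_cols]
sums_dt
```

Adding a `vol_4` column to the table would not require touching the last two statements.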
greg5678
0

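It seems like you're looking for rowSums.

You could try:

# Names of the columns to add up
vars <- paste0("vol_", 1:10)

# rowSums() needs the numeric columns themselves, not the vector of names,
# so select them from the data frame first; this returns one total per row
rowSums(data1[vars], na.rm = TRUE)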
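
Note that `rowSums()` adds across columns within each row, whereas the question asks for sums within each group. As a hedged sketch in base R, on made-up toy data (the values and the names `vol_cols`, `row_totals`, `group_totals`, and `by_group` are assumptions for illustration), the difference and two per-group alternatives look like this:

```r
# Hypothetical toy data with numeric vol_* columns
data1 <- data.frame(
  group = c("a", "a", "b", "b"),
  vol_1 = c(1, 2, 3, 4),
  vol_2 = c(10, 20, 30, 40)
)

# Select the columns by name pattern so nothing depends on n
vol_cols <- grep("^vol_", names(data1), value = TRUE)

# rowSums(): one total per row, adding across the vol_* columns
row_totals <- rowSums(data1[vol_cols], na.rm = TRUE)

# rowsum(): one row per group, summing each column within the group
group_totals <- rowsum(data1[vol_cols], group = data1$group)

# aggregate() with the dot formula: same per-group sums, group kept as a column
by_group <- aggregate(. ~ group, data = data1[c("group", vol_cols)], FUN = sum)
```

The `aggregate()` line is the closest to the command in the question, with `.` standing in for the hand-typed `cbind(...)` list.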
Matt
0

If the variables all start with the same character, say "v", as you describe, then the summarise_at function from the dplyr package comes in handy:

library(dplyr)
df %>%
  group_by(group) %>%
  summarise_at(vars(starts_with("v")), sum)

# A tibble: 2 x 4
  group vol_1 vol_2 vol_3
  <fct> <int> <int> <int>
1 1        29    27    24
2 2        26    28    31

This gives the same result as your aggregate command.
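
In dplyr 1.0.0 and later, `summarise_at()` is superseded by `across()`. A minimal sketch of the equivalent call, assuming a recent dplyr and using made-up toy data (the values and the name `sums_df` are illustrative only), might look like this. It also matches on the longer prefix `"vol_"` so that an unrelated column starting with "v" would not be picked up.

```r
library(dplyr)

# Hypothetical toy data
df <- data.frame(
  id    = 1:4,
  group = c("a", "a", "b", "b"),
  vol_1 = c(1, 2, 3, 4),
  vol_2 = c(10, 20, 30, 40)
)

# Sum every column whose name starts with "vol_" within each group
sums_df <- df %>%
  group_by(group) %>%
  summarise(across(starts_with("vol_"), ~ sum(.x, na.rm = TRUE)))
sums_df
```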


Data:

set.seed(123)
df <- data.frame(id=1:10, group=gl(2, 5),
                 vol_1=sample(10), vol_2=sample(10), vol_3=sample(10))
df
   id group vol_1 vol_2 vol_3
1   1     1     3    10     8
2   2     1    10     5     7
3   3     1     2     3     2
4   4     1     8     8     1
5   5     1     6     1     6
6   6     2     9     4     3
7   7     2     1     6     4
8   8     2     7     9    10
9   9     2     5     7     9
10 10     2     4     2     5
Edward