Aggregating rows for multiple columns in R

Question

I have a df that basically looks like this:

Id  A    B    C total
3   5    0    1  6
3   4    3    4   11
3   2    1    2    5
4   5    4    3   12
4   3    2    4    9
4   1    1    1    3

I want to collapse the rows by Id and get:

Id   A    B    C    total
3    11   4    7     22
4    9    7    8   24

I was able to do so for one column with:

df.grouped<- aggregate(df$A~Id, data=df, FUN="sum")

I have many columns (A-Z), so I need some kind of loop. I tried:

df.grouped<- aggregate(df[4:51]~Id, data=df, FUN="sum")
names(df.grouped)<-paste(names(df)[4:51])

But got:

Error in model.frame.default(formula = df[4:51] ~ Id, data = df) : 
invalid type (list) for variable 'df[4:51]'

As you can see, I also want the names in df.grouped to be the same as in df.

Any ideas will be very helpful

Thanks

If the answer worked for you, it would be appreciated if you accept the answer. This will give future readers a clue about the value of the solution. See also this help page: [What should I do when someone answers my question?](http://stackoverflow.com/help/someone-answers) — Jaap, Oct 10 '15 at 11:09
@user3315563 To which answer are you referring? Why don't you follow Jaap's above advice? This is the best way to reward people who take time to answer to your questions! — Gilbert, Apr 06 '18 at 07:11

akrun · Accepted Answer · 2015-10-04T17:30:52.327

We can use the formula method of aggregate. By specifying . on the LHS of ~, we select all the columns except the 'Id' column.

aggregate(.~Id, df, sum)
#   Id  A B C total
#1  3 11 4 7    22
#2  4  9 7 8    24

Or we can also specify the columns without using the formula method

aggregate(df[2:ncol(df)],df['Id'], FUN=sum)
#  Id  A B C total
#1  3 11 4 7    22
#2  4  9 7 8    24

Other options include dplyr and data.table.

Using dplyr, we group by 'Id' and get the sum of all columns with summarise_each.

library(dplyr)
df %>%
  group_by(Id) %>%
  summarise_each(funs(sum))

Or with data.table, we convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'Id', we loop (lapply(..) through the Subset of Data.table (.SD) and get the sum.

library(data.table)
setDT(df)[, lapply(.SD, sum), by = Id]

Teng L · Answer 2 · 2017-08-01T20:48:27.113

Use summaryBy from the doBy library:

library(doBy)
df.grouped = summaryBy(A + B + C + total ~ Id, data = df, FUN = sum,
    keep.names = TRUE)

You may omit FUN = sum as it is the default. When calculating length (count), mean, or variance, you can replace is with FUN = length or FUN = mean and so on.

Since you are also selecting every column of the data frame, you may use . ~ Id as a shorthand form.

An example that works right out of the box with R:

head(mtcars) library(doBy) summaryBy(. ~ cyl, data = mtcars, FUN = mean, keep.names = TRUE)

cyl      mpg     disp        hp     drat       wt     qsec        vs        am     gear     carb
1   4 26.66364 105.1364  82.63636 4.070909 2.285727 19.13727 0.9090909 0.7272727 4.090909 1.545455
2   6 19.74286 183.3143 122.28571 3.585714 3.117143 17.97714 0.5714286 0.4285714 3.857143 3.428571
3   8 15.10000 353.1000 209.21429 3.229286 3.999214 16.77214 0.0000000 0.1428571 3.285714 3.500000

score 1 · Answer 3 · answered Aug 11 '17 at 09:17

Create df

        df <- data.frame(Id = c(3,3,3,4,4,4), 
                         A = c(5,4,2,5,3,1),   
                         B = c(0,3,1,4,2,1),   
                         C = c(1,4,2,3,4,1))

library(plyr)

req <- ddply( df, .(Id),
           summarise,
           A  = sum(A),
           B  = sum(B),
           C  = sum(C))

View(req)

Aggregating rows for multiple columns in R

3 Answers3

Linked

Related