How to aggregate some columns while keeping other columns in R?

Question

I have a data frame like this:

     id  no  age
1    1   7   23
2    1   2   23
3    2   1   25
4    2   4   25
5    3   6   23
6    3   1   23

and I hope to aggregate the date frame by id to a form like this: (just sum the no if they share the same id, but keep age there)

    id  no  age
1    1   9   23
2    2   5   25
3    3   7   23

How to achieve this using R?

score 25 · Accepted Answer · edited Nov 17 '16 at 23:11

25

Assuming that your data frame is named df.

aggregate(no~id+age, df, sum)
#   id age no
# 1  1  23  9
# 2  3  23  7
# 3  2  25  5

edited Nov 17 '16 at 23:11

MichaelChirico

33,841
14
113
198

answered Apr 12 '13 at 19:18

Didzis Elferts

95,661
14
264
201

3

This does not work as you add more columns, it treats every column on the RHS as a column of factors to group by, so for example if you include an extra column with a not for each entry, you end up with many duplicate rows as a result – user5359531 Nov 21 '17 at 15:36
2

What if age is not consistent for each id and we just want the first age? – Wei Nov 13 '18 at 19:50

MichaelChirico · Answer 2 · 2018-04-24T09:18:59.657

7

Even better, data.table:

library(data.table)
# convert your object to a data.table (by reference) to unlock data.table syntax
setDT(DF)
DF[  , .(sum_no = sum(no), unq_age = unique(age)), by = id]

edited Apr 24 '18 at 09:18

answered Feb 17 '15 at 21:06

MichaelChirico

33,841
14
113
198

And if you haven't declared it as a `data.table` (as is likely if you're just now calling `library(data.table)`), you can use `setDT(dt)[,list(sum(no),unique(age)),by=id]`. I like `data.table`, but I don't know if I'd say this is "Even better" than the equally concise answers above :) – Frank Feb 17 '15 at 21:12
2

fair, but I mean even better in the sense that this may get you to start using `data.table`, the dividends if which are immeasurable ;-) – MichaelChirico Feb 17 '15 at 21:23
Why do you use `unique(age)` here? Why not just `dt[, sum(no), .(id, age)]` ? You have way too many of unnecessary keystrokes here – David Arenburg Feb 18 '15 at 15:32
`dt[, sum(no), .(id, age)]` does not work for me; `Error ... could not find function "."` – user5359531 Nov 21 '17 at 15:19
also, this does not work if you want to group `by` multiple columns, and also include multiple other columns in the output – user5359531 Nov 21 '17 at 15:37
@user5359531 make sure to follow the `setDT` step. regarding your second point -- it _does_ work. but I don't know what specifically you're after. – MichaelChirico Nov 21 '17 at 15:57
I created a new question to illustrate the problem here: https://stackoverflow.com/questions/47418127/r-how-to-aggregate-some-columns-while-keeping-other-columns – user5359531 Nov 21 '17 at 16:36

score 4 · Answer 3 · edited May 23 '17 at 11:59

4

Alternatively, you could use ddply from plyr package:

require(plyr)
ddply(df,.(id,age),summarise,no = sum(no))

In this particular example the results are identical. However, this is not always the case, the difference between the both functions is outlined here. Both functions have their uses and are worth exploring, which is why I felt this alternative should be mentioned.

edited May 23 '17 at 11:59

Community

1
1

answered Apr 12 '13 at 21:59

Maxim.K

4,120
1
26
43

How to aggregate some columns while keeping other columns in R?

3 Answers3

Linked