17

I have a data frame like this:

     id  no  age
1    1   7   23
2    1   2   23
3    2   1   25
4    2   4   25
5    3   6   23
6    3   1   23

and I hope to aggregate the date frame by id to a form like this: (just sum the no if they share the same id, but keep age there)

    id  no  age
1    1   9   23
2    2   5   25
3    3   7   23

How to achieve this using R?

Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138
Nip
  • 359
  • 1
  • 3
  • 9

3 Answers3

25

Assuming that your data frame is named df.

aggregate(no~id+age, df, sum)
#   id age no
# 1  1  23  9
# 2  3  23  7
# 3  2  25  5
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Didzis Elferts
  • 95,661
  • 14
  • 264
  • 201
  • 3
    This does not work as you add more columns, it treats every column on the RHS as a column of factors to group by, so for example if you include an extra column with a not for each entry, you end up with many duplicate rows as a result – user5359531 Nov 21 '17 at 15:36
  • 2
    What if age is not consistent for each id and we just want the first age? – Wei Nov 13 '18 at 19:50
7

Even better, data.table:

library(data.table)
# convert your object to a data.table (by reference) to unlock data.table syntax
setDT(DF)
DF[  , .(sum_no = sum(no), unq_age = unique(age)), by = id]
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
  • And if you haven't declared it as a `data.table` (as is likely if you're just now calling `library(data.table)`), you can use `setDT(dt)[,list(sum(no),unique(age)),by=id]`. I like `data.table`, but I don't know if I'd say this is "Even better" than the equally concise answers above :) – Frank Feb 17 '15 at 21:12
  • 2
    fair, but I mean even better in the sense that this may get you to start using `data.table`, the dividends if which are immeasurable ;-) – MichaelChirico Feb 17 '15 at 21:23
  • Why do you use `unique(age)` here? Why not just `dt[, sum(no), .(id, age)]` ? You have way too many of unnecessary keystrokes here – David Arenburg Feb 18 '15 at 15:32
  • `dt[, sum(no), .(id, age)]` does not work for me; `Error ... could not find function "."` – user5359531 Nov 21 '17 at 15:19
  • also, this does not work if you want to group `by` multiple columns, and also include multiple other columns in the output – user5359531 Nov 21 '17 at 15:37
  • @user5359531 make sure to follow the `setDT` step. regarding your second point -- it _does_ work. but I don't know what specifically you're after. – MichaelChirico Nov 21 '17 at 15:57
  • I created a new question to illustrate the problem here: https://stackoverflow.com/questions/47418127/r-how-to-aggregate-some-columns-while-keeping-other-columns – user5359531 Nov 21 '17 at 16:36
4

Alternatively, you could use ddply from plyr package:

require(plyr)
ddply(df,.(id,age),summarise,no = sum(no))

In this particular example the results are identical. However, this is not always the case, the difference between the both functions is outlined here. Both functions have their uses and are worth exploring, which is why I felt this alternative should be mentioned.

Community
  • 1
  • 1
Maxim.K
  • 4,120
  • 1
  • 26
  • 43