14

Hello and thank you in advance for your assistance,

(PLEASE Note Comments section for additional insight: i.e. the cost column in the example below was added to this question; Simon, provides a great answer, but the cost column itself is not represented in the data response from him, although the function he provides works with the cost column)

I have a data set, lets call it 'data' which looks like this

NAME     DATE     COLOR   PAID    COST
Jim      1/1/2013 GREEN   150     100
Jim      1/2/2013 GREEN   50      25
Joe      1/1/2013 GREEN   200     150
Joe      1/2/2013 GREEN   25      10

What I would like to do is sum the PAID (and COST) elements of the records with the same NAME value and reduce the number of rows (as in this example) to 2, such that my new data frame looks like this:

NAME     DATE     COLOR   PAID    COST
Jim      1/2/2013 GREEN   200     125
Joe      1/2/2013 GREEN   225     160

As far as the dates are concerned, I don't really care about which one survives the summation process.

I've gotten as far as rowSums(data), but I'm not exactly certain how to use it. Any help would be greatly appreciated.

Jonathan Charlton
  • 1,975
  • 6
  • 23
  • 30

1 Answers1

21

aggregate is the function you are looking for:

aggregate( cbind( PAID , COST ) ~ NAME + COLOR , data = data , FUN = sum )
# NAME PAID
# 1  Jim  200
# 2  Joe  225
Simon O'Hanlon
  • 58,647
  • 14
  • 142
  • 184
  • Thank you Simon. I really appreciate it. I was wondering if there would be any way to retain the "COLOR" field with each of these rows in the aggregate. – Jonathan Charlton Apr 10 '13 at 19:04
  • Sure, I'll update the answer. Are all your colour fields the same for each person? If not do you want the most common? – Simon O'Hanlon Apr 10 '13 at 19:05
  • So for each person, all of their colors would remain constant through the records. For instance, Jim will always have Green, and Joe may have red, but Joe's records will always be red. One last aspect of this question. If I were to have another numeric data field in addition to PAID (i.e. COST), how might I sum that field in like manner as well? Thank you so very much. – Jonathan Charlton Apr 10 '13 at 19:07
  • @JonathanRossCharlton see the edits. You can use cbind to add another grouping column, e.g. cost, and just add COLOR to the LHS side of the formula. Note that if someone where to have two colors in their name you would get two records for that person - one for each colour. – Simon O'Hanlon Apr 10 '13 at 19:11
  • You're awesome Simon. It's really nice people like you who make learning R feasible for the rest of us, and I really do appreciate it! – Jonathan Charlton Apr 10 '13 at 19:11
  • @JonathanRossCharlton lol! Thanks very much! glad to be of some small service Cheers. :-) – Simon O'Hanlon Apr 10 '13 at 19:12
  • 2
    @JonathanRossCharlton, FYI, it is helpful to *not* add too many details in the *comments* to an answer, especially if you don't update your question to reflect these new details. As it is, someone visiting SO to get help with a similar question might be confused by SimonO101's answer, since your question doesn't mention anything about a "COST" variable. Placing that additional request in the question instead of comments will increase it's visibility and reduce confusion for other visitors. – A5C1D2H2I1M1N2O1R2T1 Apr 10 '13 at 19:43