0

I would like to sum equal values in a given data set. Unfortunately I do not really have a clue where to begin, especially which function to use. Let's say I have a data frame like this:

count<- c(1,4,7,3,7,9,3,4,2,8)
clone<- c("aaa","aaa","aaa","bbb","aaa","aaa","aaa","bbb","ccc","aaa")
a<- c("d","e","k","v","o","s","a","y","q","f")
b<- c("g","e","j","v","i","q","a","x","l","p")
test<-data.frame(count,clone,a,b)

The problem is that there are many repeated values which need to be combined into one (all the "aaa" and the two "bbb"). So I would like to aggregate(?) all equal values in column "clone", summing up the "count" values and taking the values for "a" and "b" from the row of that clone with the highest count.

My final data set should look like:

count<- c(39,7,2)
clone<- c("aaa","bbb","ccc")
a<- c("s","y","q")
b<- c("q","x","l")
test<-data.frame(count,clone,a,b)

Do you have a suggestion for which function I could use for that? Thanks a lot in advance.

EDIT: Sorry, I was too tired and forgot to include the "a" and "b" columns, which makes quite a difference: aggregating on clone and count alone drops these two columns, which hold essential information I need in my final data set.

ben

4 Answers

4

Use aggregate

> aggregate(count~clone, FUN=sum, data=test)
  clone count
1   aaa    39
2   bbb     7
3   ccc     2

Also see this answer for further alternatives.

Community
Jilber Urbina
  • Thanks Jilber for the link! I didn't ask correctly and left out an important part. I've edited my question and would be very happy about a hint ;) – ben Jun 18 '14 at 09:01
  • okay this was rather easy, I just had to add the columns into the aggregate expression like this: > aggregate(count~clone+a+b, FUN=sum, data=test)! – ben Jun 18 '14 at 10:23
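Note that grouping by clone, a and b together treats every distinct combination of the three as its own group. In the sample data all ten combinations are distinct, so that call returns ten rows rather than the three asked for. Below is a minimal base R sketch of one way to get the requested result. It assumes that a tie for the highest count can be broken by taking the first matching row, and the names sums, ord, top and result are illustrative only, not from the original posts:

```r
# Sample data from the question
count <- c(1, 4, 7, 3, 7, 9, 3, 4, 2, 8)
clone <- c("aaa", "aaa", "aaa", "bbb", "aaa", "aaa", "aaa", "bbb", "ccc", "aaa")
a <- c("d", "e", "k", "v", "o", "s", "a", "y", "q", "f")
b <- c("g", "e", "j", "v", "i", "q", "a", "x", "l", "p")
test <- data.frame(count, clone, a, b)

# Step 1: sum count within each clone
sums <- aggregate(count ~ clone, data = test, FUN = sum)

# Step 2: sort by clone, then by descending count, and keep the first
# row per clone, i.e. the row with the highest count (first one on ties)
ord <- test[order(test$clone, -test$count), ]
top <- ord[!duplicated(ord$clone), c("clone", "a", "b")]

# Step 3: combine the summed counts with the a/b values of the top rows
result <- merge(sums, top, by = "clone")
result
```

The merge keeps one row per clone, with the summed count alongside the a and b values taken from that clone's highest-count row.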
2

This can be handled with tapply:

tapply(count, clone, sum)
# aaa bbb ccc 
#  39   7   2 
josliber
1

You can also do this with ddply from plyr

library(plyr)
ddply(test,.(clone),function(x) sum(x$count))
Max Candocia
1

A dplyr solution:

library('dplyr')
summarize(group_by(test, clone), count = sum(count))
Kara Woo