I have two columns. Column A contains many duplicate values (e.g., 10, 10, 20, 5, 10, 20, ...). Column B is a binary (0/1) variable. I need R to sort column A first, if necessary, and then, for each set of duplicate values in column A, sum the corresponding values in column B. For example, if column A contains five 10s, I need the sum of the B values in those five rows.

How do I do this?

Thanks.

user2714330

Please take the time to create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Include sample input and desired output. Show any code that you may have tried so far and describe exactly where you are getting stuck. – MrFlick Jun 19 '15 at 15:24

2 Answers

You want an aggregation:

aggregate(B~A, df, FUN=sum)
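
A minimal sketch of this aggregation on unsorted data, assuming a data frame named df as in the call above. The A values are the ones listed in the question; the B values are made up for illustration.

```r
# Hypothetical data: A uses the values from the question; B is invented.
df <- data.frame(A = c(10, 10, 20, 5, 10, 20),
                 B = c(1, 0, 1, 1, 1, 0))

# Sum B within each distinct value of A.
res <- aggregate(B ~ A, data = df, FUN = sum)
print(res)
#    A B
# 1  5 1
# 2 10 2
# 3 20 1
```

aggregate groups the rows by the distinct values of A and returns the groups in ascending order, so sorting column A beforehand should not be necessary.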
Neal Fultz

df <- data.frame(A = c(5, 10, 5, 10), B = c(0, 1, 1, 1))
tapply(df$B, df$A, sum)
#  5 10 
#  1  2 

The solution by Neal presents the result in a nicer way:

aggregate(B~A, df, FUN=sum)
#    A B
# 1  5 1
# 2 10 2
mts