I have a somewhat dumb R question. If I have a matrix (or dataframe, whichever is easier to work with) like:

Year  Match
2008   1808
2008 137088
2008      1
2008  56846
2007   2704
2007 169876
2007  75750
2006   2639
2006 193990
2006      2

I want to sum the match counts for each year (so, e.g., the 2008 row would be 2008 195743). How would I go about doing this? I've got a few solutions in my head, but they are all needlessly complicated, and R tends to have some much easier solution tucked away somewhere.

You can generate the same matrix above with:

structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L, 
2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L, 169876L, 
75750L, 2639L, 193990L, 2L), .Dim = c(10L, 2L), .Dimnames = list(
NULL, c("Year", "Match")))

Thanks for any help you can offer.

eli-k
Adam Hyland

3 Answers

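aggregate(x = df$Match, by = list(df$Year), FUN = sum), assuming df is your data above as a data frame (if you have the matrix, convert it first with as.data.frame, since $ does not work on a matrix).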
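A minimal sketch of that call, run on the data from the question. The names m, df, and res are only illustrative, and passing df["Match"] with a named by list is a small variation on the call above, used here so the result keeps readable column names:

```r
# Matrix from the question
m <- structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L,
                 2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L,
                 169876L, 75750L, 2639L, 193990L, 2L),
               .Dim = c(10L, 2L),
               .Dimnames = list(NULL, c("Year", "Match")))

# `$` does not work on a matrix, so convert to a data frame first
df <- as.data.frame(m)

# One row per year; naming the `by` element gives a "Year" column
res <- aggregate(x = df["Match"], by = list(Year = df$Year), FUN = sum)
res
```

The result is a data frame with one row per year and the columns Year and Match.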

Jubbles

You may also want to use the ddply function from the plyr package.

# install plyr package
install.packages('plyr')
library(plyr)
# creating your data.frame
foo <- as.data.frame(structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L, 
            2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L, 169876L, 
            75750L, 2639L, 193990L, 2L), .Dim = c(10L, 2L), .Dimnames = list(
              NULL, c("Year", "Match"))))

# here's what you're looking for
ddply(foo,.(Year),numcolwise(sum))

  Year  Match
1 2006 196631
2 2007 248330
3 2008 195743

By the way, the total for 2008 should be 195743 (1808 + 137088 + 1 + 56846), not 138897; you forgot to add 56846.

n8sty
Jilber Urbina

As explained above, you can use aggregate, but its formula interface makes this much simpler (df is again assumed to be a data frame):

aggregate(. ~ Year, df, sum)
#  Year  Match
#1 2006 196631
#2 2007 248330
#3 2008 195743

You can also use dplyr to solve this as follows:

library(dplyr)
df %>% group_by(Year) %>% summarise(Match = sum(Match))
#  Year  Match
#  (int)  (int)
#1  2008 195743
#2  2007 248330
#3  2006 196631
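If you would rather keep the matrix from the question as it is, base R can also sum by group without a data frame. This is only a sketch of an alternative that none of the answers above propose; it uses tapply and rowsum, and the names m, totals, and totals_m are illustrative:

```r
# Matrix from the question
m <- structure(c(2008L, 2008L, 2008L, 2008L, 2007L, 2007L, 2007L,
                 2006L, 2006L, 2006L, 1808L, 137088L, 1L, 56846L, 2704L,
                 169876L, 75750L, 2639L, 193990L, 2L),
               .Dim = c(10L, 2L),
               .Dimnames = list(NULL, c("Year", "Match")))

# tapply: one named element per year
totals <- tapply(m[, "Match"], m[, "Year"], sum)

# rowsum: a one-column matrix with the years as row names
totals_m <- rowsum(m[, "Match"], group = m[, "Year"])

totals[["2008"]]
totals_m["2008", 1]
```

Both calls index the result by year as a character string, e.g. totals[["2008"]].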