Calculate sum by grouping by column value in R

Question

I have a data frame with two columns, Ref_Date and Value. The date column contains 12 rows for each year, from 1988 until 2015. I need to group by year only and sum the Value column, so that I get one row per year containing the total of that year's 12 monthly values:

row.names   Ref_Date    Value
166483      1989/01     713
166484      1989/02     771
166485      1989/03     565
166486      1989/04     1248
166487      1989/05     1380
166488      1989/06     1118
166489      1989/07     1026
166490      1989/08     995
166491      1989/09     835
166492      1989/10     939
166493      1989/11     878
166494      1989/12     1075
166495      1990/01     878
166496      1990/02     563
166497      1990/03     773
166498      1990/04     1131
166499      1990/05     1562
166500      1990/06     1747
166501      1990/07     1258
166502      1990/08     791
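Since the comments below ask for reproducible data, here is a minimal sketch of how the 20 sample rows above could be typed in as an R data frame. It assumes Value is numeric and Ref_Date is a "YYYY/MM" character string; only the rows shown are included, not the full 1988 to 2015 range.

```r
# Rebuild the 20 sample rows shown above.
# Assumption: Value is numeric and Ref_Date is a "YYYY/MM" string.
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  row.names = 166483:166502,
  stringsAsFactors = FALSE
)

# dput(df) prints R code like the above that others can paste to recreate the data.
str(df)
```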

It is getting downvoted (didn't downvote yet, but have an irresistible urge to do so) because we would expect a new user to post an image, not someone with your experience on the site. How are we supposed to reproduce this? By writing every single value by hand? Please follow the guidelines in [this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg, Feb 23 '15 at 21:22
Probably because it doesn't demonstrate research effort and because the data is only presented as an image (i.e. not reproducible). – talat, Feb 23 '15 at 21:23
@all of you, I have posted the data in plain text. Sorry about that. – Jean-François Beaulieu, Feb 23 '15 at 21:27
This is really two questions: how do I extract just the year from a string, and how do I get the mean by group? Both of those have many duplicates on this site; I think that's the bigger issue for downvoters. – Señor O, Feb 23 '15 at 21:30
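Following the point that this is really two steps, here is a base R sketch that keeps them separate: first extract the year as the first four characters of Ref_Date, then split Value by year and sum each piece. It assumes Value is numeric; split() and sapply() are a swapped-in base R idiom, not code from either answer.

```r
# Sample data: the 20 rows from the question (assumes Value is numeric).
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  stringsAsFactors = FALSE
)

# Step 1: extract the year, i.e. the first four characters of "YYYY/MM".
df$year <- substr(df$Ref_Date, 1, 4)

# Step 2: split Value into one vector per year, then sum each vector.
sums <- sapply(split(df$Value, df$year), sum)
sums
```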

talat · Accepted Answer · 2015-02-23T22:46:29.070


You can use the following code with dplyr:

library(dplyr)
df %>% 
  group_by(year = substr(Ref_Date, 1, 4)) %>%     # create the groups
  summarise(Value = sum(Value))                    # one row per year with the total

#Source: local data frame [2 x 2]
#
#  year Value
#1 1989 11543
#2 1990  8703
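The question says each year has 12 rows, but the sample shown stops at 1990/08. A small extension of the dplyr code above, sketched here as an optional check, also counts the rows behind each total with n(), so incomplete years stand out.

```r
library(dplyr)

# Sample data: the 20 rows from the question (assumes Value is numeric).
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  stringsAsFactors = FALSE
)

# Sum per year and count how many monthly rows contributed to each total.
res <- df %>%
  group_by(year = substr(Ref_Date, 1, 4)) %>%
  summarise(Value = sum(Value), months = n())
res
```

With this sample, 1989 reports 12 months and 1990 only 8.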

Or, similarly, with the data.table package:

library(data.table)
setDT(df)[, sum(Value), by = .(year = substr(Ref_Date, 1, 4))]   # setDT converts df in place
#   year    V1
#1: 1989 11543
#2: 1990  8703
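The data.table result above names the sum column V1. As a small variant (not part of the original answer), wrapping the expression in .() lets you name it; keyby = instead of by = would additionally sort the result by year. This sketch builds a data.table directly instead of converting df with setDT().

```r
library(data.table)

# Sample data: the 20 rows from the question (assumes Value is numeric).
dt <- data.table(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791)
)

# .(Value = ...) names the result column Value instead of V1.
res <- dt[, .(Value = sum(Value)), by = .(year = substr(Ref_Date, 1, 4))]
res
```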

Or with base R:

with(df, aggregate(Value ~ cbind(year = substr(Ref_Date, 1, 4)), FUN = sum))   # cbind() names the group column "year"
#  year Value
#1 1989 11543
#2 1990  8703
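If the cbind() trick in the formula looks opaque, a plainer base R sketch adds the year as a real column with transform() first and then uses an ordinary Value ~ year formula. This is an alternative formulation, not the answer's code; it assumes Value is numeric.

```r
# Sample data: the 20 rows from the question (assumes Value is numeric).
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  stringsAsFactors = FALSE
)

# Add a year column first, then aggregate with a plain formula.
res <- aggregate(Value ~ year,
                 data = transform(df, year = substr(Ref_Date, 1, 4)),
                 FUN = sum)
res
```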

edited Feb 23 '15 at 22:46

answered Feb 23 '15 at 21:12

talat



+1 Beat me to it, although it may be more helpful to beginners to separate the mutation into its own step, that is, `mutate(year = substr(Ref_Date, 1, 4))` followed by `group_by(year)` – JasonAizkalns Feb 23 '15 at 21:15
Could you please write this statement again without using pipes? – Jean-François Beaulieu Feb 23 '15 at 21:18
@JFBeaulieu, I believe that would be a good exercise for you. I already spoon-fed you the code. – talat Feb 23 '15 at 21:21
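The two suggestions in these comments can be combined: a sketch that creates the year column with its own mutate() step, as JasonAizkalns suggests, and writes each dplyr call without pipes by storing intermediate results. The intermediate names with_year and by_year are illustrative only.

```r
library(dplyr)

# Sample data: the 20 rows from the question (assumes Value is numeric).
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  stringsAsFactors = FALSE
)

# Step 1: add the year as its own column.
with_year <- mutate(df, year = substr(Ref_Date, 1, 4))
# Step 2: group the rows by that column.
by_year <- group_by(with_year, year)
# Step 3: collapse each group to a single row holding the sum.
res <- summarise(by_year, Value = sum(Value))
res
```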

codingEnthusiast · Answer 2 · 2015-02-23T21:43:48.373


Another option is to use tapply:

years <- 1988:2015 ## or first.year:last.year; assumes every year appears in the data
sums <- tapply(df$Value, substr(df$Ref_Date, 1, 4), sum)   # sum Value within each year
new.df <- data.frame(years = years, sums = sums)

EDIT: A more general solution that avoids hard-coding the range of years (it is basically similar to the one above):

years <- substr(df$Ref_Date, 1, 4)
sums <- tapply(df$Value, years, sum)   # groups come back sorted by year
new.df <- data.frame(years = names(sums), sum = as.vector(sums))   # names(sums) keeps labels aligned with totals
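Base R also has rowsum(), which computes grouped sums directly. This is a swapped-in function, not part of the answer above; a sketch assuming Value is numeric:

```r
# Sample data: the 20 rows from the question (assumes Value is numeric).
df <- data.frame(
  Ref_Date = c(sprintf("1989/%02d", 1:12), sprintf("1990/%02d", 1:8)),
  Value = c(713, 771, 565, 1248, 1380, 1118, 1026, 995, 835, 939, 878, 1075,
            878, 563, 773, 1131, 1562, 1747, 1258, 791),
  stringsAsFactors = FALSE
)

years <- substr(df$Ref_Date, 1, 4)
# rowsum() adds up values within each group and returns a one-column
# matrix whose row names are the sorted group labels.
m <- rowsum(df$Value, years)
new.df <- data.frame(years = rownames(m), sum = as.vector(m))
new.df
```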

edited Feb 23 '15 at 21:43

answered Feb 23 '15 at 21:26

codingEnthusiast


It worked for me. I just needed to change the second line to `sums <- tapply(as.numeric(df$Value), substr(df$Ref_Date, 1, 4), sum)` – Jean-François Beaulieu Feb 23 '15 at 21:31
Oh, I'm glad it did; I had no idea you had stored the values as strings. But it's OK in the end, I guess. – codingEnthusiast Feb 23 '15 at 21:35
This is basically @docendo's solution. Should be a comment at best, but whatever. – David Arenburg Feb 23 '15 at 21:37
As soon as docendo's answer was posted, the OP asked for a version without pipes and that's what I provided. I had no intention of answering in the first place, since I thought that answer was enough. – codingEnthusiast Feb 23 '15 at 21:39

You can unpipe the `dplyr` solution, for example `summarise(group_by(df, year = substr(Ref_Date, 1, 4)), Value = sum(Value))` – David Arenburg Feb 23 '15 at 21:40
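On the Value column being stored as text: as.numeric() is enough when the column is character, but if it is a factor (the default for strings in data.frame() and read.csv() before R 4.0.0), as.numeric() returns the internal level codes instead of the numbers. A sketch on a few hypothetical rows, converting through as.character() first:

```r
# A few rows with Value read in as a factor (hypothetical example).
df <- data.frame(
  Ref_Date = c("1989/01", "1989/02", "1990/01"),
  Value = factor(c("713", "771", "878"))
)

as.numeric(df$Value)                             # wrong: returns the level codes
df$Value <- as.numeric(as.character(df$Value))   # right: parse the labels

sums <- tapply(df$Value, substr(df$Ref_Date, 1, 4), sum)
sums
```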
