0

I have a data frame with two columns, a Ref_Date column and a Value column. The date column contains 12 rows for each year, from 1988 until 2015. What I need to do is to group by the year only and summarize the Value column so that I can get only one row for each year containing the sum of all values for each of the 12 months of that year:

row.names   Ref_Date    Value
166483      1989/01     713
166484      1989/02     771
166485      1989/03     565
166486      1989/04     1248
166487      1989/05     1380
166488      1989/06     1118
166489      1989/07     1026
166490      1989/08     995
166491      1989/09     835
166492      1989/10     939
166493      1989/11     878
166494      1989/12     1075
166495      1990/01     878
166496      1990/02     563
166497      1990/03     773
166498      1990/04     1131
166499      1990/05     1562
166500      1990/06     1747
166501      1990/07     1258
166502      1990/08     791
Jean-François Beaulieu
  • 4,305
  • 22
  • 74
  • 107
  • That's a good question. – Jean-François Beaulieu Feb 23 '15 at 21:21
  • 3
    It is getting downvoted (didn't downvote yet, but have an irresistible urge to do so) because we would expect posting an image from a new user, not someone with your experience on the site. How are we supposed to reproduce this? By writing every single value by hand? Please follow the guidelines in [this link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – David Arenburg Feb 23 '15 at 21:22
  • 2
    Probably because it doesn't demonstrate research effort and because the data is only presented as image (i.e. not reproducible). – talat Feb 23 '15 at 21:23
  • 2
    @all of you, I have posted the data in plain text. Sorry about that, – Jean-François Beaulieu Feb 23 '15 at 21:27
  • 1
    This is really two questions - how do i extract just the year from a string, and how do I get the mean by group. Both of those have many duplicates on this site - i think that's the bigger issue for downvoters? – Señor O Feb 23 '15 at 21:30

2 Answers2

2

You can use the following code with dplyr:

library(dplyr)
df %>% 
  group_by(year = substr(Ref_Date, 1, 4)) %>%     # create the groups
  summarise(Value = sum(Value))

#Source: local data frame [2 x 2]
#
#  year Value
#1 1989 11543
#2 1990  8703

Or similarly with data.table package

library(data.table)
setDT(df)[, sum(Value), by = .(year = substr(Ref_Date, 1, 4))]
#   year    V1
#1: 1989 11543
#2: 1990  8703

Or with base R

with(df, aggregate(Value ~ cbind(year = substr(Ref_Date, 1, 4)), FUN = sum))
#  year Value
#1 1989 11543
#2 1990  8703
talat
  • 68,970
  • 21
  • 126
  • 157
1

Another answer could be the following (by using tapply):

years <- 1988:2015 ## or first.year:last.year
sums <- tapply(df$Value, substr(df$Ref_Date, 1, 4)), sum)
new.df <- data.frame(years = years, sums = sums)

EDIT: Just a more general solution to avoid standard dates (but it's basically similar to the one posted above):

years <- substr(df$Ref_Date, 1, 4)
sums <- tapply(df$Value, years, sum)
new.df <- data.frame(years = unique(years), sum = sums)
codingEnthusiast
  • 3,800
  • 2
  • 25
  • 37