4

I have daily data for 7 years. I want to group this into weekly data (based on the actual date) and sum the frequency.

Date Frequency
1   2014-01-01  179
2   2014-01-02  82  
3   2014-01-03  89  
4   2014-01-04  109 
5   2014-01-05  90  
6   2014-01-06  66  
7   2014-01-07  75  
8   2014-01-08  106 
9   2014-01-09  89  
10  2014-01-10  82

What is the best way to achieve that? Thank you

J_F
  • 9,956
  • 2
  • 31
  • 55
aua
  • 97
  • 1
  • 7
  • Does this answer your question? [Convert daily to weekly/monthly data with R](https://stackoverflow.com/questions/16442396/convert-daily-to-weekly-monthly-data-with-r) – Rohit Mar 01 '20 at 13:16

5 Answers5

8

These solutions all use base R and differ only in the definition and labelling of weeks.

1) cut the dates into weeks and then aggregate over those. Weeks start on Monday but you can add start.on.monday=FALSE to cut to start them on Sunday if you prefer.

Week <- as.Date(cut(DF$Date, "week"))
aggregate(Frequency ~ Week, DF, sum)
##         Week Frequency
## 1 2013-12-30       549
## 2 2014-01-06       418

2) If you prefer to define a week as 7 days starting with DF$Date[1] and label them according to the first date in that week then use this. (Add 6 to Week if you prefer the last date in the week.)

weekno <- as.numeric(DF$Date - DF$Date[1]) %/% 7
Week <- DF$Date[1] + 7 * weekno
aggregate(Frequency ~ Week, DF, sum)
##         Week Frequency
## 1 2014-01-01       690
## 2 2014-01-08       277

3) or if you prefer to label it with the first date existing in DF in that week then use this. This and the last Week definition give the same result if there are no missing dates as is the case here. (If you want the last existing date in the week rather than the first then replace match with findInterval.)

weekno <- as.numeric(DF$Date - DF$Date[1]) %/% 7
Week <- DF$Date[match(weekno, weekno)]
aggregate(Frequency ~ Week, DF, sum)
##         Week Frequency
## 1 2014-01-01       690
## 2 2014-01-08       277

Note

The input in reproducible form is assumed to be:

Lines <- "Date Frequency
1 2014-01-01 179
2 2014-01-02 82 
3 2014-01-03 89 
4 2014-01-04 109 
5 2014-01-05 90 
6 2014-01-06 66 
7 2014-01-07 75 
8 2014-01-08 106 
9 2014-01-09 89 
10 2014-01-10 82"
DF <- read.table(text = Lines)
DF$Date <- as.Date(DF$Date)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
7

I would use library(lubridate).

df <- read.table(header = TRUE,text = "date Frequency
2014-01-01  179
2014-01-02  82  
2014-01-03  89  
2014-01-04  109 
2014-01-05  90  
2014-01-06  66  
2014-01-07  75  
2014-01-08  106 
2014-01-09  89  
2014-01-10  82")

You can use base R or library(dplyr):

base R: to be sure that the date is really a date:

df$date <- ymd(df$date)
df$week <- week(df$date)

or short:

df$week <- week(ymd(df$date))

or dplyr:

library(dplyr)
df %>% 
  mutate(week = week(ymd(date))) %>% 
  group_by(week)

Out:

enter image description here

ah bon
  • 9,293
  • 12
  • 65
  • 148
J_F
  • 9,956
  • 2
  • 31
  • 55
3

Barring a good reason not to, you should be sure to use ISO weeks to be sure your aggregation intervals are equally sized.

data.table makes this work like so:

library(data.table)
setDT(myDF) # convert to data.table

myDF[ , .(weekly_freq = sum(Frequency)), by = isoweek(Date)]
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
1

Maybe you can try the base R code with aggregate + format, i.e.,

dfout <- aggregate(Frequency ~ yearweek,within(df,yearweek <- format(Date,"%Y,%W")),sum)

such that

> dfout
  yearweek Frequency
1  2014,00       549
2  2014,01       418

DATA

df <- structure(list(Date = structure(c(16071, 16072, 16073, 16074, 
16075, 16076, 16077, 16078, 16079, 16080), class = "Date"), Frequency = c(179L, 
82L, 89L, 109L, 90L, 66L, 75L, 106L, 89L, 82L)), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
1

The new package slider from RStudio addresses this problem directly including the specification of the start of the weekly periods. Suppose the weekly periods were to start on a Monday so that the beginning of the first week would be Monday, 2013-12-30. Then the slider solution would be

library(slider)   
slide_period_dfr(.x = DF, .i=as.Date(DF$Date), 
                 .period = "week", 
                 .f = ~data.frame(week_ending = tail(.x$Date,1),
                                   week_freq = sum(.x$Frequency)),
                 .origin = as.Date("2013-12-30"))

with the result

 week_ending week_freq
1  2014-01-05       549
2  2014-01-10       418
WaltS
  • 5,410
  • 2
  • 18
  • 24