New to R so forgive me if terminology is off.

I have a data frame:

      date           val1   val2 val3         val4
1  2016-01-01     8007.59 128739 1573            0
2  2016-01-02     8526.98 142289 1798            0
3  2016-01-03     7720.77 132418 1433            0
4  2016-01-04     6845.67 123710 1280            0
5  2016-01-05     7176.20 126395 1302            0
6  2016-01-06     6125.98 117223 1148            2
7  2016-01-07     6125.16 109752 1119           30
8  2016-01-08     6869.92 107377 1233           24
9  2016-01-09     7289.16 107644 1326           25
10 2016-01-10     7360.92 124131 1330           21
11 2016-01-11     6697.14 112992 1185           26
12 2016-01-12     6418.59 106102 1116           22
13 2016-01-13     7334.01 118562 1156           21
14 2016-01-14     7845.45 113140 1184           17
15 2016-01-15     7902.26 104892 1207           37
16 2016-01-16     8443.98 114435 1336           37
17 2016-01-17     9010.53 129167 1370           29
18 2016-01-18     9750.08 125191 1467           29
19 2016-01-19     6864.10 101307 1085           11
20 2016-01-20     7519.02  89794 1095           21
21 2016-01-21     8208.62  82585 1039           15
22 2016-01-22     7839.53  78314 1000           26
23 2016-01-23     8104.59  79346 1089           32
24 2016-01-24     9133.29  80510 1135           33
25 2016-01-25     9763.78 103603 1217           21

I would like to sum all the values for each week. The data spans multiple years, so to be clear: I don't want to aggregate week numbers across years (e.g. NOT all week 1s, all week 2s, ... all week 52s) but rather sum each individual year-week.

In Python/pandas this would be `df.groupby(pd.Grouper(key='date', freq='W')).sum()`

thanks!

RSHAP
  • `dateweek <- format(x$date, "%Y-%V")` will give you `"2018-15"` (for now), where 15 is the week number as defined by ISO 8601. You can use `%U` instead, slightly different convention. With this, you can group using whatever tool you prefer (`dplyr::group_by`, `data.table`, `aggregate`/`by`, ...). – r2evans Apr 10 '18 at 15:12
  • Some caution required. See number 24 in [more-falsehoods-programmers-believe-about-time](http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time) – dww Apr 10 '18 at 15:25
  • @dww, is that caution to my `"%V"` suggestion or for dealing with dates/times in general? – r2evans Apr 10 '18 at 16:05
  • @r2evans just in general. If ISO 8601 is the definition you want, then your suggestion is the right one. Just saying that different people in different places have different definitions of what a week is, and we should not be too quick to assume that the ISO or any other definition is always the correct one. Weeks beginning on Sunday are pretty commonplace, for example. – dww Apr 10 '18 at 19:36
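
To make the difference between these conventions concrete, here is a minimal base R sketch. The four dates are picked for illustration around New Year 2016 (they are not the asker's full data), and the column names are made up. It assumes `format()` on your platform supports `%G` and `%V` on output. `%G` is the ISO 8601 week-based year, which is the year that belongs with `%V`; pairing `%V` with the calendar year `%Y` can mislabel the first or last days of a year.

```r
# Illustrative dates straddling a year boundary (2016-01-01 is a Friday)
d <- as.Date(c("2016-01-01", "2016-01-03", "2016-01-04", "2016-01-10"))

labels <- data.frame(
  date   = d,
  iso    = format(d, "%G-%V"),  # ISO 8601 week-based year + ISO week (weeks start Monday)
  sunday = format(d, "%Y-%U"),  # calendar year + week number with weeks starting Sunday
  mixed  = format(d, "%Y-%V"),  # calendar year + ISO week: inconsistent near New Year
  stringsAsFactors = FALSE
)
print(labels)
```

In this sketch 2016-01-01 falls in ISO week 53 of 2015, so `%G-%V` labels it `2015-53` while `%Y-%V` labels it `2016-53`. With `%U`, the days before the first Sunday of the year land in week `00`, so a week that spans New Year is split into two groups.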

1 Answer

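To group by the ISO 8601 definition of weeks, use `isoyear()` and `isoweek()` from lubridate. (Plain `year()` and `week()` do not follow the ISO definition: `week()` counts complete seven-day periods since January 1.)

library(tidyverse)
library(lubridate)  # provides isoyear() and isoweek()
# assumes df$date is already of class Date
df %>% 
  group_by(year = isoyear(date), week = isoweek(date)) %>% 
  summarise_if(is.numeric, sum)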

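To group by weeks starting on Sunday, use @r2evans's suggestion. Note that `%U` numbers the days before the first Sunday of a year as week 00, so a week that spans New Year is split between the two years.

library(tidyverse)
df %>% 
  group_by(week = format(date, '%Y-%U')) %>% 
  summarise_if(is.numeric, sum)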
IceCreamToucan
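
For anyone who wants to avoid extra packages, here is a rough base R sketch of the same idea, not a definitive implementation. It uses only a handful of rows and two columns from the question's data as an illustration; `week_start` and `weekly` are names made up here. `cut()` on a `Date` with `breaks = "week"` labels each row with the Monday that starts its week, and since that label is a full date, weeks from different years cannot merge. As far as I know, pandas `freq='W'` bins weeks ending on Sunday, so these should be the same bins labelled by their first day rather than their last.

```r
# Illustrative subset of the question's data (date, val1, val4 only)
df <- data.frame(
  date = as.Date(c("2016-01-01", "2016-01-03", "2016-01-04",
                   "2016-01-10", "2016-01-11")),
  val1 = c(8007.59, 7720.77, 6845.67, 7360.92, 6697.14),
  val4 = c(0, 0, 0, 21, 26)
)

# Label every row with the Monday that starts its week (cut.Date default)
df$week_start <- as.Date(cut(df$date, breaks = "week"))

# Sum the numeric columns within each week
weekly <- aggregate(cbind(val1, val4) ~ week_start, data = df, FUN = sum)
print(weekly)
```

With the full data frame you would list all four value columns inside `cbind()`, or use `. ~ week_start` after dropping the original `date` column.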