Group the dataframe by week and add a column that calculates the variance/mean of that week in R

Question

I am a beginner in R and in coding in general.. I have a dataframe that looks like this:

 Date         Week          Spend 
1   2019-07-14 2019-07-08   1.81
2   2019-07-13 2019-07-08   1.31
3   2019-07-12 2019-07-08   1.56
4   2019-07-11 2019-07-08   0.45
5   2019-07-10 2019-07-08   5.00

The whole data has several weeks. First, I will need to group the data by week and sum the values.

For now I tried this:

df$nweek = (rep(1:15, each= 7))

Results:

   Date       Week     Spend     nweek
1   2019-07-14 2019-07-08  1.81      1
2   2019-07-13 2019-07-08  1.31      1
3   2019-07-12 2019-07-08  1.56      1
4   2019-07-11 2019-07-08  0.45      1
5   2019-07-10 2019-07-08  5.00      1
6   2019-07-09 2019-07-08  3.59      1
7   2019-07-08 2019-07-08  4.08      1
8   2019-07-07 2019-07-01  2.83      2
9   2019-07-06 2019-07-01  1.38      2
10  2019-07-05 2019-07-01  1.59      2
11  2019-07-04 2019-07-01  0.93      2
12  2019-07-03 2019-07-01  1.50      2
13  2019-07-02 2019-07-01  3.22      2
14  2019-07-01 2019-07-01  6.20      2
15  2019-06-30 2019-06-24  5.47      3
16  2019-06-29 2019-06-24  1.77      3

so that in this way I can have an "id" of each week. However, for some reason I cannot group my dataframe by this sequence of number I just produced:

df = df %>% group_by(nweek) %>%
  summarise (Spend = sum(Spend))

Instead, the result only gives me one row and sums the value(Spend) of the whole dataframe. I tried as.character on the "nweek" column and didnt work

Second,

After grouping the dataframe by week, I'm trying to calculate the mean and standard deviation each week, and return those values to new columns in the data frame. How can I do this?

Thanks

Try with `%>% dplyr::summarise(Spend = sum(Spend))` Could be that you loaded `plyr` as well and `plyr::summarise` masked the `dplyr::summarise` — akrun, Jul 17 '19 at 15:39
Possible duplicate of [Averaging daily data into weekly data](https://stackoverflow.com/questions/15102327/averaging-daily-data-into-weekly-data) — camille, Jul 17 '19 at 15:41
ad (2) simply use `group_by(nweek) %>% mutate(mean=mean(Spend), sd=sd(Spend))` (if you want to insert the columns in the original data frame) or `group_by(nweek) %>% summarise(mean=mean(Spend), sd=sd(Spend))` if you want only the summary. — January, Jul 17 '19 at 15:42
Is it same as this one ? https://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group — Regressor, Jul 17 '19 at 15:43
@akrun you are right! I loaded plyr as well, which is why it is not working! — Sakura, Jul 17 '19 at 16:29

score 1 · Answer 1 · answered Jul 17 '19 at 16:23

I would make one slight change to Ryan John's great solution. You can use mutate() to modify the Date, Week, and week_num columns all in one pipe.

df <-  tibble::tribble(
  ~Date,       ~Week, ~Spend, ~nweek,
  "7/14/2019",  "7/8/2019",   1.81,      1,
  "7/13/2019",  "7/8/2019",   1.31,      1,
  "7/12/2019",  "7/8/2019",   1.56,      1,
  "7/11/2019",  "7/8/2019",   0.45,      1,
  "7/10/2019",  "7/8/2019",   5.95,      1,
  "7/9/2019",  "7/8/2019",   3.59,      1,
  "7/8/2019",  "7/8/2019",   4.08,      1,
  "7/7/2019",  "7/1/2019",   2.83,      2,
  "7/6/2019",  "7/1/2019",   1.38,      2,
  "7/5/2019",  "7/1/2019",   1.59,      2,
  "7/4/2019",  "7/1/2019",   0.93,      2,
  "7/3/2019",  "7/1/2019",    1.5,      2,
  "7/2/2019",  "7/1/2019",   3.22,      2,
  "7/1/2019",  "7/1/2019",    6.2,      2,
  "6/30/2019", "6/24/2019",   5.47,      3,
  "6/29/2019", "6/24/2019",   1.77,      3
)

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:lubridate':
#> 
#>     intersect, setdiff, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df %>% 
  mutate(Date = mdy(Date),
         Week = mdy(Week),
         week_num = week(Date)) %>% 
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))
#> # A tibble: 3 x 3
#>   week_num spend_sum spend_sd
#>      <dbl>     <dbl>    <dbl>
#> 1       26      13.4     2.38
#> 2       27      15.5     1.16
#> 3       28      14.7     2.00

^{Created on 2019-07-17 by the reprex package (v0.2.1)}

score 0 · Accepted Answer · answered Jul 17 '19 at 15:47

Try this:

library(tibble)

df <-  tibble::tribble(
           ~Date,       ~Week, ~Spend, ~nweek,
     "7/14/2019",  "7/8/2019",   1.81,      1,
     "7/13/2019",  "7/8/2019",   1.31,      1,
     "7/12/2019",  "7/8/2019",   1.56,      1,
     "7/11/2019",  "7/8/2019",   0.45,      1,
     "7/10/2019",  "7/8/2019",   5.95,      1,
      "7/9/2019",  "7/8/2019",   3.59,      1,
      "7/8/2019",  "7/8/2019",   4.08,      1,
      "7/7/2019",  "7/1/2019",   2.83,      2,
      "7/6/2019",  "7/1/2019",   1.38,      2,
      "7/5/2019",  "7/1/2019",   1.59,      2,
      "7/4/2019",  "7/1/2019",   0.93,      2,
      "7/3/2019",  "7/1/2019",    1.5,      2,
      "7/2/2019",  "7/1/2019",   3.22,      2,
      "7/1/2019",  "7/1/2019",    6.2,      2,
     "6/30/2019", "6/24/2019",   5.47,      3,
     "6/29/2019", "6/24/2019",   1.77,      3
     )

library(lubridate)
df$Date <-  lubridate::mdy(df$Date)
df$Week <-  lubridate::mdy(df$Week)
df$week_num <- lubridate::week(df$Date)

library(dplyr)
df %>%  
  group_by(week_num) %>% 
  summarise(spend_sum = sum(Spend),
            spend_sd = sd(Spend))

Group the dataframe by week and add a column that calculates the variance/mean of that week in R

2 Answers2