2

I am having trouble converting daily data into weekly using averages over the week.

My Data looks like this:

> str(daily_FWIH)
'data.frame':   4371 obs. of  6 variables:
 $ Date     : Date, format: "2013-03-01" "2013-03-02" "2013-03-04" "2013-03-05" ...
 $ CST.OUC  : Factor w/ 6 levels "BVG11","BVG12",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ CST.NAME : Factor w/ 6 levels "Central Scotland",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ SOM_patch: Factor w/ 6 levels "BVG11_Highlands & Islands",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Row_Desc : Factor w/ 1 level "FSFluidWIH": 1 1 1 1 1 1 1 1 1 1 ...
 $ Value    : num  1.16 1.99 1.47 1.15 1.16 1.28 1.27 2.07 1.26 1.19 ...

> head(daily_FWIH)
        Date CST.OUC            CST.NAME                 SOM_patch   Row_Desc Value
1 2013-03-01   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.16
2 2013-03-02   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.99
3 2013-03-04   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.47
4 2013-03-05   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.15
5 2013-03-06   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.16
6 2013-03-07   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH  1.28

This is what I've tried converting this to xts object as shown here

This is what I have tried:

daily_FWIH$Date = as.Date(as.character(daily_FWIH$Date), "%d/%m/%Y")
library(xts)

temp.x = xts(daily_FWIH[-1], order.by=daily_FWIH$Date)
apply.weekly(temp.x, colMeans(temp.x$Value))

I have two problems. My week starts and ends on a "Saturday" and I get the following error:

> apply.weekly(temp.x, colMeans(temp.x$Value))
Error in colMeans(temp.x$Value) : 'x' must be numeric

UPDATE Based on Sam's comments:

This is what I did:

daily_FWIH$Date <- ymd(daily_FWIH$Date) # convert to POSIX format
daily_FWIH$fakeDate <- daily_FWIH$Date + days(2)
daily_FWIH$week <- week(daily_FWIH$fakeDate) # extract week value
daily_FWIH$year <- year(daily_FWIH$fakeDate)

    > daily_FWIH %>%
+ group_by(year,week) %>%
+ mutate(weeklyAvg = mean(Value), weekStartsOn = min(Date)) %>% # create the average variable
+ slice(which(Date == weekStartsOn)) %>% # select just the first record of the week - other vars will come from this
+ select(-Value,-fakeDate,-week,-year,-Date, -CST.OUC,-CST.NAME) # drop unneeded variables
Source: local data frame [631 x 6]
Groups: year, week

   year week                   SOM_patch   Row_Desc weeklyAvg weekStartsOn
1  2013    9   BVG11_Highlands & Islands FSFluidWIH  1.048333   2013-03-01
2  2013    9   BVG12_North East Scotland FSFluidWIH  1.048333   2013-03-01
3  2013    9      BVG13_Central Scotland FSFluidWIH  1.048333   2013-03-01
4  2013    9   BVG14_South East Scotland FSFluidWIH  1.048333   2013-03-01
5  2013    9 BVG15_West Central Scotland FSFluidWIH  1.048333   2013-03-01
6  2013    9   BVG16_South West Scotland FSFluidWIH  1.048333   2013-03-01
7  2013   10   BVG11_Highlands & Islands FSFluidWIH  1.520500   2013-03-02
8  2013   10   BVG12_North East Scotland FSFluidWIH  1.520500   2013-03-02
9  2013   10      BVG13_Central Scotland FSFluidWIH  1.520500   2013-03-02
10 2013   10   BVG14_South East Scotland FSFluidWIH  1.520500   2013-03-02
..  ...  ...                         ...        ...       ...          ...

Which is incorrect...

The desired output is:

> head(desired)
        Date BVG11.Highlands_I_.A_pct BVG12.North.East.ScotlandA_pct BVG13.Central.ScotlandA_pct
1 01/03/2013                     1.16                           1.13                        1.08
2 08/03/2013                     1.41                           2.37                        1.80
3 15/03/2013                     1.33                           3.31                        1.34
4 22/03/2013                     1.39                           2.49                        1.62
5 29/03/2013                     5.06                           3.42                        1.42
6                                  NA                             NA                          NA
  BVG14.South.East.ScotlandA_pct BVG15.West.Central.ScotlandA_pct BVG16.South.West.ScotlandA_pct
1                           1.05                             0.98                           0.89
2                           1.51                             1.21                           1.07
3                           1.13                             2.13                           2.01
4                           2.14                             1.24                           1.37
5                           1.62                             1.46                           1.95
6                             NA                               NA                             NA

> str(desired)
'data.frame':   11 obs. of  7 variables:
 $ Date                            : Factor w/ 6 levels "01/03/2013",..: 2 3 4 5 6 1 1 1 1 1 ...
 $ BVG11.Highlands_I_.A_pct        : num  1.16 1.41 1.33 1.39 5.06  ...
 $ BVG12.North.East.ScotlandA_pct  : num  1.13 2.37 3.31 2.49 3.42  ...
 $ BVG13.Central.ScotlandA_pct     : num  1.08 1.8 1.34 1.62 1.42  ...
 $ BVG14.South.East.ScotlandA_pct  : num  1.05 1.51 1.13 2.14 1.62  ...
 $ BVG15.West.Central.ScotlandA_pct: num  0.98 1.21 2.13 1.24 1.46 ...
 $ BVG16.South.West.ScotlandA_pct  : num  0.89 1.07 2.01 1.37 1.95 ...
Community
  • 1
  • 1
Shery
  • 1,808
  • 5
  • 27
  • 51
  • re: starting *and* ending on Saturday - do you mean Saturday through Sunday and then starting again the next Saturday? – Sam Firke Mar 13 '15 at 21:55

1 Answers1

2

Find the first Saturday in your data, then assign a week ID to all dates in your data set based on that :

library(lubridate) # for the wday() and ymd() functions
daily_FWIH$Date <- ymd(daily_FWIH$Date)
saturdays <- daily_FWIH[wday(daily_FWIH$Date) == 7, ] # filter for Saturdays
startDate <- min(saturdays$Date) # select first Saturday
daily_FWIH$week <- floor(as.numeric(difftime(daily_FWIH$Date, startDate, units = "weeks")))

Once you have a weekID-starting-on-Saturday variable, this is a standard R problem. You can calculate the weekly averages using your method of choice for calculating means within a subgroup. I like dplyr:

library(dplyr)
daily_FWIH %>%
  group_by(week, SOM_patch) %>% # use your grouping variables in addition to week
  summarise(weeklyAvg = mean(Value), weekBeginDate = min(Date)) %>%
  mutate(firstDayOfWeek = wday(weekBeginDate, label=TRUE)) # confirm correct week cuts

Source: local data frame [2 x 5]
Groups: week

  week                 SOM_patch weeklyAvg weekBeginDate firstDayOfWeek
1   -1 BVG11_Highlands & Islands      1.16    2013-03-01            Fri
2    0 BVG11_Highlands & Islands      1.41    2013-03-02            Sat

Update based on comments below:

If you want to see the other values in your dataset, you'll need to decide how to select or calculate weekly values when daily values within a week conflict. In your sample data, they are the same in all rows, so I'm just drawing them from the row containing the first day of the week.

library(dplyr)
daily_FWIH %>%
  group_by(week, SOM_patch) %>% # use your grouping variables
  mutate(weeklyAvg = mean(Value), weekBeginDate = min(Date)) %>%
  slice(which(Date == weekBeginDate)) %>% # select just the first record of the week - other vars will come from this 
  select(-Value, -Date) # drop unneeded variables

Source: local data frame [2 x 7]
Groups: week, SOM_patch

  CST.OUC            CST.NAME                 SOM_patch   Row_Desc week weeklyAvg weekBeginDate
1   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH   -1      1.16    2013-03-01
2   BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH    0      1.41    2013-03-02
Community
  • 1
  • 1
Sam Firke
  • 21,571
  • 9
  • 87
  • 105
  • Thank you @Sam Firke but I would like to see the date as well. e.g. my first observation will be like `2013-03-02 BVG11 Highlands & Islands BVG11_Highlands & Islands FSFluidWIH 1.99` This will have the same value and subsequent dates will be `2013-03-09, 2013-03-16` etc. These will have the average values between `04 to 2013-03-08` – Shery Mar 13 '15 at 19:34
  • @Shery does that work? It's now displaying the week's earliest date as well as its day value (if you don't need that, delete the last line of code) – Sam Firke Mar 13 '15 at 19:57
  • I dont know if the averages are right...for example: on `2013-03-01` the average should be 1.16 which is fine but if I calculate manually between `2013-03-01` and `2013-03-02` i.e. `1.16+1.99=1.575` which is not your answer also can't see to find other columns in my dataset i.e. `CST.OUC, CST.NAME, SOM_patch, Row_Desc` how to display those and I will accept it as an answer. Thanks again – Shery Mar 13 '15 at 21:27
  • See the updated answer. March 1 2013 was a Friday, so it's counted in a week by itself (since your weeks start on Saturday). Thus 1.16 is that weekly average. Then the next 5 records are in the next week (Sat, Mon, Tues, Wed, Thurs) so they are averaged together: `mean(c(1.99,1.47,1.15,1.16,1.28))` = 1.41. I don't see when you would average the values you name, from 2013-03-01 and 2013-03-02, since they are in different weeks. And re: selecting the other variables, it's easy for now since they are all the same - but is that always the case? Might help if you create a desired results table. – Sam Firke Mar 13 '15 at 21:49
  • Ah! Just got it. Thanks for the explanation. I wish to see previous weeks averages every Friday. For example `2013-03-01` the next observation should be on `2013-03-08` then `2013-03-15` and so on. Therefore, my week starts at Saturday including Sundays, Monday, Tuesday, Wednesday and Thursday and whatever is the average should be displayed every Friday. Sorry if I wasn't being clear enough. I wish to see weekEnding date i.e. first observation should be `2013-03-01` second should be `2013-03-08` then `2013-03-15` then `2013-03-22` and so on. Many thanks again for your help and efforts – Shery Mar 13 '15 at 22:44
  • Thank for posting the desired output. I see you want averages within groups of some region variable; that wasn't clear from your question or sample data, so having the desired result helps. That's an easy change: add the region variable to the arguments of the `group_by` variable in my first block of dplyr code. To display the week ending date instead of starting date, you'll change the variable `weekStartsOn = min(Date)` to 'weekEndsOn' = ... well, if `min(Date)` gives you the *lowest* date value in a week, what command do you think would give you the *highest* value in a week? – Sam Firke Mar 16 '15 at 01:42
  • 1
    I switched to a cleaner approach to grouping by weeks and cleaned up the answer, for future readers. The grouping into weeks starting on Saturday has been resolved; any followup questions on other aspects would be better off as their own questions at this point. I posted a link to another question about subgroup means, since it seems that's where you're stuck now. Next time I suggest posting a richer sample data set that has the different groups you want to group by and any other aspects of your desired results (you can't produce your desired result with the data you have provided). – Sam Firke Mar 16 '15 at 02:51
  • This is excellent! Thank you Sam... I have accepted it as an answer. All the best – Shery Mar 16 '15 at 14:42