1

I want to run a MIDAS regression of x on y. x are weekly data and y are monthly. In order to do that the number of weeks should be divisible by the number of months or vice versa the weeks should be a multiple of the months.

The problem is that months vary in their number of weeks (4-5) so the aforementioned condition does not hold. Consequently, i want to transform the data so that each month contains 4 weeks.

I firstly thought to disaggregate the weekly data (starting Saturdays) to daily data and then define custom weeks i.e.

1st week: 1-8 day, 2nd week: 9-15, 3rd week: 16-23, 4th week: 24-30 or 31

I managed to disaggregate the weekly data into daily but I'm struggling to aggregate them back in the week month form. By aggregate I mean take average values.

#Original Data
        Date Hits wday
1 2012-03-24   81    7
2 2012-03-31   77    7
3 2012-04-07   80    7
4 2012-04-14   83    7
5 2012-04-21   83    7
6 2012-04-28   83    7

#Daily data
        ddate hhits wday mday month week
1  2012-03-24    81    7   24     3   12
2  2012-03-25    77    1   25     3   13
3  2012-03-26    77    2   26     3   13
4  2012-03-27    77    3   27     3   13
5  2012-03-28    77    4   28     3   13
6  2012-03-29    77    5   29     3   13
7  2012-03-30    77    6   30     3   13
8  2012-03-31    77    7   31     3   13
9  2012-04-01    80    1    1     4   14
10 2012-04-02    80    2    2     4   14
11 2012-04-03    80    3    3     4   14
12 2012-04-04    80    4    4     4   14
13 2012-04-05    80    5    5     4   14
14 2012-04-06    80    6    6     4   14
15 2012-04-07    80    7    7     4   14
16 2012-04-08    83    1    8     4   15
17 2012-04-09    83    2    9     4   15
18 2012-04-10    83    3   10     4   15
19 2012-04-11    83    4   11     4   15
20 2012-04-12    83    5   12     4   15
21 2012-04-13    83    6   13     4   15
22 2012-04-14    83    7   14     4   15
23 2012-04-15    83    1   15     4   16
24 2012-04-16    83    2   16     4   16
25 2012-04-17    83    3   17     4   16

#4week month 
     Date Hits
2012-03-4 77.5
2012-04-1   80
2012-04-2   83
...

Where the values are average of days as defined earlier and 2012-03-4 stands for the last custom week of march.

Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
Sakis Geo
  • 11
  • 1
  • Can you clarify what you're trying to do? You just want averages of `hhits` over values of `week` in the daily data? – ulfelder Mar 25 '17 at 10:17
  • @ulfelder I want average of `hhits` over weeks but each month should contain strictly 4 weeks. – Sakis Geo Mar 25 '17 at 18:45
  • But 365/28 is 13.0357 etc. I.e., you cannot get 7-day weeks that nest neatly in months that nest neatly in years. This is a really annoying problem in time-series analysis, but it's there, and you can't math it away. – ulfelder Mar 26 '17 at 10:29

2 Answers2

0

If you've no missing values you can pretty much ignore the date column and do:

d = data.frame(val=rnorm(100))

crude_filler = function(i){
   x = rep(1:(nrow(d)/i), each=i)
   c(x, rep(max(x) + 1, each=nrow(d) - length(x)))
}

d$week = crude_filler(7)
d$month = crude_filler(28)

If you've missing values, create a new (complete) data frame first:

x = data.frame(Date=seq.Date(min(x), max(x), by=1))

x = merge(x, your_data)

If you're desperate to start on a certain day of each week, you'll need to adjust. You could also consider using a moving average (e.g. https://stackoverflow.com/a/4862334/2773500).

Community
  • 1
  • 1
MikeRSpencer
  • 1,276
  • 10
  • 24
0

Here is one solution using dplyr and which creates a grouping variable.

library(dplyr)

#create the function for grouping. 
get_groups <- function(x){
  l1 <- length(x) %/% 7
  grp <- c(rep(seq(l1), each = 7), rep(tail(seq(l1), 1), length(x) - 7 * l1))
    return(grp)
  }

#apply it using dplyr,
#First make sure your ddate is as.Date(df$ddate),

df %>% 
  group_by(v1 = format(ddate, '%Y-%m')) %>% 
  mutate(grp = get_groups(hhits)) %>% 
  group_by(v1, grp) %>% 
  summarise(avg = mean(hhits)) %>% 
  ungroup()

# A tibble: 3 × 3
#       v1   grp   avg
#    <chr> <int> <dbl>
#1 2012-03     1  77.5
#2 2012-04     1  80.0
#3 2012-04     2  83.0
Sotos
  • 51,121
  • 6
  • 32
  • 66
  • First thanks for your effort. Second, since I'm a basic R user, I'm kind of lost on how this methods works so I just applied it to my workspace as a script. I get this error: `Error in UseMethod("group_by_") : no applicable method for 'group_by_' applied to an object of class "function" ` – Sakis Geo Mar 25 '17 at 18:37
  • Did you convert youb`ddate` to `as.Date(df$ddate, format = '%Y-%m-%d')` – Sotos Mar 25 '17 at 18:57
  • Sorry my bad I forgot to change the name of my dataframe. I did now but i am getting another error: `Error in mutate_impl(.data, dots) : incompatible size (1821), expecting 1 (the group size) or 1` My dataframe has 1821 rows btw. – Sakis Geo Mar 26 '17 at 08:10
  • Try and run `get_groups` by itself i.e. `get_groups(df$hhits)`. – Sotos Mar 26 '17 at 08:46