0

Suppose that we have DT_vec in MATLAB R2015b:

DT_vec = 
2008    7   21  0   0   0
2008    7   22  0   0   0
2008    7   23  0   0   0
2008    7   29  0   0   0
2008    8   4   0   0   0
2008    8   12  0   0   0
2008    9   14  0   0   0
2008    9   15  0   0   0
2008    9   16  0   0   0
2008    9   17  0   0   0
2008    9   20  0   0   0
2008    9   21  0   0   0

Data stored in My_data:

32
43
12
43
2
12
54
32
34
5
32
12

I want calculate weekly (or monthly) average of this time series (My_data). How can I do that?

Eghbal
  • 3,892
  • 13
  • 51
  • 112
  • 2
    average over which data? according to my understanding, the first 3 left columns represents the date, but what are the first 3 right column stands for - do you want to calculate the average over these values? – ibezito Mar 14 '16 at 10:21
  • @drorco Thank you for your comment, I revised the question. First 3 right column stands from time(hour,etc.) which we don't have this data in above case. I want calculate average over `My_data` values. – Eghbal Mar 14 '16 at 10:23
  • 1
    For a question on how to do this efficiently, see [this](http://stackoverflow.com/questions/33635984/time-series-aggregation-efficiency) question. – David Kelley Mar 14 '16 at 14:12

1 Answers1

4

Monthly is easier, you can use the first and second columns from DT_vec to create the subs array needed for accumarray

[~,~,subs] = unique(DT_vec(:,1:2), 'rows')
MonthlyMeans = accumarray(subs, My_data, [], @mean)

You can do the same thing for weekly, but you'll have to work out how to get subs. It shouldn't be much more complicated but it's probably easier to use a date serial number (i.e. datenum(DT_vec)) and then do something like subs = floor((datenum(DT_vec) + offset)/7) where offset is an integer between 0 and 6 that you use to get the numbers to start on Sunday (or Monday). You might then want to subtract to make the first week start at 1 so subs = subs - min(subs) + 1

Edit: How does this work and what is offset?

The idea here is to represent your dates as serial numbers (by using the datenum function). From the docs for datenum:

The datenum function creates a numeric array that represents each point in time as the number of days from January 0, 0000.

So each whole number in the result from datenum represents a whole day. Running this on your example DT_vec (i.e. datenum(DT_vec)) we get:

DT_num =

     733610
     733611
     733612
     733618
     733624
     733632
     733665
     733666
     733667
     733668
     733671
     733672

Your first two dates were consecutive and as a result DT_num(2) - DT_num(1) == 1. So how to we find the weeks? We need to group these serial date numbers in clusters of 7. To do that I divide each number by 7 and then drop the decimal points using floor. To demonstrate this using smaller numbers, what happens when we divide 0:14 by 7? We get

     0
0.1429
0.2857
0.4286
0.5714
0.7143
0.8571
1.0000
1.1429
1.2857
1.4286
1.5714
1.7143
1.8571
2.0000

And if we drop the decimals using floor we get:

 0
 0
 0
 0
 0
 0
 0
 1
 1
 1
 1
 1
 1
 1
 2

The same will happen with your actual numbers, they just won't start at 0. So what is the offset for? Well since each change in whole number after dividing by 7 represents a new week, we need to make sure we are dividing weeks starting at Sunday rather than some arbitrary day like Thursday. Since MATLAB starts counting from the first of January year 0000, we need to know what day of the week that was and then use offset to make it as if it started counting from the next Sunday instead. What day was 1 Jan 0000? We can use MATLAB's weekday function to find out. For example what day is it today?

weekday(datenum([2016 03 14]))

returns 2 meaning Monday. So what day was day 0? weekday(0) returns 6 meaning Friday. So we need to offset this counter to make it start from 0 and thus offset should equal weekday(1) - 1.

To further illustrate this point, consider the past 14 days:

d = (today-7:today)'

Now

[floor(d/7) weekday(d)]

returns

ans =

  105198           2
  105198           3
  105198           4
  105198           5
  105199           6
  105199           7
  105199           1
  105199           2
  105199           3
  105199           4
  105199           5
  105200           6
  105200           7
  105200           1
  105200           2

A new week starts when the second column goes from 7 to 1 but you can see that the grouping created by floor(d/7) in column 1 doesn't correspond to this. In fact it is off by 5 (or -2). That's what we want to use offset for. So if we set offset = weekday(0) - 1 then

[floor((d+offset)/7) weekday(d)]

gives

  105199           2
  105199           3
  105199           4
  105199           5
  105199           6
  105199           7
  105200           1
  105200           2
  105200           3
  105200           4
  105200           5
  105200           6
  105200           7
  105201           1
  105201           2

which correctly partitions the weeks.

Dan
  • 45,079
  • 17
  • 88
  • 157
  • Weekly is much harder...I had to create a new column of week number using `weeknum`. – GameOfThrows Mar 14 '16 at 10:29
  • @GameOfThrows I just added a description of how I would do it using `datenum`. I didn't know about `weeknum` but to me that makes it basically just as easy as monthly because `[~,~,subs] = unique([DT_vec(:,1),weeknum(DT_vec(:,2))], 'rows')` although that's going to be messy for the first and last week of each year so I would stick with my `datenum` method – Dan Mar 14 '16 at 10:33
  • Yes, `weeknum` is just a date parser that takes the date vector and assigns it with the week number of the year (which I think is just like what you have done with `datenum`) – GameOfThrows Mar 14 '16 at 10:40
  • @GameOfThrows the difference is that my `datenum` code will assign the same number to the 31st of dec and the 1st of Jan (unless one 1 Jan was Sunday) where as `weeknum` should assign `52` to the end of dec and `1` to the beginning of Jan thus splitting a week that shouldn't be split – Dan Mar 14 '16 at 10:59
  • That's true, it reset at the end of the year. Your method is more accurate. – GameOfThrows Mar 14 '16 at 11:06
  • @Dan. Thank you for your detailed answer. Can you describe more about `offset`? For example what is meaning of `4` for this variable? – Eghbal Mar 14 '16 at 11:54
  • @user2991243 Added a pretty detailed explanation now – Dan Mar 14 '16 at 13:30