0

I need to obtain the standard deviation for period surrounding over (-2,+2) days, instead of (-2,+2) observations. my table is in below

 date        value    standard_deviation
01/01/2015     18       ...
01/01/2015     15       ...
01/01/2015     5        ...
02/01/2015     66       ...
02/01/2015     7        ...
03/01/2015     7        ...
04/01/2015     19       ...
04/01/2015     7        ...
04/01/2015     11       ... 
04/01/2015     17       ...
05/01/2015     3        ...
06/01/2015     7        ...
  ...        ...       ...

The tricky part is that there are different number of observations in each days. Therefore, I CANNOT just using the following code

PROC EXPAND DATA=TESTTEST OUT=MOVINGAVERAGE;
CONVERT VAL=AVG / TRANSFORMOUT=(MOVSTD 5);
RUN;

Anyone can tell me how to include only (-2, +2) days of centered moving Standard Deviation in this case ?

Thanks in advance !

Best

Tristan Sun
  • 289
  • 2
  • 5
  • 16
  • possible duplicate of [Calculating moving average/stdev in SAS?](http://stackoverflow.com/questions/2460970/calculating-moving-average-stdev-in-sas) – DWal Jun 26 '15 at 17:48
  • Look at the linked answer. Summarise your data so it is by day, then calculate the moving average. Then join the moving averages back to your original data if you need it split out that way.Finally, you should at least mention what you have tried so far. – Robert Penridge Jun 26 '15 at 18:36
  • @RobertPenridge thanks for replying. I tried some normal moving average code, such as **CONVERT VAL=AVG / TRANSFORMOUT=(MOVSTD 5)** . But this code only applies if there is only one value in each day. In my case, there are different number of obs in each day. Thanks – Tristan Sun Jun 26 '15 at 19:48
  • @DWal Thanks for replying. I read through the example. However, in my case, there are different number of observations in each day. Therefore, I cannot just applying moving calculating. I need a another step here to include the changes of date. – Tristan Sun Jun 26 '15 at 19:55
  • PROC EXPAND can also make your data so that there is only one observation per day. Do that first. – Joe Jun 26 '15 at 20:06
  • Hi @Joe. I dont understand. There are multiple different values in each day. How am I suppose to leave only one value per day ? – Tristan Sun Jun 26 '15 at 20:18
  • As I said earlier, you should summarise your data so you only have a single row per day. You can do this many ways in sas such as via `proc sql`, `proc summary`, `proc means` or even via a datastep. Perhaps searching google for "summarising data in sas" will point you in the right direction. – Robert Penridge Jun 26 '15 at 21:55
  • @RobertPenridge How can you calculate standard deviation from summarized data? – DWal Jun 27 '15 at 03:18
  • @DWal Thanks !!!! I think your code hits the point !!! The only thing is that, there some gaps of dates (weekends are excluded), therefore, I cannot just using **date-2 and date+2**. I need to locate the previous two business days. Any idea ? – Tristan Sun Jun 27 '15 at 11:11
  • See my edited answer. – DWal Jun 27 '15 at 22:12
  • @Dwal Ah I missed the stddev part... was thinking only moving average... – Robert Penridge Jun 29 '15 at 14:25

1 Answers1

1

Sorry for the confusion. Here's how I'd do it using proc sql. Simply use a subquery with the std aggregate function and a where clause selecting the observations you want:

proc sql;
select
  h.date,
  h.value,
  ( select std(s.value)
    from have s
    where h.date between s.date-2 and s.date+2) as stddev
from
  have h
;
quit;

If the period should refer to only the days existing in the table (as opposed to all calendar days), then create a variable counting the dates and use that instead of the date variable to select your observations.

data have2;
  set have;
  by date;
  if first.date then sort+1;
run;

proc sql;
select
  h.date,
  h.value,
  ( select std(s.value)
    from have2 s
    where h.sort between s.sort-2 and s.sort+2) as stddev
from
  have2 h
;
quit;
DWal
  • 2,752
  • 10
  • 19