3

I have a time-indexed matrix (xts object) and I want only the fourth Wednesday of every November.

require(quantmod)
getSymbols("^GSPC", from="1900-01-01")   #returns GSPC
GSPC$WED  <- weekdays(time(GSPC)) == "Wednesday"
GSPC$NOV  <- months(time(GSPC)) == "November"
G         <- GSPC[GSPC$WED==1 & GSPC$NOV==1]

That's as far as I got in R. To solve my problem I punted up to bash.

write.zoo(G, "wen_in_nov")

I did the following hack:

cat wen_in_nov |
grep -v IND |
cut -c 1-10 | 
sed 's/-/ /g' | 
awk '{if($3 >= 22 && $3 < 29) print $1, $2, $3, "winner"}' |
sed 's/ /-/g' > fourth_wen

The fourth_wen file needs to separate the - from the string 'winner' so I just did that in vi. Importing into back to R:

fourth_wen <- read.zoo("fourth_wen", format="%Y-%m-%d")

And that essentially is the fourth Wednesday in November since 1950. Is there a way to do it all in R with less code?

oguz ismail
  • 1
  • 16
  • 47
  • 69
Milktrader
  • 9,278
  • 12
  • 51
  • 69

4 Answers4

5

Use .indexmon etc to access the POSIXlt values directly

GSPC[.indexmon(GSPC)==10 & .indexmday(GSPC) > 22 & .indexmday(GSPC) < 29
       &.indexwday(GSPC) == 3]

           GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
2007-11-28   1432.95   1471.62  1432.95    1469.02  4508020000       1469.02
2008-11-26    852.90    887.68   841.37     887.68  5793260000        887.68
2009-11-25   1106.49   1111.18  1104.75    1110.63  3036350000       1110.63
2010-11-24   1183.70   1198.62  1183.70    1198.35  3384250000       1198.35
2011-11-23   1187.48   1187.48  1161.79    1161.79  3798940000       1161.79
Jeff R
  • 829
  • 4
  • 5
  • Nice. Keeps me from making errors by going to a bashscript and back. – Milktrader Nov 24 '11 at 09:37
  • End-users will have to remember that it uses the same error-prone POSIX convention which numbers months from 0 to 11. Not very friendly. – Dirk Eddelbuettel Nov 24 '11 at 13:49
  • Your expression missed the years 1961, 1967, 1972, 1978, 1989, 1995 and 2006; see my edited answer. – Dirk Eddelbuettel Nov 24 '11 at 15:17
  • Also missed the year 2000; the error boils down to a '>' where a '>=' was needed (when comparing to day 22). – Dirk Eddelbuettel Nov 24 '11 at 15:26
  • It's easy to miss the >= or to get the numbers '22' or '29' OBO. You can catch this error by running an nrow() over the subset data and have it return a value you'd expect. – Milktrader Nov 24 '11 at 20:11
  • Or use a proper date-calculation function from a peer-reviewed high-quality library, rather than an error-prone hack. – Dirk Eddelbuettel Nov 25 '11 at 13:50
  • 1
    You make a good point about error-prone. The month starts at `0` but `indexmday` doesn't. I chose this as the answer because 1) it uses familiar syntax (not the best reason, I know I need to stretch a bit) and 2) you can use it to subset out Black Friday, which is the day after the 4th Thursday of November. Black Friday is not always the 4th Friday in November. I'm definitely going to explore Rcpp more soon. I need a good screencast course on C++ and Boost. – Milktrader Nov 25 '11 at 16:05
  • Black Friday is always 'Thanksgiving + 1', and hence always the day after the fourth Thursday in November. As for the screencast idea: yes, Boost and C++ are far from trivial but my answer here showed use of an R package deploying them---and you do not need to know C++ to use this R package! – Dirk Eddelbuettel Nov 30 '11 at 15:32
3

My RcppBDT package has a function for this.

RcppBDT wraps parts of the Boost Date_Time library, and Boost Date_Time happens to have a number of functions like that. So here is a quick loop for years 2008 to 2011, getting each year's fourth Wednesday in November:

R> library(RcppBDT)
Loading required package: Rcpp
Creating a generic function for ‘print’ from package ‘base’ in package ‘RcppBDT’
Creating a generic function for ‘format’ from package ‘base’ in package ‘RcppBDT’
R> for (y in 2008:2011) print(getNthDayOfWeek(fourth, Wed, Nov, y))
[1] "2008-11-26"
[1] "2009-11-25"
[1] "2010-11-24"
[1] "2011-11-23"
R> 

Here fourth, Wed and Nov are constants in the package namespace, modeled after corresponding enum types in the underlying C++ library. Makes for pretty easy use.

Edit: Here is a complete example for all 4th-Wed-in-Nov since 2000. I ensure that both the GSPC and the vector Wed of Wednesdays agree on the same Date type. Then it is just a matter of sticking Wed into GSPC:

R> library(quantmod)
Loading required package: Defaults
Loading required package: TTR
R> getSymbols("^GSPC", from="1900-01-01")  
R> Wed <- sapply(2000:2011, function(y) getNthDayOfWeek(fourth, Wed, Nov, y))
R> index(GSPC) <- as.Date(index(GSPC))
R> GSPC[as.Date(Wed)]
           GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
2000-11-22   1347.35   1347.35  1321.89    1322.36   963200000       1322.36
2001-11-28   1149.50   1149.50  1128.29    1128.52  1423700000       1128.52
2002-11-27    913.31    940.41   913.31     938.87  1350300000        938.87
2003-11-26   1053.89   1058.45  1048.28    1058.45  1097700000       1058.45
2004-11-24   1176.94   1182.46  1176.94    1181.76  1149600000       1181.76
2005-11-23   1261.23   1270.64  1259.51    1265.61  1985400000       1265.61
2006-11-22   1402.69   1407.89  1402.26    1406.09  2237710000       1406.09
2007-11-28   1432.95   1471.62  1432.95    1469.02  4508020000       1469.02
2008-11-26    852.90    887.68   841.37     887.68  5793260000        887.68
2009-11-25   1106.49   1111.18  1104.75    1110.63  3036350000       1110.63
2010-11-24   1183.70   1198.62  1183.70    1198.35  3384250000       1198.35
2011-11-23   1187.48   1187.48  1161.79    1161.79  3798940000       1161.79
R> 

Edit 2 As a Public Servive Announcement, here is where Jeff's answer fails:

R> ind <- .indexmon(GSPC)==10 & .indexmday(GSPC) > 22 & .indexmday(GSPC) < 29 
+         & .indexwday(GSPC) == 3
R> index(GSPC)[ind]
 [1] "1951-11-28" "1952-11-26" "1953-11-25" "1954-11-24" "1955-11-23"
 [6] "1956-11-28" "1957-11-27" "1958-11-26" "1959-11-25" "1960-11-23"
[11] "1962-11-28" "1963-11-27" "1964-11-25" "1965-11-24" "1966-11-23"
[16] "1968-11-27" "1969-11-26" "1970-11-25" "1971-11-24" "1973-11-28"
[21] "1974-11-27" "1975-11-26" "1976-11-24" "1977-11-23" "1979-11-28"
[26] "1980-11-26" "1981-11-25" "1982-11-24" "1983-11-23" "1984-11-28"
[31] "1985-11-27" "1986-11-26" "1987-11-25" "1988-11-23" "1990-11-28"
[36] "1991-11-27" "1992-11-25" "1993-11-24" "1994-11-23" "1996-11-27"
[41] "1997-11-26" "1998-11-25" "1999-11-24" "2001-11-28" "2002-11-27"
[46] "2003-11-26" "2004-11-24" "2005-11-23" "2007-11-28" "2008-11-26"
[51] "2009-11-25" "2010-11-24" "2011-11-23"

and

R> S <- 1951:2011
R> S[!S %in% as.numeric(format(index(GSPC)[ind], "%Y")) ]
[1] 1961 1967 1972 1978 1989 1995 2000 2006
R> 

So there are eight years missing in the sample of sixty when you use his approach.

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • awesome! so does this package do natural language processing of dates? – Ramnath Nov 23 '11 at 18:56
  • No, I already mentioned that 'first', 'second', ... are just constants for 1, 2, 3. Dito for days of the week, and months. It is merely easier to type "fourth, Wed, Nov" than "4, 3, 11" but that is ultimately what gets passed to the C++ code. But even in C++ you'd all the API function with "fourth, Wed, Nov" as these are enums in the appropriate Boost namespaces. It's *just* a very clever trick, no NLP needed ;-) – Dirk Eddelbuettel Nov 23 '11 at 19:00
  • I'm re-installing Boost to give this a try. (thought I already had it installed) – Milktrader Nov 23 '11 at 21:00
  • Remind me again, you are on OS X? It should work quite easily on any Linux, and on Windows you get pre-built RcppBDT binaries. For OS X, you may have tweak include paths. You can always ask on Rcpp-devel. – Dirk Eddelbuettel Nov 23 '11 at 21:19
  • I got your example to work. How would you call this function on an xts object such as GSPC? Or would you create a separate object and merge it with something like merge(GSPC, Rccpobject, all=FALSE)? – Milktrader Nov 23 '11 at 22:46
  • Good that you have it working. *Per se*, this has nothing to do with `xts` -- it just creates `Date` objects. You can use those to index other things though... – Dirk Eddelbuettel Nov 23 '11 at 23:00
  • No offense to Dirk of course, but this isn't an answer to the question. See the rest of the answers here (including mine) for how to get the results you want. – Jeff R Nov 24 '11 at 07:58
  • Really? The question was for '4th Wed in Nov', and pointing to *a function returning Nth day-of-week in given month and year* is not an answer? How come? – Dirk Eddelbuettel Nov 24 '11 at 13:45
2

Here is one way

nov_dates <- expand.grid(1:30, 11, 1900:2011) 
nov_dates <- apply(nov_dates, 1, paste, collapse = "-")
nov_dates <- dmy(nov_dates)
nov_wed   <- nov_dates[wday(nov_dates, label = TRUE) == 'Wed']
nov_4wed  <- nov_wed[seq_along(nov_wed) %% 4 == 0]

EDIT. a small bug remains. if a november has 5 wednesdays, this code does not work. a minor correction will take care of it and i will post it soon. Here is a solution that works correctly

library(plyr)
library(lubridate)
nov_dates <- expand.grid(day = 1:30, month = 11, year = 1900:2011) 
nov_dates <- transform(nov_dates, 
   date = dmy(paste(day, month, year, sep = "-"))

nov_4_wed <- ddply(nov_dates, .(year), summarize, date[wday(date) == 4][4])
Ramnath
  • 54,439
  • 16
  • 125
  • 152
1

I don't spend a lot of time with time series objects, so there may be a better function than using substr to extract the day (in fact, I'm sure there is, I just don't know it off the top of my head). But this seems to work:

rs <- subset(GSPC,weekdays(time(GSPC)) == "Wednesday" & 
                    months(time(GSPC)) == "November" & 
                    as.numeric(substr(time(GSPC),9,10)) >= 22 & 
                    as.numeric(substr(time(GSPC),9,10)) < 29)
joran
  • 169,992
  • 32
  • 429
  • 468