
I would like to download daily data from yahoo for the S&P 500, the DJIA, and 30-year T-Bonds, map the data to the proper time zone, and merge them with my own data. I have several questions.

  1. My first problem is getting the tickers right. From yahoo's website, it looks like the tickers are: ^GSPC, ^DJI, and ^TYX. However, ^DJI fails. Any idea why?

  2. My second problem is that I would like to constrain the time zone to GMT (I would like to ensure that all my data is on the same clock, and GMT seems like a neutral choice), but I couldn't get it to work.

  3. My third problem is that I would like to merge the yahoo data with my own data, obtained by other means and available in a different format. It is also daily data.

Here is my attempt at constraining the data to the GMT time zone. Executed at the top of my R script.

Sys.setenv(TZ = "GMT")
# > Sys.getenv("TZ")
# [1] "GMT"
# the TZ variable is properly set
# but does not affect the time zone in zoo objects, why?

Here is my code to get the yahoo data:

library("tseries")
library("xts")

date.start <- "1999-12-31"
date.end <- "2013-01-01"

# tickers <- c("GSPC","TYX","DJI")
# DJI Fails, why?
# http://finance.yahoo.com/q?s=%5EDJI
tickers <- c("GSPC","TYX") # proceed without DJI

z <- zoo()
index(z) <- as.Date(format(time(z)),tz="")

for ( i in seq_along(tickers) ) 
  { 
     cat("Downloading ", i, " out of ", length(tickers) , "\n")
     x <- try(get.hist.quote(
         instrument = paste0("^",tickers[i])
         , start = date.start
         , end = date.end
         , quote = "AdjClose"
         , provider = "yahoo"
         , origin = "1970-01-01"
         , compression = "d"
         , retclass = "zoo" 
         , quiet = FALSE )
       , silent = FALSE )
     if ( inherits(x, "try-error") ) next # skip tickers that failed to download
     print(x[1:4]) # check that it's not empty
     colnames(x) <- tickers[i]
     z <- try( merge(z,x), silent = TRUE )
}

Here is the dput(head(df)) of my dataset:

df <- structure(list(A = c(-0.011489000171423, -0.00020300000323914, 
0.0430639982223511, 0.0201549995690584, 0.0372899994254112, -0.0183669999241829
), B = c(0.00110999995376915, -0.000153000000864267, 0.0497750006616116, 
0.0337960012257099, 0.014121999964118, 0.0127800004556775), date = c(9861, 
9862, 9863, 9866, 9867, 9868)), .Names = c("A", "B", "date"
), row.names = c("0001-01-01", "0002-01-01", "0003-01-01", "0004-01-01", 
"0005-01-01", "0006-01-01"), class = "data.frame")

I'd like to merge the data in df with the data in z. I can't seem to get it to work.

I am new to R and very much open to your advice about efficiency, best practice, etc. Thanks.

EDIT: SOLUTIONS

  1. On the first problem: following GSee's suggestions, the Dow Jones Industrial Average data may be downloaded with the quantmod package: thus, instead of the "^DJI" ticker, which is no longer available from yahoo, use the "DJIA" ticker. Note that there is no caret in the "DJIA" ticker.

  2. On the second problem, Joshua Ulrich points out in the comments that "Dates don't have timezones because days don't have a time component."

  3. On the third problem: The data frame appears to have corrupted dates, as pointed out by agstudy in the comments.
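On the second point, a quick base R check may help show why setting `TZ` makes no visible difference for daily data. A `Date` is stored as a whole number of days with no clock time, whereas a `POSIXct` value is an instant whose calendar day depends on the time zone. The timestamp below is made up purely for illustration.

```r
# A Date is just a count of days since 1970-01-01; there is no time-of-day part.
d <- as.Date("2013-03-26")
unclass(d)                      # 15790

# A POSIXct is an instant in time, so its calendar day depends on the zone.
p <- as.POSIXct("2013-03-26 23:30:00", tz = "GMT")
format(p, "%Y-%m-%d %H:%M", tz = "GMT")          # "2013-03-26 23:30"
format(p, "%Y-%m-%d %H:%M", tz = "Asia/Tokyo")   # "2013-03-27 08:30"

# Converting the same instant to a Date in two zones gives two different days.
as.Date(p, tz = "GMT")          # "2013-03-26"
as.Date(p, tz = "Asia/Tokyo")   # "2013-03-27"
```

So the off-by-one-day risk only arises when intraday timestamps are collapsed to dates, not when the index is already of class `Date`.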

My solutions rely on the quantmod package and the attached zoo/xts packages:

library(quantmod)

Here is the code I have used to get proper dates from my csv file:

toDate <- function(x){ as.Date(as.character(x), format = "%Y%m%d") }
dtz <- read.zoo("myData.csv"
  , header = TRUE
  , sep = ","
  , FUN = toDate
)
dtx <- as.xts(dtz)

The dates in the csv file were stored in a single column in the format "19861231". The key to getting correct dates was to wrap the date in "as.character()". Part of this code was inspired by the question "R - Stock market data from csv to xts". I also found the zoo/xts manuals helpful.
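To see why the `as.character()` wrapper matters, here is a small base R check using the example value from the text. A column of values like 19861231 is typically read as numeric, and `as.Date()` does not parse a number with a format string; converting to character first lets the `%Y%m%d` format apply.

```r
toDate <- function(x) as.Date(as.character(x), format = "%Y%m%d")

toDate(19861231)      # numeric input, as it would arrive from the csv column
toDate("19861231")    # character input gives the same result: "1986-12-31"
```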

I then extract the date range from this dataset:

date.start <- start(dtx)
date.end <- end(dtx)

I will use those dates with quantmod's getSymbols function so that the other data I download will cover the same period.

Here is the code I have used to get all three tickers.

tickers <- c("^GSPC","^TYX","DJIA")
data <- new.env() # the data environment will store the data
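# getSymbols accepts a vector of tickers and, with the default auto.assign = TRUE,
# stores each series in the environment (the caret is dropped from the object name)
getSymbols( tickers
    , from = date.start
    , to = date.end
    , env = data # data saved inside an environment
    )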
ls(data)  # see what's inside the data environment
data$GSPC  # access a particular ticker

Also note, as GSee pointed out in the comments, that the option auto.assign=FALSE should not be combined with the option env=data: with auto.assign=FALSE, getSymbols ignores the environment and returns the data directly, so the environment stays empty.
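For readers unfamiliar with environments, the following base R sketch shows how to inspect and retrieve objects stored in one. It uses made-up numeric vectors in place of downloaded series, so the environment name `store` and the values attached to `GSPC` and `TYX` are placeholders only.

```r
store <- new.env()

# Stand-ins for what getSymbols() would assign into the environment.
assign("GSPC", c(1469.25, 1455.22), envir = store)
assign("TYX",  c(6.48, 6.60),       envir = store)

ls(store)                       # names of the stored objects: "GSPC" "TYX"
store$GSPC                      # access one object by name
get("TYX", envir = store)       # same idea, with the name given as a string

# Pull everything out as a named list, e.g. to pass on to lapply() or do.call().
all_series <- mget(ls(store), envir = store)
names(all_series)
```

With real xts objects in the environment, passing such a list to `do.call(merge, ...)` should combine them into one object, though that has not been run against the data in this post.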

A big thank you for your help.

Rubén
PatrickT
  • I think the problem with ^DJX is that there isn't historical data for it. You can check this on the provider's page, http://finance.yahoo.com/q/hp?s=%5ETYX+Historical+Prices – agstudy Mar 26 '13 at 16:23
  • Your df has corrupted dates, so you can't merge it with your tickers. – agstudy Mar 26 '13 at 16:36
  • Dates don't have timezones because days don't have a time component. – Joshua Ulrich Mar 26 '13 at 16:48
  • Thanks! @agstudy: I checked on the yahoo website, the DJI ticker goes back to 1992. See: http://finance.yahoo.com/q/hp?s=%5EDJI+Historical+Prices – PatrickT Mar 26 '13 at 19:39
  • @agstudy: corrupted dates, you're right, no wonder I was struggling. The data comes from a Stata dataset, so obviously I did something wrong during the conversion. – PatrickT Mar 26 '13 at 19:46
  • @PatrickT, there's no "download to spreadsheet" link like there is for other stocks. There are a few discussions of this on SO (as well as elsewhere on the web), e.g. http://stackoverflow.com/a/3681992/967840 – GSee Mar 26 '13 at 19:48
  • You could try `getSymbols("DJIA", src="FRED", auto.assign=FALSE)` – GSee Mar 26 '13 at 19:50
  • @Joshua Ulrich, "Dates don't have timezones because days don't have a time component" Why of course that makes sense! For some reason I was under the impression that there was a time stamp and that if the time zone was wrong the data could be ascribed to the wrong date (off by plus or minus one day). However, my data (converted from Stata as described in Edit1 above) does display time zone information, which confused me, namely: ..., tzone = "UTC", tclass = "Date"), class = c("xts", "zoo"), .indexCLASS = "Date", tclass = "Date", .indexTZ = "UTC", tzone = "UTC"). – PatrickT Mar 26 '13 at 19:50
  • Oh thanks GSee, I hadn't paid attention to that. And yes, an alternative data source is a great alternative. – PatrickT Mar 26 '13 at 19:52
  • @PatrickT, please don't edit answers into questions. If you have an answer, post it as an answer. – GSee Mar 28 '13 at 22:49
  • @GSee: "please don't edit answers into questions. If you have an answer, post it as an answer." Well, my edit is mostly an edited version of your answer with information gleaned from the comments by you and others, it was mostly a note for myself. I've seen it done before and I thought it was a good practice. It is also said that the comments are not for debating, so I'll keep it short. I'm very grateful for your help. – PatrickT Mar 30 '13 at 20:55
  • @Gsee. You write "the following statement is completely false: "Note also that the quantmod package takes care of locating the data source automatically, so src="FRED" need not be specified." " I have removed this statement, although I don't understand why it's completely false. I merely meant to say that it is not necessary to spell out the source in src="whatever". Certainly I didn't and the data was downloaded. Was it not from FRED? I construed one of your comments as stating that, apologies if I misunderstood. Perhaps it's another package that figures out the source? Anyway, it's removed. – PatrickT Mar 30 '13 at 20:58
  • @PatrickT, if you do not specify a value for "src", then "yahoo" is used by default. You can change the defaults with `setSymbolLookup` or `setDefaults`. (I already deleted the comment you referenced when I saw that you'd fixed your post). – GSee Mar 30 '13 at 20:59
  • @PatrickT, it looks like yahoo does have data for the ticker `DJIA`. I didn't realize that. – GSee Mar 30 '13 at 21:03
  • Thanks GSee. Oh I misunderstood! So yahoo does provide the DJIA data, just not from the tseries / get.hist.quote() function... – PatrickT Mar 30 '13 at 21:04
  • @PatrickT `tseries::get.hist.quote("DJIA")` is returning the same data just fine for me. – GSee Mar 30 '13 at 21:05
  • "tseries::get.hist.quote("DJIA")" Intriguing! so the bottom-line is that the yahoo ticker has changed from "^DJI" to "DJIA". I had read someone quoting a yahoo employee that there were legal reasons why the ^DJI ticker was unavailable (and unavailable for direct download from the website). Perhaps the DJIA ticker is subtly different from ^DJI (e.g. longer delay). Or perhaps yahoo have changed their policy again. Or some other reason... – PatrickT Mar 30 '13 at 21:10
  • Reference to yahoo's policy (your comment there): http://stackoverflow.com/questions/3679870/yahoo-finance-csv-file-will-not-return-dow-jones-dji?lq=1 – PatrickT Mar 30 '13 at 21:12

1 Answer

  1. Yahoo doesn't provide historical data for ^DJI. Currently, it looks like you can get the same data by using the ticker "DJIA", but your mileage may vary.
  2. It does work in this case because you're only dealing with Dates
  3. The df object you provided is yearly data beginning in the year 0001. So, that's probably not what you wanted.
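One possible explanation for the odd dates, offered here as an assumption rather than something confirmed in the thread: the asker mentions the data came from Stata, and the `date` column looks like a Stata daily date, which counts days since 1960-01-01, while the row names carry the bogus years. Under that assumption, the column can be converted in base R as follows.

```r
# The 'date' column from the dput() output in the question.
stata_days <- c(9861, 9862, 9863, 9866, 9867, 9868)

# Assumption: these are Stata daily dates, i.e. days counted from 1960-01-01
# (R's own Date origin is 1970-01-01, so the origin must be given explicitly).
dates <- as.Date(stata_days, origin = "1960-01-01")
format(dates)
# "1986-12-31" "1987-01-01" "1987-01-02" "1987-01-05" "1987-01-06" "1987-01-07"
```

The first value agrees with the "19861231" example from the csv file, and the jump from 9863 to 9866 falls on a weekend, which is at least consistent with daily market data.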

Here's how I would fetch and merge those series (or use an environment and only make one call to getSymbols)

library(quantmod)
do.call(cbind, lapply(c("^GSPC", "^TYX"), getSymbols, auto.assign=FALSE))
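Once the downloaded series and your own data are both xts objects indexed by Date, combining them should be a single `merge()` call. The sketch below uses small made-up series (the object names `own` and `idx` and all values are placeholders) to show the difference between an outer join, which keeps every date, and an inner join, which keeps only the dates present in both.

```r
library(xts)

# Placeholder for your own daily data.
own <- xts(cbind(A = c(0.01, 0.02, 0.03)),
           order.by = as.Date(c("2013-01-02", "2013-01-03", "2013-01-04")))

# Placeholder for a downloaded index series; it has no row for 2013-01-02.
idx <- xts(cbind(GSPC = c(1459.37, 1466.47, 1461.89)),
           order.by = as.Date(c("2013-01-03", "2013-01-04", "2013-01-07")))

merged_all  <- merge(own, idx)                  # outer join: NA where a date is missing
merged_both <- merge(own, idx, join = "inner")  # inner join: common dates only
nrow(merged_all)    # 4
nrow(merged_both)   # 2
```

Which join is appropriate depends on whether missing days should be kept as `NA` or dropped.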
Community
GSee
  • Here you don't mean by one call that it is a vectorized function? – agstudy Mar 26 '13 at 16:50
  • getSymbols is "vectorized" in the sense that it has a for loop inside it. Is that what you mean? – GSee Mar 26 '13 at 16:51
  • I mean that here you call `getSymbols` twice, once for each symbol. I would add some explanations since the OP is a newbie. For example, why it is better to use `lapply` here (avoid the side effect of for) ... the merge is just a `cbind`,... – agstudy Mar 26 '13 at 16:56
  • Thanks Gsee and agstudy. I'm not bound to any particular package and function, so I'll read up on quantmod. Here are questions that spring to mind: is the data from yahoo? (it doesn't state the source in the line of code you wrote). Your suggestion of creating an environment, how does that work? in particular, how do you access data after it's saved inside an environment? Thanks! – PatrickT Mar 26 '13 at 20:01
  • Yes, yahoo is the default. Start at [quantmod.com](http://quantmod.com). Also read `?getSymbols`. e.g. `myenv <- new.env(); getSymbols("SPY", src="yahoo", env=myenv); get("SPY", pos=myenv)` or `myenv$SPY`. See also `?environment`. You might also be interested in the [r-sig-finance list](https://stat.ethz.ch/mailman/listinfo/r-sig-finance) – GSee Mar 26 '13 at 20:25
  • Thanks a lot, I've read up on it now: "Current src methods available are: yahoo, google, MySQL, FRED, csv, RData, and Oanda. Data is loaded silently without user assignment by default." – PatrickT Mar 26 '13 at 20:26
  • @GSee, thanks for this intro on environments. I tried to use them before, but couldn't work out how to find out what was inside. For instance, myenv$SPY assumes you know that your myenv contains SPY. Anyway, that's off topic. Thanks again. – PatrickT Mar 26 '13 at 20:29
  • @GSee. Thanks! I'm getting an empty environment. Sorry to trouble you again. Would you look at my second edit above? Thanks. – PatrickT Mar 26 '13 at 22:04
  • @PatrickT, you used `auto.assign=FALSE` which causes it to ignore the environment and return the data. If you have further questions, ask as new questions. Might want to spend some more time with the docs, examples, mailing list, SO search, etc. first... – GSee Mar 26 '13 at 22:24
  • @PatrickT, have a look [here](http://stackoverflow.com/questions/5574595/getsymbols-and-using-lapply-cl-and-merge-to-extract-close-prices/5574836#5574836), [here](http://stackoverflow.com/questions/15541873/how-can-i-download-a-set-of-prices-with-getsymbols-and-store-them-in-the-order-i), or [here](http://stackoverflow.com/questions/11179154/r-names-of-quantmod-variables/11179369#11179369) for example – GSee Mar 26 '13 at 22:26
  • @GSee, thanks again. I had read those discussions you link to, but when trying to put bits together it never seemed to work. I'll start a new question if I run into problems merging... Cheers for now. – PatrickT Mar 26 '13 at 22:45