
I am trying to get historical prices for VIX futures by downloading all the CSV files on this page (http://cfe.cboe.com/Products/historicalVIX.aspx). Here is the code I am using to do this:

library(XML)

#Extract all links for url
url <- "http://cfe.cboe.com/Products/historicalVIX.aspx"
doc <- htmlParse(url)
links <- xpathSApply(doc, "//a/@href")
free(doc)

#Keep only URLs ending in csv and complete the link.
links <- links[substr(links, nchar(links) - 2, nchar(links)) == "csv"]
links <- paste("http://cfe.cboe.com", links, sep="")

#Perform read.csv on each url in links, skipping the first two URLs as they are not relevant.
c <- lapply(links[-(1:2)], read.csv, header = TRUE)

I get the error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

Upon further investigation, I realize this is because some of the CSV files are formatted differently. If I load the URL links[9] manually, I see that the first row has this disclaimer:

CFE data is compiled for the .......use of CFE data is subject to the Terms and Conditions of CBOE's Websites.

Most of the other files (e.g. links[8] and links[10]) are fine, so it seems the disclaimer has been inserted at random. Is there some R magic that can be done to handle this?

Thank you.

mchangun

1 Answer


I have a getSymbols.cfe method in my qmao package (for the getSymbols function in the quantmod package) that will make this a lot easier.

#install.packages('qmao', repos='http://r-forge.r-project.org')
library(qmao)

This is from the examples section of ?getSymbols.cfe (please read the help page, as the function has a few arguments that you may want to set differently from the defaults):

getSymbols(c("VX_U11", "VX_V11"),src='cfe') 
#all contracts expiring in 2010 and 2011.
getSymbols("VX",Months=1:12,Years=2010:2011,src='cfe')
#getSymbols("VX",Months=1:12,Years=10:11,src='cfe') #same

And it's not just for VIX:

getSymbols(c("VM","GV"),src='cfe') #The mini-VIX and Gold vol contracts expiring this month

If you're not familiar with getSymbols, by default it stores the data in your .GlobalEnv and returns the name of the object that was saved.

> getSymbols("VX_Z12", src='cfe')
[1] "VX_Z12"

> tail(VX_Z12)
           VX_Z12.Open VX_Z12.High VX_Z12.Low VX_Z12.Close VX_Z12.Settle VX_Z12.Change VX_Z12.Volume VX_Z12.EFP VX_Z12.OpInt
2012-10-26       19.20       19.35      18.62        18.87          18.9           0.0         22043         15        71114
2012-10-31       18.55       19.50      18.51        19.46          19.5           0.6         46405        319        89674
2012-11-01       19.35       19.35      17.75        17.87          17.9          -1.6         40609       2046        95720
2012-11-02       17.90       18.65      17.55        18.57          18.6           0.7         42592       1155       100691
2012-11-05       18.60       20.15      18.43        18.86          18.9           0.3         28136        110       102746
2012-11-06       18.70       18.85      17.75        18.06          18.1          -0.8         35599        851       110638

Edit

I see now that I did not answer your question, but rather pointed you to another way to get the same error! A simple way to make your code work is to write a wrapper for read.csv that uses readLines to check whether the first row contains the disclaimer; if it does, skip the first row, otherwise call read.csv as normal.

myRead.csv <- function(x, ...) {  
  if (grepl("Terms and Conditions", readLines(x, 1))) { #is the first row the disclaimer?
    read.csv(x, skip=1, ...)  
  } else read.csv(x, ...)
}
L <- lapply(links[-(1:2)], myRead.csv, header = TRUE)
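To check the wrapper idea without hitting the CFE site, here is a minimal offline sketch. The name read_cfe_csv, the column names, the data rows, and the disclaimer wording are all made up for illustration and are not real CFE data. The only thing carried over from the wrapper above is the test for "Terms and Conditions" on the first line.

```r
# Same idea as the wrapper above, repeated so this example is self-contained.
# read_cfe_csv is a hypothetical name, not part of any package.
read_cfe_csv <- function(x, ...) {
  first_line <- readLines(x, n = 1, warn = FALSE)
  has_disclaimer <- grepl("Terms and Conditions", first_line, fixed = TRUE)
  # Skip one line only when the disclaimer is present.
  read.csv(x, skip = if (has_disclaimer) 1 else 0, ...)
}

# Placeholder content: these columns and values are assumptions, not CFE data.
header <- "Trade Date,Futures,Open,Close"
rows <- c("01/02/2013,X (Jan 13),10.0,10.5",
          "01/03/2013,X (Jan 13),10.5,10.2")
disclaimer <- "Placeholder disclaimer: use is subject to the Terms and Conditions of the website."

# One file without the disclaimer, one with it as the first line.
clean_file <- tempfile(fileext = ".csv")
noisy_file <- tempfile(fileext = ".csv")
writeLines(c(header, rows), clean_file)
writeLines(c(disclaimer, header, rows), noisy_file)

clean <- read_cfe_csv(clean_file, header = TRUE)
noisy <- read_cfe_csv(noisy_file, header = TRUE)

# Both files should parse to the same data frame.
print(identical(clean, noisy))
print(names(noisy))
```

Because readLines and read.csv both accept a file path or a URL string, the same function can be passed to lapply over links exactly as myRead.csv is. Note that this sketch would fail on an empty file, since there would be no first line to test.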

I also applied that patch to getSymbols.cfe. You can get the latest version of qmao (1.3.11) using svn checkout (see this post if you need help with that), or you can wait until R-Forge builds it for you, which usually happens pretty quickly but could take up to a couple of days.

GSee
  • Thanks for that. No need for me to reinvent the wheel. – mchangun Nov 08 '12 at 04:12
  • Thanks for pointing out that VX_N13 has a disclaimer on the first row. I will try to patch to deal with that soon. – GSee Nov 08 '12 at 04:34
  • @GSee Is there something like getQuote.cfe available? I have looked around and cannot find it. I am asking because getSymbols.cfe provides data that arrives a bit too late: "PLEASE NOTE: CFE Data does not become available until approximately 10:00 a.m. C.T. the following business day." The question is where to get (near) real-time data. One obvious place for me is Interactive Brokers... Any other ideas (Google, Yahoo, ...)? Thanks. – Samo Oct 13 '13 at 23:57