
I'm a complete beginner in R. I want to download historical data for the companies currently in the S&P 500 using getSymbols for a few periods. Obviously, some of the companies didn't exist in a given period, and when that happens R stops and does not download data for the remaining tickers. Is there any way to make getSymbols simply skip tickers whose data do not exist? It would be much easier to just get the S&P 500 list for that period, but unfortunately it's not free.

Yi Sha
  • Take a look at the [help file for `try`](http://stat.ethz.ch/R-manual/R-devel/library/base/html/try.html) – nrussell Jan 11 '15 at 21:13
  • Unfortunately it doesn't work. I only get a warning without an error message, and I have no idea why. I've also tried tryCatch but couldn't come up with a good idea of how to state the condition. – Yi Sha Jan 11 '15 at 22:55
  • Please include your code that is not working. – nrussell Jan 11 '15 at 23:21
  • `try(getSymbols(SiP, from="2001-01-01", to="2007-01-01", env=WoW), silent=TRUE)` – Yi Sha Jan 11 '15 at 23:50
  • What is `SiP`? That's not a valid ticker. – nrussell Jan 11 '15 at 23:54
  • That's what I've named my list of tickers. – Yi Sha Jan 12 '15 at 00:32
  • It consists of tickers of every company in current S&P500. The abridged version: SiP=c('AES','GAS','AEE','AEP','CNP', 'CMS','ED','D','DTE','DUK','EIX', 'ETR','EXC','FE','TEG','NEE','NI', 'NU','NRG','PCG','POM','PNW','PPL', 'PEG','SCG','SRE','SO','TE','WEC', 'XEL','T','CTL','FTR','LVLT','VZ', 'WIN','AP','ARG','AA','ATI','AVY', 'BLL','CF','DOW','D','EMN','ECL', 'FMC','FCX','IP','IFF','LYB','MWV', 'MON','MOS','NEM','NUE','OI','PPG') – Yi Sha Jan 12 '15 at 00:43

3 Answers


You can use try within sapply like this:

library(quantmod)
WoW <- new.env()
##
sapply(SiP, function(x){
  try(
    getSymbols(
      x,
      from=as.Date("2001-01-01"),
      to=as.Date("2007-01-01"),
      env=WoW),
    silent=TRUE)
})

Errors will be printed to the console (you could probably mitigate this if desired), but the tickers that do not generate errors will still produce data:

R> ls(WoW)
 [1] "AA"   "AEE"  "AEP"  "AES"  "AP"   "ARG"  "ATI"  "AVY"  "BLL"  "CF"   "CMS"  "CNP"  "CTL"  "D"    "DOW"  "DTE"  "DUK"  "ECL"  "ED"   "EIX" 
[21] "EMN"  "ETR"  "EXC"  "FCX"  "FE"   "FMC"  "FTR"  "GAS"  "IFF"  "IP"   "LVLT" "MON"  "MOS"  "MWV"  "NEE"  "NEM"  "NI"   "NRG"  "NU"   "NUE" 
[41] "OI"   "PCG"  "PEG"  "PNW"  "POM"  "PPG"  "PPL"  "SCG"  "SO"   "SRE"  "T"    "TE"   "TEG"  "VZ"   "WEC"  "WIN"  "XEL" 
##
R> length(ls(WoW))
[1] 57
R> length(SiP)
[1] 59

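So sapply(...) produced 57 objects from the 59 entries in SiP. Note that 'D' appears twice in SiP, so it accounts for one of the two missing entries; the other is LYB, which is absent from the listing above and apparently could not be downloaded for this period.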

From here, objects can be accessed within WoW through your preferred method, e.g.

R> with(WoW, chartSeries(ARG))
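
If you also want a record of which tickers failed, one option is tryCatch with an error handler. The following is only a sketch under stated assumptions: fetch_one and fetch_all are hypothetical helper names (not part of quantmod), and the demonstration at the end uses a stand-in fetcher so that it runs without a network connection.

```r
# Sketch: fetch each ticker separately and record which ones fail.
# fetch_one() and fetch_all() are hypothetical helpers, not quantmod functions.
fetch_one <- function(ticker, from, to) {
  # auto.assign = FALSE makes getSymbols() return the data instead of
  # assigning it into an environment
  quantmod::getSymbols(ticker, from = from, to = to, auto.assign = FALSE)
}

fetch_all <- function(tickers, fetch = fetch_one, ...) {
  tickers <- unique(tickers)  # drop duplicates such as the repeated 'D'
  results <- lapply(tickers, function(tk) {
    # An error for one ticker yields NULL instead of stopping the loop
    tryCatch(fetch(tk, ...), error = function(e) NULL)
  })
  names(results) <- tickers
  failed <- vapply(results, is.null, logical(1))
  list(data = results[!failed], failed = tickers[failed])
}

# Offline demonstration with a stand-in fetcher that rejects one ticker
fake_fetch <- function(ticker, ...) {
  if (ticker == "LYB") stop("no data for ", ticker)
  data.frame(symbol = ticker)
}
demo <- fetch_all(c("AES", "D", "LYB", "D"), fetch = fake_fetch)
demo$failed
names(demo$data)
```

With the real downloader, the call would look something like fetch_all(SiP, from = "2001-01-01", to = "2007-01-01"). Only errors are caught here; warnings from getSymbols are still printed.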



Data:

SiP=c('AES','GAS','AEE','AEP','CNP', 'CMS','ED','D',
      'DTE','DUK','EIX', 'ETR','EXC','FE','TEG',
      'NEE','NI', 'NU','NRG','PCG','POM','PNW','PPL', 
      'PEG','SCG','SRE','SO','TE','WEC', 'XEL','T',
      'CTL','FTR','LVLT','VZ', 'WIN','AP','ARG',
      'AA','ATI','AVY', 'BLL','CF','DOW','D',
      'EMN','ECL', 'FMC','FCX','IP','IFF','LYB',
      'MWV', 'MON','MOS','NEM','NUE','OI','PPG') 
nrussell
  • You're very welcome - when you have a chance, please take a few minutes to read through some of the answers to [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) - the more information you can provide about your problem, the easier it will be for other users to help you out. – nrussell Jan 12 '15 at 01:13

The problem is that some of the tickers stockSymbols() generates contain punctuation. Despite coming from Yahoo, they yield a 404 in getSymbols() because Yahoo does not use that punctuation in the URLs getSymbols() tries to scrape.

Example: stockSymbols() retrieves the symbol "AA-P". If you pass this to getSymbols(), you get a 404 because Yahoo! uses "AA" in the URL for this stock, not "AA-P", even though the ticker is given as "AA-P" by whatever resource stockSymbols() retrieves it from.

I have written some code to clean the list of tickers generated by stockSymbols() so that getSymbols() does not generate an error. It truncates each symbol at the first hyphen or period and then drops the resulting duplicates, so preferred shares and other punctuated symbols collapse to the common stock ticker.

library(quantmod)

symbols = stockSymbols()

symbols = symbols[,1]

for (i in seq_along(symbols)) {
    hyph = gregexpr(pattern = "-", symbols[i])
    per = gregexpr(pattern = "[.]", symbols[i])

    if (hyph[[1]][1] > 0 ) {
        symbols[i] = substr(symbols[i], 1, hyph[[1]][1] - 1)

    } else if (per[[1]][1] > 0 ) {
        symbols[i] = substr(symbols[i], 1, per[[1]][1] - 1)
    }
}

symbols = unique(symbols)
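
The same cleaning can probably be written more compactly with a vectorized regular expression. This is a sketch rather than a drop-in replacement: the input vector raw is made-up sample data standing in for stockSymbols()[, 1], and the result differs from the loop above for any symbol in which a period comes before a hyphen (the loop cuts at the hyphen, this cuts at whichever comes first).

```r
# Hypothetical sample input; in practice this would be stockSymbols()[, 1]
raw <- c("AA-P", "BRK.B", "AA", "MSFT")

# Remove everything from the first hyphen or period onward, then deduplicate
clean <- unique(sub("[-.].*$", "", raw))
print(clean)
```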

Here is some code that uses getSymbols() to get all the stock data and skip 404s:

stocks <- list()  # holds one data frame per successfully downloaded ticker

for (i in seq_along(symbols)) {
    # auto.assign = FALSE returns the data, so each ticker is fetched only once
    stock <- try(getSymbols(symbols[i], from = "2016-01-01", src = "yahoo",
                            auto.assign = FALSE))
    if (inherits(stock, "try-error")) {
        next  # skip tickers that could not be downloaded
    }
    stocks[[symbols[i]]] <- as.data.frame(stock)
}
Victor Burnett

You might try the tidyquant package which takes care of error handling internally. It also doesn't require for-loops or tryCatch statements so it will save you a significant amount of code. The tq_get() function is responsible for getting stock prices. You can use the complete_cases argument to adjust how errors are handled.

Example with complete_cases = TRUE: Automatically removes "bad apples"

library(tidyquant)

# get data with complete_cases = TRUE automatically removes bad apples
c("AAPL", "GOOG", "BAD APPLE", "NFLX") %>%
    tq_get(get = "stock.prices", complete_cases = TRUE)

#> Warning in value[[3L]](cond): Error at BAD APPLE during call to get =
#> 'stock.prices'. Removing BAD APPLE.
#> # A tibble: 7,680 × 8
#>    symbol       date  open  high   low close    volume adjusted
#>     <chr>     <date> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
#> 1    AAPL 2007-01-03 86.29 86.58 81.90 83.80 309579900 10.85709
#> 2    AAPL 2007-01-04 84.05 85.95 83.82 85.66 211815100 11.09807
#> 3    AAPL 2007-01-05 85.77 86.20 84.40 85.05 208685400 11.01904
#> 4    AAPL 2007-01-08 85.96 86.53 85.28 85.47 199276700 11.07345
#> 5    AAPL 2007-01-09 86.45 92.98 85.15 92.57 837324600 11.99333
#> 6    AAPL 2007-01-10 94.75 97.80 93.45 97.00 738220000 12.56728
#> 7    AAPL 2007-01-11 95.94 96.78 95.10 95.80 360063200 12.41180
#> 8    AAPL 2007-01-12 94.59 95.06 93.23 94.62 328172600 12.25892
#> 9    AAPL 2007-01-16 95.68 97.25 95.45 97.10 311019100 12.58023
#> 10   AAPL 2007-01-17 97.56 97.60 94.82 94.95 411565000 12.30168
#> # ... with 7,670 more rows

Example with complete_cases = FALSE: Returns nested data frame.

library(tidyquant)

# get data with complete_cases = FALSE returns a nested data frame
c("AAPL", "GOOG", "BAD APPLE", "NFLX") %>%
    tq_get(get = "stock.prices", complete_cases = FALSE)

#> Warning in value[[3L]](cond): Error at BAD APPLE during call to get =
#> 'stock.prices'.
#> Warning in value[[3L]](cond): Returning as nested data frame.
#> # A tibble: 4 × 2
#>      symbol         stock.prices
#>       <chr>               <list>
#> 1      AAPL <tibble [2,560 × 7]>
#> 2      GOOG <tibble [2,560 × 7]>
#> 3 BAD APPLE            <lgl [1]>
#> 4      NFLX <tibble [2,560 × 7]>

In both cases the user gets a WARNING message. The prudent user will read the warnings and try to determine what the issue is. Most importantly, a long-running script will not fail.

Matt Dancho