I created an EC2 instance on AWS to run an R server. The instance type is "t2.micro". Then I used some code from Joshua Ulrich to get stock data from Yahoo with the getSymbols() function.

"nasdaq.symbols.head" contains the first 100 NASDAQ ticker symbols in alphabetical order.

My problem is that executing getSymbols() takes a long time. During execution, the following message appears in the console:

"pausing 1 second between requests for more than 5 symbols"

The problem is that I would like to get the data for all NASDAQ stocks, which is more than 3500 ticker symbols. Changing the EC2 instance type to e.g. t2.2xlarge did not seem to speed up the download.

Here is the code I used.

# getSymbols(), ROC() and Ad() come from quantmod (ROC via TTR)
library(quantmod)

# create environment to load data into
Data <- new.env()
getSymbols(nasdaq.symbols.head, from="2007-01-01", env=Data)

# calculate returns, merge, and create data.frame (eapply loops over all
# objects in an environment, applies a function, and returns a list)
Returns.nasdaq <- eapply(Data, function(s) ROC(Ad(s), type="discrete"))
Returns.nasdaq.DF <- as.data.frame(do.call(merge, Returns.nasdaq))

# adjust column names and re-order columns
# (fixed=TRUE so the "." is matched literally, not as a regex wildcard)
colnames(Returns.nasdaq.DF) <- gsub(".Adjusted","",colnames(Returns.nasdaq.DF),fixed=TRUE)
Returns.nasdaq.DF <- Returns.nasdaq.DF[,nasdaq.symbols.head]

tail(Returns.nasdaq.DF)
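
With several thousand tickers, a single failed download can abort the whole getSymbols() call. One possible workaround is to request one symbol per call and skip failures. The following is only a sketch: `fetch_one` and `fetch_all` are hypothetical helper names, and the `pause` value is an assumption rather than a documented Yahoo limit. Judging by its wording, the console message concerns calls with more than 5 symbols, so one-symbol calls may avoid the built-in pause, although Yahoo might still throttle rapid requests.

```r
# Hypothetical helper: download one ticker, return NULL instead of stopping on error.
fetch_one <- function(sym, from = "2007-01-01") {
  tryCatch(
    quantmod::getSymbols(sym, from = from, auto.assign = FALSE),
    error = function(e) {
      message("skipping ", sym, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Hypothetical helper: fetch symbols one call at a time and keep the successes.
# `pause` (seconds between requests) is an assumed value, not a documented limit.
fetch_all <- function(symbols, fetch = fetch_one, pause = 0.25) {
  out <- lapply(symbols, function(sym) {
    result <- fetch(sym)
    Sys.sleep(pause)
    result
  })
  names(out) <- symbols
  Filter(Negate(is.null), out)  # drop tickers that failed to download
}
```

Calling `fetch_all(nasdaq.symbols.head)` would return a named list of xts objects, which could be passed to `lapply()` in place of the `eapply()` call over the environment.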

Charles1
  • `getSymbols` does not run in parallel, so the size of the AWS instance will not help with this. You could use the package BatchGetSymbols in combination with the future package to do this. Or write your own parallel query with doParallel, future or any other parallel package. Note that you might get blocked by Yahoo if you come in with too many requests at the same time. – phiver Jul 02 '20 at 14:56
  • Thank you very much! I will try this! ;-) – Charles1 Jul 08 '20 at 18:17
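
The parallel approach suggested in the comment could look roughly like the sketch below. It uses base R's `parallel` package rather than BatchGetSymbols or future, purely to keep the example dependency-free. `get_one` and `fetch_parallel` are hypothetical helper names, and the worker count of 4 is an arbitrary assumption. Because the downloads are network-bound, a few workers may help even on a small instance, but too many simultaneous requests may get blocked by Yahoo.

```r
library(parallel)

# Hypothetical helper: download one ticker on a worker, NULL on failure.
get_one <- function(sym) {
  tryCatch(
    quantmod::getSymbols(sym, from = "2007-01-01", auto.assign = FALSE),
    error = function(e) NULL
  )
}

# Hypothetical helper: spread the symbols over a small cluster of R workers.
# `workers` is an assumed value; keep it low to avoid hammering the data source.
fetch_parallel <- function(symbols, fetch = get_one, workers = 4L) {
  cl <- makeCluster(workers)
  on.exit(stopCluster(cl), add = TRUE)   # always shut the workers down
  res <- parLapply(cl, symbols, fetch)   # one download task per symbol
  names(res) <- symbols
  Filter(Negate(is.null), res)           # drop tickers that failed
}
```

For example, `prices <- fetch_parallel(nasdaq.symbols.head)` would give a named list of xts objects, assuming quantmod is installed on the machine that runs the workers.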
