0

I want to create a matrix of stockdata from n number of companies from a ticker list, though im struggling with appending them horizontally, it only works to append them vertically. Also other functions like merge or rbind which i have tried, but they cannot work with the variables parsed as a string, so the hard part here is that i want to append n variables which are retrieved from the tickerlist which has n number of stocks. Other suggestions are welcome to get the same result. Stocklist data:

> dput(stockslist)
structure(list(V1 = c("AMD", "MSFT", "SBUX", "IBM", "AAPL", "GSPC", 
"AMZN")), .Names = "V1", class = "data.frame", row.names = c(NA, 
-7L))

code:

library(quantmod)
library(tseries)
library(plyr)
library(PortfolioAnalytics)
library(PerformanceAnalytics)
library(zoo)
library(plotly)

tickerlist <- "sp500.csv"  #CSV containing tickers on rows
stockslist <- read.csv("sp500.csv", header = FALSE, stringsAsFactors = F)
nrstocks = length(stockslist[,1]) #The number of stocks to download
maxretryattempts <- 5 #If there is an error downloading a price how many 
times to retry
startDate = as.Date("2010-01-13")

for (i in 1:nrstocks) {
  stockdata <- getSymbols(c(stockslist[i,1]), src = "yahoo", from = 
startDate)
  # pick 6th column of the ith stock
  write.table((eval(parse(text=paste(stockslist[i,1]))))[,6], file = 
"test.csv", append = TRUE, row.names=F)

}
  • 2
    I don't understand why you would need to do `eval(parse(...))` in your loop. Can you show a sample of `stockslist` so we can better understand your structure? Your code does not indicate where you want to append rows/columns. Can you make this question [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a sample of your data (as already requested) and an idea of what your output would look like? – r2evans May 17 '17 at 00:09
  • I cannot upload a file here, but it is just these tickers, one on each row saved as the sp500.csv AMD MSFT SBUX IBM AAPL GSPC AMZN – Hein Verbaarschot May 17 '17 at 00:19
  • 1
    You don't have to upload a file (in fact, I discourage it, typically unnecessary). If you read the link in my previous comment, there are suggestions for providing sample data through the use of `dput` or `read.table(text = ...)`; both work well. – r2evans May 17 '17 at 00:22
  • Further, you talk about "appending" but nothing in your code suggests you do anything other than download data and save to CSV. Other than the really-confusing `eval(parse(...))` part of your code, there doesn't appear to be a glaring problem with what you've provided. I wonder if this would be a good example (at some point) for suggesting [lists of dataframes](http://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207), a more elegant way to load and process multiple data.frames/matrices. Once in a list, `rbind` works great. – r2evans May 17 '17 at 00:35
  • CSVs are linear, left to right, top to bottom, so you cannot accrete additional columns by writing to the same file multiple times (except maybe in the special case of a one-row table..?). Make a big table in R first with `cbind` and then call write.table once. – Frank May 17 '17 at 00:35
  • The eval(parse(...)) is to input the stocklists as strings, else the list will simply use the text AMD, MSFT etc, when i use the eval(parse(...)) it uses the variables created by getsymbols. the problem of cbind is that my number of lists is unknown, i want to simply input a list of like around 1000 tickers so it first creates 1000 df's. – Hein Verbaarschot May 17 '17 at 00:42

1 Answers1

1

This is exactly a great opportunity to talk about lists of dataframes. Having said that ...

Side bar: I really don't like side-effects. getSymbols defaults to using side-effect to saving the data into the parent frame/environment, and though this may be fine for most uses, I prefer functional methods. Luckily, using auto.assign=FALSE returns its behavior to within my bounds of comfort.

library(quantmod)
stocklist <- c("AMD", "MSFT")
startDate <- as.Date("2010-01-13")

dat <- sapply(stocklist, getSymbols, src = "google", from = startDate, auto.assign = FALSE,
              simplify = FALSE)
str(dat)
# List of 2
#  $ AMD :An 'xts' object on 2010-01-13/2017-05-16 containing:
#   Data: num [1:1846, 1:5] 8.71 9.18 9.13 8.84 8.98 9.01 8.55 8.01 8.03 8.03 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : NULL
#   ..$ : chr [1:5] "AMD.Open" "AMD.High" "AMD.Low" "AMD.Close" ...
#   Indexed by objects of class: [Date] TZ: UTC
#   xts Attributes:  
# List of 2
#   ..$ src    : chr "google"
#   ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"
#  $ MSFT:An 'xts' object on 2010-01-13/2017-05-16 containing:
#   Data: num [1:1847, 1:5] 30.3 30.3 31.1 30.8 30.8 ...
#  - attr(*, "dimnames")=List of 2
#   ..$ : NULL
#   ..$ : chr [1:5] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
#   Indexed by objects of class: [Date] TZ: UTC
#   xts Attributes:  
# List of 2
#   ..$ src    : chr "google"
#   ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"

Though I only did two symbols, it should work for many more without problem. Also, I shifted to using Google since Yahoo was asking for authentication.

You used write.csv(...), realize that you will lose the timestamp for each datum, since the CSV will look something like:

"AMD.Open","AMD.High","AMD.Low","AMD.Close","AMD.Volume"
8.71,9.2,8.55,9.15,32741845
9.18,9.26,8.92,9,22658744
9.13,9.19,8.8,8.84,34344763
8.84,9.21,8.84,9.01,24875646

Using "AMD" as an example, consider:

write.csv(as.data.frame(AMD), file="AMD.csv", row.names = TRUE)
head(read.csv("~/Downloads/AMD.csv", row.names = 1))
#            AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume
# 2010-01-13     8.71     9.20    8.55      9.15   32741845
# 2010-01-14     9.18     9.26    8.92      9.00   22658744
# 2010-01-15     9.13     9.19    8.80      8.84   34344763
# 2010-01-19     8.84     9.21    8.84      9.01   24875646
# 2010-01-20     8.98     9.00    8.76      8.87   22813520
# 2010-01-21     9.01     9.10    8.77      8.99   37888647

To save all of them at once:

ign <- mapply(function(x, fn) write.csv(as.data.frame(x), file = fn, row.names = TRUE),
              dat, names(dat))

There are other ways to store your data such as Rdata files (save()).

It is not clear to me if you are intending to append them as additional columns (i.e., cbind behavior) or as rows (rbind). Between the two, I tend towards "rows", but I'll start with "columns" first.

"Appending" by column

This may be appropriate if you want day-by-day ticker comparisons (though there are arguably better ways to prepare for this). You'll run into problems, since they have (and most likely will have) different numbers of rows:

sapply(dat, nrow)
#  AMD MSFT 
# 1846 1847 

In this case, you might want to join based on the dates (row names). To do this well, you should probably convert the row names (dates) to a column and merge on that column:

dat2 <- lapply(dat, function(x) {
  x <- as.data.frame(x)
  x$date <- rownames(x)
  rownames(x) <- NULL
  x
})
datwide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), dat2)

As a simple demonstration, remembering that there is one more row in "MSFT" than in "AMD", we can find that row and prove that things are still looking alright with:

which(! complete.cases(datwide))
# [1] 1251
datwide[1251 + -2:2,]
#            date AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
# 1249 2014-12-30     2.64     2.70    2.63      2.63    7783709     47.44     47.62    46.84      47.02    16384692
# 1250 2014-12-31     2.64     2.70    2.64      2.67   11177917     46.73     47.44    46.45      46.45    21552450
# 1251 2015-01-02       NA       NA      NA        NA         NA     46.66     47.42    46.54      46.76    27913852
# 1252 2015-01-05     2.67     2.70    2.64      2.66    8878176     46.37     46.73    46.25      46.32    39673865
# 1253 2015-01-06     2.65     2.66    2.55      2.63   13916645     46.38     46.75    45.54      45.65    36447854

"Appending" by row

getSymbols names the columns unique to the ticker, a slight frustration. Additionally, since we'll be discarding the column names, we should preserve the symbol name in the data.

dat3 <- lapply(dat, function(x) {
  ticker <- gsub("\\..*", "", colnames(x)[1])
  colnames(x) <- gsub(".*\\.", "", colnames(x))
  x <- as.data.frame(x)
  x$date <- rownames(x)
  x$symbol <- ticker
  rownames(x) <- NULL
  x
}) # can also be accomplished with mapply(..., dat, names(dat))
datlong <- Reduce(function(a, b) rbind(a, b, make.row.names = FALSE), dat3)

head(datlong)
#   Open High  Low Close   Volume       date symbol
# 1 8.71 9.20 8.55  9.15 32741845 2010-01-13    AMD
# 2 9.18 9.26 8.92  9.00 22658744 2010-01-14    AMD
# 3 9.13 9.19 8.80  8.84 34344763 2010-01-15    AMD
# 4 8.84 9.21 8.84  9.01 24875646 2010-01-19    AMD
# 5 8.98 9.00 8.76  8.87 22813520 2010-01-20    AMD
# 6 9.01 9.10 8.77  8.99 37888647 2010-01-21    AMD
nrow(datlong)
# [1] 3693
Community
  • 1
  • 1
r2evans
  • 141,215
  • 6
  • 77
  • 149