
I have a function `comp(char1, char2, stock)`. It scans a large Excel workbook of stock data, one characteristic per sheet, reads the sheets named by `char1` and `char2` for a specific stock (stocks are in rows and are indexed, so `stock = 1` gives the first stock), and returns a new data frame with that stock's `char1` data in one column and its `char2` data in another.

I want to do this for 100 stocks and combine all of the resulting data frames into one large data frame with two columns: `char1` and `char2`. It doesn't matter whether the stocks' names are kept; I just want all of the `char1` and `char2` data for all 100 stocks in one large two-column data frame.

I'm a beginner in R, but I think what I need to do is repeat the function 100 times, each time with the `stock` input increased by 1, and then combine all of the data frames (I think with `rbind`), though I'm not entirely sure.

Here is an example:

AAPL <- comp(char1 = 'RETURNS', char2 = 'EPS', stock = 1) will make the data frame

  RETURNS      EPS
1   17.91 -31.3504
2   17.10 -33.1423
3   22.80 -33.1423
4   22.60 -38.0202

and

DIS <- comp(char1 = 'RETURNS', char2 = 'EPS', stock = 2)

will make the data frame

  RETURNS      EPS
1   63.01  17.4997
2   65.32  54.2022
3   58.26  20.6345
4   66.53  20.6345

so if I do rbind(AAPL, DIS) I get a data frame of 8 rows that combines both AAPL and DIS. However, doing this for each stock by hand is very tedious, so I want a way to automate it.
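For reference, a minimal sketch of that loop idea, assuming `comp` behaves exactly as in the examples above (growing a data frame with `rbind` inside a loop works, though `lapply`-based approaches are tidier):

# Sketch only: assumes comp() works as shown in the examples
combined <- NULL
for (s in 1:100) {
  combined <- rbind(combined, comp(char1 = 'RETURNS', char2 = 'EPS', stock = s))
}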

  • There are several ways to do that: the `plyr` package, the `apply` functions, plus some binding, etc. A piece of your desired output and code would be helpful; it helps users help you. – SabDeM Jun 04 '15 at 18:51
  • Please post a minimal working example for us to help you. You probably want a function that returns a data frame, the `lapply()` function, and `do.call` with `rbind.data.frame`. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – TARehman Jun 04 '15 at 18:55

1 Answer


Something like this should work for you:

> stocks <- 1:100
> results <- lapply(stocks,
                    function(s) comp(char1 = 'RETURNS',
                                     char2 = 'EPS',
                                     stock = s))
> df <- do.call(rbind, results)

The `lapply` call generates the individual data frames, and then `do.call(rbind, results)` binds them into a single large one.
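If you happen to have the `dplyr` package available (an assumption; the question doesn't mention it), `bind_rows` is a drop-in alternative to the `do.call(rbind, ...)` line and tends to be faster when combining many small data frames:

> library(dplyr)            # assumes dplyr is installed
> df <- bind_rows(results)  # same rows as do.call(rbind, results)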

You might want to add the stock ID as an additional column for later processing. That is easy enough to do inline, but we'll move the anonymous function out to a named function for clarity:

> stocks <- 1:100
> one_stock <- function(s) {
      d <- comp(char1 = 'RETURNS', char2 = 'EPS', stock = s)
      d$stock <- s  # record which stock these rows came from
      d             # return the augmented data frame
  }
> df <- do.call(rbind, lapply(stocks, one_stock))
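With the extra column in place you can recover any single stock's rows later; for example, using base R's `subset` (stock 1 being AAPL in the examples above):

> aapl <- subset(df, stock == 1)  # rows belonging to the first stock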
  • When I run this it works; however, it only creates the data frame for the first 10 stocks and doesn't go through all 100. @kdopen – D Gordon Jun 04 '15 at 19:59
  • I changed the `stocks` variable to be `1:100`. Not sure about the Java exception. What's the traceback? Is it being raised by `comp`? – kdopen Jun 04 '15 at 20:01
  • I am using the XLConnect package to get the info from the Excel file. I tried increasing the memory but that doesn't seem to work either. I'm a beginner so I don't really know the debugging process, but both things I used from XLConnect in `comp`, as well as `lapply`, are mentioned when I look at the traceback. @kdopen – D Gordon Jun 04 '15 at 20:07
  • Increasing the memory fixed it, but it took almost 10 minutes for my MacBook Pro with a 2.6 GHz Intel Core i7 and 16 GB of memory to run it. – D Gordon Jun 04 '15 at 20:24
  • That's probably because `comp` is reloading the entire spreadsheet for each stock. Ideally, you'd be better off if `comp` could take a range of stocks and return the total data frame. Alternatively, do some preprocessing on the spreadsheet, generate an internal copy, save it as an RDS file, and then work with that (see the sketch after these comments). – kdopen Jun 04 '15 at 20:26
  • If I saved it as an RDS file, how much faster would it be? I need to do this for many characteristics, so making this go faster would be ideal. – D Gordon Jun 04 '15 at 20:50
  • And would saving as an RDS file work if there are over a dozen sheets within the Excel file? – D Gordon Jun 04 '15 at 21:07
  • Right now you read the entire file 100 times, looking for one stock at a time. I'm suggesting you invert that: read the file once (I/O is slow) and look for all 100 stocks during that single read. And yes, RDS is much faster to load and save (it's a binary format). But that's a different question. – kdopen Jun 04 '15 at 22:39
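A minimal sketch of the preprocessing suggested in these comments, assuming the workbook is read with XLConnect; the file name stocks.xlsx is a hypothetical stand-in, not the poster's actual file:

library(XLConnect)

# One-time preprocessing: parse the Excel file once and cache every
# sheet as a named list of data frames.
wb <- loadWorkbook("stocks.xlsx")   # hypothetical file name
sheet_names <- getSheets(wb)
sheets <- lapply(sheet_names, function(sh) readWorksheet(wb, sheet = sh))
names(sheets) <- sheet_names
saveRDS(sheets, "stocks.rds")

# Later runs skip the spreadsheet entirely: readRDS loads the cached list
# far faster than re-parsing the workbook, however many sheets it has.
sheets <- readRDS("stocks.rds")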