this is my first question here so please potentially excuse some beginners mistakes in asking if any.
My problem in R is the following:
- I have a large dataset (data.table object data_op) with appx. 4.5 million observations over 15 years, which means that I have several data points on each day. The relevant entries each day are (option) strike price and the current date.
- Additionally, I have a second object, data_sec, containing one (closing) price for each of those days, i.e. the vector is of course much smaller but covers the same range.
The relevant part of the data sets look like this: data_op:
date
04/01/1996
04/01/1996
04/01/1996
...
29/08/2014
29/08/2014
data_sec:
date close
04/01/1996 617.70
05/01/1996 616.71
08/01/1996 618.46
... ...
28/08/2014 1996.74
29/08/2014 2003.37
Now, for further calculations, I need to read the closing price from the second vector for each of the dates in the first. Unfortunately, something like
data_op$new <- data_sec$close[data_sec$date == data_op$date]
does not work, since R correctly tells me that 'longer object length is not a multiple of shorter object length'. Therefore, I use the for-loop workaround:
for (i in 1:length(data_op$date)){
data_op$new[i] <- data_sec$close[data_sec$date == data_op$date[i]]}
However, this takes extremely long to execute (I actually have to manually stop execution sometimes). So my question is if there is a faster/more efficient way in achieving this. I thought this should be quite straight-forward but could not find a post about such vector indexing without a loop.
I hope I made my problem clear. Any help is very much appreciated!
Thanks!