I prefer data.table
. See Getting Started for an intro to the syntax.
Setup
library(data.table)
# use setDT(forwards) on your own object to convert an existing data.frame
set.seed(20384)
all_dates = seq.Date(as.Date('2011-01-01'), as.Date('2015-12-31'), by = 'day')
TT = length(all_dates)
forward = data.table(
date = all_dates,
price_forward = rnorm(TT)
)
forward
# date price_forward
# 1: 2011-01-01 -0.0969564
# 2: 2011-01-02 -0.1079899
# 3: 2011-01-03 1.9454087
# 4: 2011-01-04 0.5079781
# 5: 2011-01-05 0.2201317
# ---
# 1822: 2015-12-27 -0.3674510
# 1823: 2015-12-28 -1.5389197
# 1824: 2015-12-29 -0.8461961
# 1825: 2015-12-30 -0.7018287
# 1826: 2015-12-31 -0.5643040
spot = data.table(
# simulate removing 4 of the dates at random
date = sample(all_dates, TT - 4L),
price_spot = rnorm(TT - 4),
# setting key is not necessary, but it will sort the data
key = 'date'
)
spot
# date price_spot
# 1: 2011-01-01 0.33803547
# 2: 2011-01-02 -1.21756020
# 3: 2011-01-03 0.13199130
# 4: 2011-01-04 -0.64201342
# 5: 2011-01-05 -0.08061704
# ---
# 1818: 2015-12-27 1.83826974
# 1819: 2015-12-28 0.22838840
# 1820: 2015-12-29 -0.93258147
# 1821: 2015-12-30 -1.20209606
# 1822: 2015-12-31 -1.80698627
Investigation
Identifying the missing dates is easy. e.g.
setdiff(forward$date, spot$date)
# [1] 15212 15598 16188 16752
# strangely, setdiff strips the Date of its human-readable label
# and converted the dates to integers (read ?Date for more);
# we can recover the useful version with (note that this version
# is inefficient since we convert to character every Date object,
# when we only care about the character version of a few -- we
# would do better to convert the result of setdiff):
setdiff(paste(forward$date), paste(spot$date))
# [1] "2011-08-26" "2012-09-15" "2014-04-28" "2015-11-13"
But that's probably not all you'd like to do; joining the tables may be more useful:
prices = merge(forward, spot, all.x = TRUE, by = 'date')
# once merged, we can use is.na to identify the missing dates:
prices[is.na(price_spot)]
# date price_forward price_spot
# 1: 2011-08-26 0.008345504 NA
# 2: 2012-09-15 -0.966410632 NA
# 3: 2014-04-28 -1.600574836 NA
# 4: 2015-11-13 1.549928470 NA