0

I have a stock market chart for spot market and forwards market for price. Now I want to compare the price of them at a same day. But I found the forwards market has 1826 days and spot market has 1822 days observations. I do not know what are the days of the 4 days that are missing in spot market. Can you guys give me syntax on how to find those 4 more observations in forwards market that are missing in spot market? I can't just delete random 4 observations. The days should be matching.

Thank you in advance to everyone.

Have a great day!

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
S. Jay
  • 141
  • 2
  • 10
  • 2
    [please give a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). how are your data stored? as vectors? a `data.frame`? – MichaelChirico Dec 28 '17 at 22:03
  • It is data.frame. there is one data frame with forwards price and date from 2015/12/31 to 2011/1/1. The same goes with spot price. But forwards has 1826 observations which is 4 more than spot. I do not know where the 4 observations come from.. – S. Jay Dec 29 '17 at 15:18
  • join the two data frames using the date. there may be more than 4 that don't match. – MichaelChirico Dec 29 '17 at 17:22
  • That's where I am stuck. I tried to put them together, but because the number of observations are different, I cant put them together.. I tried those two advice down there, and it did not really work. Do you have any suggestion? Thank you very much! – S. Jay Dec 30 '17 at 02:29
  • sorry I don't have a computer available. just search for how to join two data.frames and update your question to reflect where you're still stuck after trying to replicate any of several similar. questions – MichaelChirico Dec 30 '17 at 08:16

3 Answers3

1

consider setdiff: something like

setdiff(forward, spot)

where forward and spot are the respective days should give you the days that differ between the two.

jrlewi
  • 486
  • 3
  • 8
0

Something like this will delete the non matching rows:

forwards <- forwards[ forwards$date %in% spot$date, ]
rosscova
  • 5,430
  • 1
  • 22
  • 35
0

I prefer data.table. See Getting Started for an intro to the syntax.

Setup

library(data.table)
# use setDT(forwards) on your own object to convert an existing data.frame
set.seed(20384)
all_dates = seq.Date(as.Date('2011-01-01'), as.Date('2015-12-31'), by = 'day')
TT = length(all_dates)
forward = data.table(
  date = all_dates,
  price_forward = rnorm(TT)
)
forward
#             date price_forward
#    1: 2011-01-01    -0.0969564
#    2: 2011-01-02    -0.1079899
#    3: 2011-01-03     1.9454087
#    4: 2011-01-04     0.5079781
#    5: 2011-01-05     0.2201317
#   ---                         
# 1822: 2015-12-27    -0.3674510
# 1823: 2015-12-28    -1.5389197
# 1824: 2015-12-29    -0.8461961
# 1825: 2015-12-30    -0.7018287
# 1826: 2015-12-31    -0.5643040
spot = data.table(
  # simulate removing 4 of the dates at random
  date = sample(all_dates, TT - 4L),
  price_spot = rnorm(TT - 4),
  # setting key is not necessary, but it will sort the data
  key = 'date'
)
spot
#             date  price_spot
#    1: 2011-01-01  0.33803547
#    2: 2011-01-02 -1.21756020
#    3: 2011-01-03  0.13199130
#    4: 2011-01-04 -0.64201342
#    5: 2011-01-05 -0.08061704
#   ---                       
# 1818: 2015-12-27  1.83826974
# 1819: 2015-12-28  0.22838840
# 1820: 2015-12-29 -0.93258147
# 1821: 2015-12-30 -1.20209606
# 1822: 2015-12-31 -1.80698627

Investigation

Identifying the missing dates is easy. e.g.

setdiff(forward$date, spot$date)
# [1] 15212 15598 16188 16752
# strangely, setdiff strips the Date of its human-readable label
#   and converted the dates to integers (read ?Date for more);
#   we can recover the useful version with (note that this version
#   is inefficient since we convert to character every Date object,
#   when we only care about the character version of a few -- we 
#   would do better to convert the result of setdiff):
setdiff(paste(forward$date), paste(spot$date))
# [1] "2011-08-26" "2012-09-15" "2014-04-28" "2015-11-13"

But that's probably not all you'd like to do; joining the tables may be more useful:

prices = merge(forward, spot, all.x = TRUE, by = 'date')
# once merged, we can use is.na to identify the missing dates:
prices[is.na(price_spot)]
#          date price_forward price_spot
# 1: 2011-08-26   0.008345504         NA
# 2: 2012-09-15  -0.966410632         NA
# 3: 2014-04-28  -1.600574836         NA
# 4: 2015-11-13   1.549928470         NA
MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
  • Dear Chirico, I have done setdiff on my dataset, but it does show weird stuff. It is showing like [1] 1451520000 1451433600 1451347200 1451260800 1451174400 1451088000 1451001600 1450915200 1450828800 1450742400 1450656000 1450569600 1450483200 1450396800 [15] 1450310400 1450224000 1450137600 1450051200 1449964800 1449878400 1449792000 1449705600 1449619200 1449532800 1449446400 1449360000 1449273600 1449187200 and so on.. I do not know what the problem is. If possible, I can send you my data.. How can I do that? Thank you for help very much! – S. Jay Jan 03 '18 at 12:55
  • Please read my answer more carefully. The only difference is your data appears to be stored as `POSIXct` instead of `Date` – MichaelChirico Jan 03 '18 at 14:55