
I saw here that you should use drop() when passing a single-column xts object to ccf (the cross-correlation function). (The sample data is quite big, so I put it in a gist.)

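```r
library(xts)
# Raw-file base URL of the gist holding the sample data
gist <- "https://gist.github.com/raw/3291932"
# Each file is a dput() dump of a single-column xts object
tmp1 <- dget(file.path(gist, "e620647218626929b4ee370a05aa7748b2f9a32b/tmp1.txt"))
tmp2 <- dget(file.path(gist, "49b732db3eafa52f96006e3b1bb0be28380f5df0/tmp2.txt"))
# drop() the single-column xts objects before passing them to ccf()
ccf(drop(tmp1), drop(tmp2))  # Weird?
```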

I expected a small peak around lag=0, with mostly noise either side. Instead I got a straight line:

[Figure: ccf on all 400 bars]

That was 400 bars. I got the same kind of line on my full data of thousands of bars. But if I use just the last 100 bars of that data, I get something closer to what I expected (50 bars looks even more plausible):

[Figure: ccf for just the last 100 bars]

I'm a bit stumped as to whether this is a ccf bug, a problem with the way I use xts objects, a misunderstanding on my part of what ccf is doing, or whether I've magically discovered the formula to beat the stock market...

Asked by Darren Cook; edited by Joshua Ulrich
  • @JoshuaUlrich Thanks for editing the code to link it directly to the gist; I didn't know that was possible. However, I get "cannot open the connection" because of "unsupported URL scheme"; do I need to configure something, or load another package? – Darren Cook Aug 08 '12 at 23:12
  • That's odd. It just worked for me. I'm using R-2.15.1. Perhaps you're using an older version? – Joshua Ulrich Aug 09 '12 at 00:36
  • @JoshuaUlrich That is strange, as I'm also using 2.15.1. I also started R with --vanilla and get the same complaint. – Darren Cook Aug 09 '12 at 00:49
  • [This explains it](http://stackoverflow.com/q/7715723/271616). – Joshua Ulrich Aug 09 '12 at 04:13
  • @JoshuaUlrich Actually I had looked at that, but wasn't sure if it was relevant. Are you on Windows then? (I'm on Linux) – Darren Cook Aug 09 '12 at 05:24
  • I use both (Windows w/--internet2 at work, Ubuntu at home). Feel free to rollback my edit or add a different solution. – Joshua Ulrich Aug 09 '12 at 11:46

1 Answer


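Your results aren't surprising, since you're looking at the cross-correlations between stock prices. Prices usually have high serial autocorrelation at several lags, and when two series are each strongly autocorrelated, their sample cross-correlations tend to be large and to change only slowly from one lag to the next, even if the series are unrelated.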

```r
acf(tmp1)  # autocorrelation of the first price series
acf(tmp2)  # autocorrelation of the second price series
```
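To see this effect without the gist data, here is a minimal sketch that uses two *independent* simulated random walks as hypothetical stand-ins for price series (the names `p1`, `p2`, the seed, and the length are illustrative assumptions, not the question's data). Each walk is strongly autocorrelated, and their cross-correlation in levels usually looks large and smooth even though the two series have nothing to do with each other.

```r
# Hypothetical stand-ins for two price series: independent random walks
# (simulated data, not the gist data from the question).
set.seed(42)
n  <- 400
p1 <- 100 + cumsum(rnorm(n))
p2 <- 100 + cumsum(rnorm(n))

# Each "price" series is highly autocorrelated: every value is the
# previous value plus a small random shock.
a1 <- acf(p1, plot = FALSE)
print(round(a1$acf[2:6], 3))  # lags 1..5, typically all close to 1

# Cross-correlation of the levels: although p1 and p2 are independent,
# the sample ccf usually varies slowly across lags and often lies well
# outside the approximate 95% band of about 2 / sqrt(n).
cc_levels <- ccf(p1, p2, plot = FALSE)
print(max(abs(cc_levels$acf)))
print(2 / sqrt(n))
plot(cc_levels)
```

Rerunning with different seeds shows the level cross-correlations wandering far from zero in varying shapes, which is the "nonsense correlation" behaviour of independent random walks.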

Most correlation analysis is done on returns, which creates something more like what you seemed to expect:

```r
# Cross-correlation of first differences (price changes) rather than price levels
ccf(drop(diff(tmp1, na.pad = FALSE)), drop(diff(tmp2, na.pad = FALSE)))
```
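As a sketch of why differencing helps, the snippet below again uses two hypothetical independent random walks (simulated, not the gist data). Differencing recovers each walk's white-noise shocks, so the cross-correlations of the changes should mostly fall inside the 95% band that `plot.acf()` draws, `qnorm(0.975) / sqrt(n.used)`.

```r
# Hypothetical simulated data: two independent random walks
set.seed(42)
n  <- 400
p1 <- 100 + cumsum(rnorm(n))
p2 <- 100 + cumsum(rnorm(n))

# First differences ("price changes"); for a plain numeric vector diff()
# returns n - 1 values with no leading NA, so no na.pad argument is needed.
r1 <- diff(p1)
r2 <- diff(p2)

# The changes are white noise by construction, so the lag-1
# autocorrelation is near 0 rather than near 1.
print(acf(r1, plot = FALSE)$acf[2])

# Cross-correlations of the changes, compared with the dashed 95% band
cc_returns <- ccf(r1, r2, plot = FALSE)
band <- qnorm(0.975) / sqrt(cc_returns$n.used)
# Fraction of lags outside the band; under independence this is
# roughly 0.05 on average
print(mean(abs(cc_returns$acf) > band))
plot(cc_returns)  # a noisy chart centred on zero
```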
Answered by Joshua Ulrich
  • Thanks Joshua, that does indeed create the kind of noisy chart I was expecting. I don't yet _internally_ get why using raw prices gives the regular-looking chart, though... I think I might have to poke around in the source. – Darren Cook Aug 08 '12 at 23:21
  • Darren, imagine a series of stock prices. You would expect something like: $58, $57, $62, $64 ... etc. You would not expect something like: $2, $500, $4, $32, $1, $600. This is because the price at one point in time is relatively close to the price just prior to it in time. So you expect prices close in time to be similar. – Omar May 24 '18 at 20:44