I've been stuck working on this for a little while now and I find it very hard to believe that this isn't an inbuilt function or that someone hasn't dealt with this before now.

The function I want to run should compare two columns of a data frame by shifting one of them in time until the best correlation is found. The data I am using is from two scientific instruments and their sampling/averaging times differ, which is why I want to shift the data.

Only the date associated with one of the two elements will be adjusted.

if correlation of data + x seconds is > current correlation
  increase current date/time
  note increasing
else if correlation of data - x seconds is > current correlation
  decrease current date/time
  note decreasing
end if

while correlation of data + x seconds is > current correlation
  increase current date/time by x seconds
end while

while correlation of data - x seconds is > current correlation
  decrease current date/time by x seconds
end while

If there is a function that will do this, great; if not, I will provide additional info and code.

This is what my current code structure is. Date is POSIXct 'GMT', Dusttrak is numeric, CO is numeric, and color is a number I have created from time to give me a colored time series.

I am currently using rcorr to find the correlation, but the date has been an issue, so I will need to convert from date to numeric and back afterwards.

  • A reproducible example would be nice! Consider making a dummy dataset. – Remko Duursma Jan 09 '17 at 04:40
  • The code base is quite large at the moment, and I will pull out the required parts if no such solution/function already exists. – CTRoulston Jan 09 '17 at 04:49
  • You want `ccf`, for cross-correlation. Give a reproducible example for a reproducible solution. – jeremycg Jan 09 '17 at 04:50
  • Please don't give an image of your data! It's more work for you (to post it) and for us (since we would have to type it in manually). It is much easier to just do something like `dput(head(mydata, n = 10))`. Along the themes that the other comments are on, I suggest you read about [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and [minimal](http://stackoverflow.com/help/mcve) examples ... it will go a long way to help us help you. – r2evans Jan 09 '17 at 05:01
  • Hi jeremycg. Thanks so much. `ccf` is what I have been looking for, for a long time now. It has opened up a whole world of autocorrelation and lag plots. If you put this as an answer I will accept it. I was sure for the life of me something like this would have to exist in R!! – CTRoulston Jan 09 '17 at 05:15

1 Answer

Let's use synthetic data, as you have only pasted in an image:

set.seed(100)
x = rnorm(100)
y = rnorm(100)

Now we use `ccf`:

z <- ccf(x, y, plot = FALSE)  # suppress the plot, we only want the numbers

z is a list with our results, which we can subset to get the lag with the highest correlation:

bestval = which.max(z$acf)
z$lag[bestval] #our lag

16
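With pure noise the "best" lag is arbitrary, so it may help to check the approach on data where the shift is known. The following is a minimal sketch with made-up data (the names `shift` and `best_lag` and the noise level are illustrative assumptions). Per the `ccf` documentation, the lag `k` returned by `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`, so a `y` that trails `x` shows up as a negative lag:

```r
set.seed(1)
n <- 200
shift <- 5                      # known delay, in samples (illustrative)

x <- rnorm(n)
# y is x delayed by `shift` samples, padded at the start, plus a little noise
y <- c(rnorm(shift), x[1:(n - shift)]) + rnorm(n, sd = 0.1)

z <- ccf(x, y, lag.max = 20, plot = FALSE)
best_lag <- z$lag[which.max(z$acf)]
best_lag                        # negative: y trails x by `shift` samples
```

If the two series could be negatively correlated, `which.max(abs(z$acf))` may be the more appropriate choice.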

For the time series, it gets a little harder - if you don't have uniform time steps in your rows, you might have to do some normalisation.
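One possible way to do that normalisation, sketched here under assumptions rather than taken from the question's data, is to interpolate both instruments onto a common regular grid with `approx` and run `ccf` on the result. The data frames `a` and `b`, the 10-second `step`, and the simulated 40-second delay are all hypothetical. Times are plain numeric seconds here; for a POSIXct column, `as.numeric()` gives seconds since the epoch.

```r
set.seed(42)
step <- 10                                    # grid spacing in seconds (assumption)

# Hypothetical smooth underlying signal on a 1-second grid
base_t <- seq(0, 3600, by = 1)
base_v <- as.numeric(stats::filter(rnorm(length(base_t)),
                                   rep(1 / 60, 60), circular = TRUE))
truth <- approxfun(base_t, base_v, rule = 2)

# Two instruments sampling at irregular times; b sees the signal 40 s late
tA <- sort(runif(400, 0, 3600))
tB <- sort(runif(400, 0, 3600))
a <- data.frame(time = tA, value = truth(tA))
b <- data.frame(time = tB, value = truth(tB - 40))

# Common regular grid covering the overlap of both records
grid <- seq(max(min(a$time), min(b$time)),
            min(max(a$time), max(b$time)), by = step)
a_reg <- approx(a$time, a$value, xout = grid)$y
b_reg <- approx(b$time, b$value, xout = grid)$y

z <- ccf(a_reg, b_reg, lag.max = 30, plot = FALSE)
best_lag_seconds <- z$lag[which.max(z$acf)] * step
best_lag_seconds                              # expected to land near -40
```

The lag can only be resolved to the nearest `step`, so a finer grid gives a finer estimate at the cost of more interpolation. Since adding a number to a POSIXct value adds seconds, the estimated lag can then be applied directly to one instrument's timestamps.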

jeremycg