6

An MRE is pasted below

MRE

date<-c('2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-01','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02','2001-01-02')
time<-c('07:00:00 GMT','08:00:00 GMT','09:00:00 GMT','10:00:00 GMT','11:00:00 GMT','12:00:00 GMT','13:00:00 GMT','14:00:00 GMT','15:00:00 GMT','16:00:00 GMT','17:00:00 GMT', '18:00:00 GMT','19:00:00 GMT','20:00:00 GMT','21:00:00 GMT','22:00:00 GMT','23:00:00 GMT','00:00:00 GMT', '01:00:00 GMT','02:00:00 GMT','03:00:00 GMT','04:00:00 GMT','05:00:00 GMT','06:00:00 GMT','07:00:00 GMT','08:00:00 GMT','09:00:00 GMT','10:00:00 GMT','11:00:00 GMT','12:00:00 GMT','13:00:00 GMT','14:00:00 GMT','15:00:00 GMT','16:00:00 GMT','17:00:00 GMT','18:00:00 GMT','19:00:00 GMT','20:00:00 GMT','21:00:00 GMT')
el<-c(0.257,0.687,1.861,3.288, 4.821,6.172,7.048,7.258,6.799,5.654,4.463,3.443,2.704,2.708,3.328,4.23,5.244,5.985,6.317,6.074,5.234,3.981,2.662,1.615,0.88,0.746,1.405,2.527,3.928,5.283,6.517,7.179,7.252,6.625,5.454,4.214,3.144,2.491,2.357)
Time<-as.POSIXct(paste(date, time),tz="GMT")
wave<-data.table(Time, el)
ggplot(wave, aes(wave$Time, wave$el)) + geom_point() + labs(x="time", y="elevation") + geom_hline(aes(yintercept=4))

I have a wave time series and I want to be able to have a function that is able to tell me the frequency and mean/median duration the wave is above a given elevation. In my example I have chosen 4.

I want to interpolate the time when the wave reaches 4 on the rising and falling edges and find the time difference between the two points for each wave.

I can do this with a for loop, but I think that I should be able to do it in data.table much faster. I have 1mil+ points for several locations and do not think a for loop would be efficient.

For the rising wave I want to do something like:

wave[,timeIs4:=ifelse(elev<3 & elev[+1]>4,TRUE,FALSE )]

But instead of TRUE put in my interpolation calculation. I do not know how to access preceding and proceeding values within a data table such as in a for loop i+1 or i-1.

Desired Output

Rising leg I want to interpolate between points 4 and 5; 15 and 16; 29 and 30.

Falling leg I want to interpolate between points 11 and 12; 21 and 22; 36 and 37

Approximate Outcome

Rising      Falling
10:28:00    17:27:00
21:45:00    3:59:00
11:03:00    18:12:00

Then I will be able to subtract Rising from Falling using difftime() to determine the amount of time the water level was above the given elevation.

This will give me the frequency and duration the water is above the given elevation.

Jeff Tilton
  • 1,256
  • 1
  • 14
  • 28
  • 1
    check `shift` function from data.table `1.9.5` version, also you should read about [MRE](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – jangorecki Sep 12 '15 at 20:53
  • 1
    What is your exact desired output. I can help you separate the waves but I'm not sure how you want to proceed further – David Arenburg Sep 12 '15 at 21:43

2 Answers2

6

Here's a possible solution using the devel version from GH. You will need it for both the shift function (as mentioned by @Jan) and fir the new dcast method which accepts expressions. Also, you don't have minutes in your MRE, so not sure where did you get those in your expected output.

Anyway, for starters, we will create an index (we will call it Wave so you will know from which wave # its coming from) that will tell us if the wave is rising or falling using shift. Then, we will dcast on matched values while removing the unmatched using na.omit (you can reorder the column names later if you like using the setcolorder function)

library(data.table) ## V 1.9.5+
dt[elev <= 4 & shift(elev, type = "lead") > 4, Wave := "Rising"]
dt[elev > 4 & shift(elev, type = "lead") <= 4, Wave := "Falling"]
dcast(na.omit(dt), cumsum(Wave == "Rising") ~ Wave, value.var = "time")
#    Wave             Falling              Rising
# 1:    1 2001-01-01 17:00:00 2001-01-01 10:00:00
# 2:    2 2001-01-02 03:00:00 2001-01-01 21:00:00
# 3:    3 2001-01-02 18:00:00 2001-01-02 11:00:00
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • This is great, I need to look more at shift. I plan on getting minutes by interpolating the time wave reaches 4. I am wondering for a large data set that has some possible missing data how you would account for quality control. It seems that there may be some legs that are missing; how do I throw out the other half of the set? I have some ideas, such as using a smaller known clean set and getting the time range and ensuring all rising and falling legs fall within that range (with a little extra buffer). Would this be an appropriate approach? – Jeff Tilton Sep 13 '15 at 16:11
3

Here is another possible idea:

elev = 4    

#a helper function to calculate elapsed time
ff = function(el1, el2, el, time1, time2) 
            time1 + ((el - el1) / (el2 - el1)) * (time2 - time1)    

dif = diff(findInterval(wave$el, c(-Inf, elev, Inf)))
ris = which(dif == 1)  #risings
fal = which(dif == -1)  #fallings

ff(wave$el[ris], wave$el[ris + 1], elev, wave$Time[ris], wave$Time[ris + 1])
#[1] "2001-01-01 10:27:52 GMT" "2001-01-01 21:44:42 GMT" "2001-01-02 11:03:11 GMT"
ff(wave$el[fal], wave$el[fal + 1], elev, wave$Time[fal], wave$Time[fal + 1])
#[1] "2001-01-01 17:27:14 GMT" "2001-01-02 03:59:05 GMT" "2001-01-02 18:12:00 GMT"
alexis_laz
  • 12,884
  • 4
  • 27
  • 37