
I'm relatively new to regex but need to build a query that searches a time series for recurring transactions, i.e. ones that repeat every x days.

x is predefined

For example:

If I'm looking for a pattern repeating every 9 days:

data1 <- c(10.10,0,0,0,0,0,0,0,10.10,0,0,0,0,0,0,0,10.10,0,0,0,0,0,0,0,10.10)

Output: 10.10

If I'm looking for a pattern repeating every 14 days:

data1 <- c(2000,0,0,0,9,0,0,10,0,0,9,0,0,0,0,2000,0,0,0,0,0,0,10.10,0,0,0,10.10,0,0,0,2000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2000)

Output: 2000

The numbers in between can be anything.

DataDancer
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with some sample data and the expected output. – Molx Jul 07 '15 at 00:18
  • I've tried using `m <- gregexpr("[0-9]+", s)` and `n <- regmatches(s, m)`, but that only gives me the number of times a single value occurs, without taking into account the transactions/time between them. – DataDancer Jul 07 '15 at 00:51
  • So are you going to try every period from 1 to 9 days looking for a repeating pattern, or do you know for certain it's every 9 days? Also, you show decimal numbers. Is that fixed? I assume the `,` comma is a delimiter for the day, right? Does it delimit decimal numbers only? If the data is long, does it _have_ to run in increments of 9, where the last is less than or equal to 9, before the end of data? –  Jul 07 '15 at 01:01
  • Is `data1` a numeric vector or a one-element string? Your code doesn't work, you can't create the object like that. – Molx Jul 07 '15 at 01:05
  • Yeah, it would have to be a _string_ if using a regex. And if it's not a string, then values like 0 and 0.0 are considered equal. –  Jul 07 '15 at 01:06
  • They start out as numeric ts objects but I convert them to strings. As for which numbers can occur: any number, integer or real (2 decimal places only). – DataDancer Jul 07 '15 at 01:22
  • My overarching goal is to return any value that recurs in the series and identify its recurrence/periodicity. I cannot use statistical techniques like Fourier analysis because of small-sample-size limitations. – DataDancer Jul 07 '15 at 01:23

1 Answer

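interval <- 3
vector <- c(10,1,0,10,0,0,10,0,0,10)
# For each starting offset, take every interval-th value; a standard
# deviation of 0 means the same value repeats at that offset.
for(i in 1:interval) {
  # isTRUE() guards against the NA that sd() returns for a single value
  if(isTRUE(sd(vector[seq(i,length(vector),interval)])==0)) {
    print(vector[i])
  }
}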

This is a loop though, so it won't be the most efficient way of doing things. For more of a discovery tool, here is a function that returns each matching value together with the interval at which it recurs.

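# Returns a two-column matrix: column 1 is the recurring value,
# column 2 is the interval at which it recurs.
find_patterns <- function (vector, intervals) {
    matches <- matrix(c(NA, NA), nrow=1, ncol=2)
    for(interval in intervals) {
        for(i in 1:interval) {
            # isTRUE() skips offsets where sd() is NA (only one value sampled)
            if(isTRUE(sd(vector[seq(i,length(vector),interval)])==0)) {
                if(is.na(matches[1,1])) {
                    # first match replaces the NA placeholder row
                    matches[1,] <- c(vector[i],interval)
                } else {
                    matches <- rbind(matches,c(vector[i],interval))
                }
            }
        }
    }
    return(matches)
}

# Define the function first, then call it
vector <- c(10,1,0,10,0,0,10,0,0,10)
matches <- find_patterns(vector,seq(2,3))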
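
As a rough sketch of the same per-offset check without an explicit loop over offsets, the positions can be grouped by their offset within the interval using `tapply`. The helper name `constant_offsets` is hypothetical, and it compares values with `unique()` instead of `sd()`, which is a swapped-in test rather than the one used above.

```r
# Hypothetical helper (not part of the answer's code): group the series by
# position modulo the interval and keep offsets whose values are all equal.
constant_offsets <- function(x, interval) {
  # offset 1..interval for every position in the series
  offset <- (seq_along(x) - 1) %% interval + 1
  # TRUE for an offset when it has at least two values and they are identical
  is_constant <- tapply(x, offset, function(v) {
    length(v) > 1 && length(unique(v)) == 1
  })
  starts <- as.integer(names(is_constant)[is_constant])
  # the recurring value at each constant offset
  x[starts]
}

x <- c(10, 1, 0, 10, 0, 0, 10, 0, 0, 10)
constant_offsets(x, 3)
```

Note that an offset consisting only of zeros also counts as constant here, so depending on the data it may make sense to drop zeros from the result.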
Matt Sandy
  • Thanks for sharing the above function. It works for my constant series. Just a quick question: what if the pattern comes and goes over time, i.e. there is a 7-day recurring pattern for 6 months, then it stops, and after a few months it starts up again? In that case, would you apply a windowing function and run the above inside the window? – DataDancer Jul 07 '15 at 03:30
  • This checks against the entire vector. You might be able to swap out the standard deviation function for another that checks a percentage of the same values. Depending on the data you might be able to get by with changing the 0 for standard deviation to another value. It really all depends on the data set. – Matt Sandy Jul 07 '15 at 05:02
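
One possible reading of the "percentage of the same values" idea from the last comment, as a minimal sketch: accept an offset when a large enough share of its values equals the most common one. The function name `find_patterns_share`, the `min_share` parameter and its default are assumptions made for illustration, not something from the thread.

```r
# Hypothetical variant: an offset matches when at least `min_share` of the
# values sampled at that offset equal the most frequent value.
find_patterns_share <- function(x, intervals, min_share = 0.8) {
  rows <- list()
  for (interval in intervals) {
    for (i in seq_len(min(interval, length(x)))) {
      v <- x[seq(i, length(x), by = interval)]
      if (length(v) < 2) next            # a single value cannot recur
      counts <- table(v)
      if (max(counts) / length(v) >= min_share) {
        # table() names are the values as text; convert back to numeric
        top <- as.numeric(names(counts)[which.max(counts)])
        rows[[length(rows) + 1]] <- c(value = top, interval = interval, offset = i)
      }
    }
  }
  do.call(rbind, rows)                   # NULL when nothing matched
}

# The 7 at position 7 breaks the exact pattern, but 4 of 5 values still agree
x <- c(10, 0, 0, 10, 0, 0, 7, 0, 0, 10, 0, 0, 10)
find_patterns_share(x, intervals = 3, min_share = 0.75)
```

To approximate the windowing mentioned in the question, the same call could be run on slices of the series such as `x[start:end]`; how to choose the window length depends on the data.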