reps <- function(s, n) paste(rep(s, n), collapse = "") # repeat s n times
find.string <- function(string, th = 3, len = floor(nchar(string)/th)) {
  for (sublen in len:1)
  {
    for (inlen in 0:sublen)
    {
      pat <- paste0("((.{", sublen-inlen, "})(.)(.{", inlen, "}))", reps("(\\2.\\4)", th-1))
      r <- regexpr(pat, string, perl = TRUE)
      if (attr(r, "capture.length")[1] > 0)
      {
        if (r > 0)
        {
          substring(string, r, r + attr(r, "capture.length")[1] - 1)
        }
      }
    }
  }
}
Why doesn't this code work? It is supposed to accept an input string such as `110111111` and output every pattern that satisfies one constraint: the pattern appears consecutively at least 3 times.

In addition, it should also output patterns with a jitter of 1 character, i.e. patterns like `110`, which appears three times in a row except for a mismatch at the last position. Instead, the function just outputs `NULL`.

Another example is `a0cc0vaaaabaaadbaaabbaa00bvw`. Here, one of the outputs should be `aaaab`.
Edit: the input can be a string containing letters or digits. Also, a match should be at least 2 characters long. And yes, matches may overlap. The function will be called like this:
`find.string("a0cc0vaaaabaaadbaaabbaa00bvw")` or `find.string("110111111")`
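
One plausible reason for the `NULL` result: in R, a `for` loop itself evaluates to `NULL`, and the value of the bare `substring()` call inside it is discarded, so the function has nothing to return. The sketch below keeps the same regular expression but accumulates matches and returns them explicitly. The name `find.string.sketch` and the choice to keep only the first match per pattern are assumptions made for illustration, not part of the original code.

```r
# Repeat the string s n times (same helper as in the question).
reps <- function(s, n) paste(rep(s, n), collapse = "")

# Hypothetical variant of find.string(): same pattern as the question, but
# matches are collected in a vector and returned instead of being discarded.
find.string.sketch <- function(string, th = 3, len = floor(nchar(string) / th)) {
  results <- character(0)
  for (sublen in len:1) {
    for (inlen in 0:sublen) {
      # Unit = prefix + one free ("jitter") character + suffix, followed by
      # th - 1 repeats that may differ only at the jitter position.
      pat <- paste0("((.{", sublen - inlen, "})(.)(.{", inlen, "}))",
                    reps("(\\2.\\4)", th - 1))
      r <- regexpr(pat, string, perl = TRUE)
      if (r > 0) {
        start <- as.integer(r)                       # start of the first match
        unit.len <- attr(r, "capture.length")[1]     # length of capture group 1
        results <- c(results, substring(string, start, start + unit.len - 1))
      }
    }
  }
  unique(results)  # explicit return value; empty vector if nothing matched
}
```

With this version, `find.string.sketch("110111111")` returns a character vector that includes `"110"`. Note that `regexpr()` reports only the first match per pattern, so finding every overlapping occurrence would need a different approach.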