I have the following code for finding a pattern (a consecutively repeated substring) in a string, say 0110110110000. The expected output patterns are 011 and 110, since both are repeated within the string. What changes should I make to the code below to get this?
I'd like to identify substrings that start at any position in a given string and that repeat consecutively at least a threshold number of times. For the string above, the threshold is three (th = 3). The result should be the maximal repeated substring. In the string above, 110 and 011 both satisfy these conditions.
Here's my attempt at doing this:
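    # Repeat s n times and concatenate the copies
    reps <- function(s, n) paste(rep(s, n), collapse = "")

    find.string <- function(string, th = 3, len = floor(nchar(string)/th)) {
      # Try unit lengths from the longest possible down to 1
      for (k in len:1) {
        # A k-character group followed by th-1 back-references to it
        pat <- paste0("(.{", k, "})", reps("\\1", th - 1))
        r <- regexpr(pat, string, perl = TRUE)
        if (attr(r, "capture.length") > 0) break
      }
      # Return the first matching unit, or "" if nothing repeats th times
      if (r > 0) substring(string, r, r + attr(r, "capture.length") - 1) else ""
    }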
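
For the example string, find.string("0110110110000") returns only 011, because regexpr reports just the first match for a given unit length. To make the desired behaviour concrete, here is a minimal sketch of a variant that checks every start position and collects all units of the longest repeating length. The name find.all.strings is hypothetical, "maximal" is assumed to mean the longest unit length that repeats at least th times, and strrep requires R 3.3.0 or later.

```r
# Hypothetical helper: return every distinct unit of the longest length k
# that occurs th times in a row, starting at any position in the string.
find.all.strings <- function(string, th = 3) {
  n <- nchar(string)
  # Unit lengths from the longest possible down to 1 (empty if n < th)
  for (k in rev(seq_len(n %/% th))) {
    # Every start position that leaves room for th consecutive units
    starts <- seq_len(n - k * th + 1)
    unit <- substring(string, starts, starts + k - 1)
    window <- substring(string, starts, starts + k * th - 1)
    # A hit is a window that equals its own first unit repeated th times
    hit <- window == strrep(unit, th)
    if (any(hit)) return(unique(unit[hit]))
  }
  character(0)  # nothing repeats th times consecutively
}
```

With this sketch, find.all.strings("0110110110000", th = 3) gives "011" and "110". It compares substrings directly instead of using a regular expression, which keeps overlapping start positions easy to handle.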