How to find the largest repeating string with overlap in a line

Question

I have a series of lines such as

my $string = "home test results results-apr-25 results-apr-251.csv";
@str = $string =~ /(\w+)\1+/i;
print "@str";

How do I find the largest repeating string with overlap, where the strings are separated by whitespace? In this case I'm looking for the output:

results-apr-25

Something like this [`~(\S+)(?=.*?\1)~gs`](http://regex101.com/r/zR9aH7), you only need to figure out how to get the longest string from the matches. — HamZa, Apr 28 '14 at 11:10

Borodin · Accepted Answer · 2014-04-28T19:43:24.273

It looks like you need the String::LCSS_XS module, which calculates Longest Common SubStrings. Don't try its Perl-only twin brother String::LCSS, because there are bugs in that one.

use strict;
use warnings;

use String::LCSS_XS;
*lcss = \&String::LCSS_XS::lcss; # Manual import of `lcss`

my $var = 'home test results results-apr-25 results-apr-251.csv';
my @words = split ' ', $var;

my $longest;
my ($first, $second);

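# Compare every pair of words and remember the longest common substring
for my $i (0 .. $#words) {
  for my $j ($i + 1 .. $#words) {
    # Scalar context: only the common substring itself is wanted here
    my $lcss = lcss(@words[$i,$j]);
    next unless defined $lcss; # guard in case a pair has nothing in common
    unless (defined $longest and length $lcss <= length $longest) {
      $longest = $lcss;
      ($first, $second) = @words[$i,$j];
    }
  }
}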

printf qq{Longest common substring is "%s" between "%s" and "%s"\n}, $longest, $first, $second;

output

Longest common substring is "results-apr-25" between "results-apr-25" and "results-apr-251.csv"
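
If installing an XS module is not an option, the same pairwise loop could be driven by a small pure-Perl routine instead. The following is only a sketch under that assumption: `lcss_pp` is a hypothetical helper name, not part of String::LCSS_XS, and it uses the textbook dynamic-programming table rather than whatever the module does internally.

```perl
use strict;
use warnings;

# Hypothetical helper (not from String::LCSS_XS): longest common
# substring of two strings, or undef if they share no characters.
sub lcss_pp {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($best_len, $best_end) = (0, 0);

    # $prev[$j] holds the length of the common run ending at the
    # previous character of $s and character $j of $t
    my @prev = (0) x (scalar(@t) + 1);
    for my $i (1 .. scalar @s) {
        my @curr = (0) x (scalar(@t) + 1);
        for my $j (1 .. scalar @t) {
            next unless $s[$i - 1] eq $t[$j - 1];
            $curr[$j] = $prev[$j - 1] + 1;    # extend the diagonal run
            if ($curr[$j] > $best_len) {
                ($best_len, $best_end) = ($curr[$j], $i);
            }
        }
        @prev = @curr;
    }
    return $best_len ? substr($s, $best_end - $best_len, $best_len) : undef;
}

print lcss_pp('results-apr-25', 'results-apr-251.csv'), "\n";
```

For the two words from the question this prints results-apr-25. The table costs time proportional to the product of the two word lengths, which is acceptable for short words but slower than the XS module on long strings.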

score 1 · Answer 2 · answered Apr 28 '14 at 11:26

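use strict;
use warnings;
use List::Util 'max';

my $var = "home test results results-apr-25 results-apr-251.csv";
my @str = split " ", $var;
my %h;
my $last = pop @str;

# Walk backwards, comparing each word with the word that follows it
while (defined(my $curr = pop @str)) {
        # \Q...\E matches the word literally, so '.' is not a wildcard
        if (($curr =~ /^\Q$last\E/) || $last =~ /^\Q$curr\E/) {
                # The repeated part is the shorter of the two words
                my $common = length($curr) <= length($last) ? $curr : $last;
                $h{length($common)} = $common;
        }
        $last = $curr;
}

# Print the longest repeated word, if any pair matched
if (defined(my $max_key = max(keys %h))) {
        print $h{$max_key}, "\n";
}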

score 1 · Answer 3 · edited May 23 '17 at 10:26

If you want to do it without an explicit loop, you will need the /g regex modifier.

This will get you all the repeated strings:

my @str = $string =~ /(\S+)(?=\s\1)/ig;

I have replaced \w with \S (in your example, \w doesn't match -) and used a look-ahead: (?=\s\1) matches a position that is followed by \s\1 without consuming \s\1 itself. This is required so that the next match attempt starts after the first string, not after the second.
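
To see why the look-ahead matters, the two variants can be run side by side on the string from the question. This is just an illustrative sketch; the variable names `@without` and `@with` are made up for the comparison.

```perl
use strict;
use warnings;

my $string = "home test results results-apr-25 results-apr-251.csv";

# Without the look-ahead, each match consumes the second copy as well,
# so the next attempt starts in the middle of "results-apr-25"
my @without = $string =~ /(\S+)\s\1/g;

# With the look-ahead, the second copy is left for the next match attempt
my @with = $string =~ /(\S+)(?=\s\1)/g;

print "without look-ahead: @without\n";
print "with look-ahead:    @with\n";
```

On this input the first list contains only results, while the second contains both results and results-apr-25.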

Then, it is simply a matter of extracting the longest string from @str:

my $longest = (sort { length $b <=> length $a } @str)[0];

(Note that this is a readable but far from the most efficient way of finding the longest value; that is the subject of a different question.)
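
For reference, one possible single-pass alternative to sorting is `reduce` from the core List::Util module. This is a sketch rather than a recommendation; the contents of `@str` below are assumed to be what the regex above returns for the question's string.

```perl
use strict;
use warnings;
use List::Util 'reduce';

# Assumed input: the matches collected by the /g regex
my @str = ('results', 'results-apr-25');

# Keep whichever of the two candidates is longer; on a tie the earlier one wins
my $longest = reduce { length($b) > length($a) ? $b : $a } @str;

print "$longest\n";
```

This visits each element once instead of sorting the whole list, and `reduce` returns undef when `@str` is empty.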

score 0 · Answer 4 · answered Apr 28 '14 at 11:12

How about:

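use strict;
use warnings;
use feature 'say';

my $var = "home test results results-apr-25 results-apr-251.csv";
my $l = length $var;
# Try candidate lengths from longest to shortest; two non-overlapping
# copies cannot each be longer than half the line
for (my $i=int($l/2); $i; $i--) {
    # A run of $i non-space characters that occurs again later in the line
    if ($var =~ /(\S{$i}).*\1/) {
        say "found: $1";
        last;
    }
}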

output:

found: results-apr-25

answered Apr 28 '14 at 11:12

Toto
