
I have a series of lines such as

my $string = "home test results results-apr-25 results-apr-251.csv";
@str = $string =~ /(\w+)\1+/i;
print "@str";

How do I find the longest repeating string, with overlap, in a line of whitespace-separated words? For this example I'm looking for the output:

results-apr-25
scozy
Tom Iv
Something like this: [`~(\S+)(?=.*?\1)~gs`](http://regex101.com/r/zR9aH7). You only need to figure out how to get the longest string from the matches. – HamZa Apr 28 '14 at 11:10
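HamZa's suggestion can be completed in Perl roughly as follows. This is a minimal sketch: the `~...~` delimiters in the comment are regex101 notation, so the pattern is written with `//` here, and `longest_repeat` is a hypothetical helper name. Because `\S+` can also match fragments of a word (such as `e` or `t`), the match list contains many short entries; taking the longest one gives the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every run of non-space characters that
# occurs again later in the string, then return the longest one
# (undef if nothing repeats).
sub longest_repeat {
    my ($string) = @_;
    my @matches = $string =~ /(\S+)(?=.*?\1)/gs;
    my ($longest) = sort { length $b <=> length $a } @matches;
    return $longest;
}

print longest_repeat("home test results results-apr-25 results-apr-251.csv"), "\n";
```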

4 Answers


It looks like you need String::LCSS_XS, which calculates the longest common substring of two strings. Don't use its pure-Perl twin, String::LCSS, because it has bugs.

use strict;
use warnings;

use String::LCSS_XS;
*lcss = \&String::LCSS_XS::lcss; # Manual import of `lcss`

my $var = 'home test results results-apr-25 results-apr-251.csv';
my @words = split ' ', $var;

my $longest;
my ($first, $second);

for my $i (0 .. $#words) {
  for my $j ($i + 1 .. $#words) {
    my $lcss = lcss(@words[$i,$j]);
    unless ($longest and length $lcss <= length $longest) {
      $longest = $lcss;
      ($first, $second) = @words[$i,$j];
    }
  }
}

printf qq{Longest common substring is "%s" between "%s" and "%s"\n}, $longest, $first, $second;

output

Longest common substring is "results-apr-25" between "results-apr-25" and "results-apr-251.csv"
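If the XS module can't be installed, the pairwise loop above could call a small pure-Perl function instead. The `lcss_pp` sub below is a hypothetical stand-in, not part of String::LCSS_XS or String::LCSS: it is the textbook O(m*n) dynamic-programming approach, where each table cell holds the length of the common substring ending at that pair of positions. It returns an empty string when the inputs share no characters, which may differ from what the CPAN module returns.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for String::LCSS_XS::lcss.
# For each pair of positions (i, j), the common substring ending there
# is one longer than the one ending at (i-1, j-1) if the characters
# match, and zero otherwise. Only two rows of the table are kept.
sub lcss_pp {
    my ($s, $t) = @_;
    my @prev = (0) x (length($t) + 1);    # row i-1 of the DP table
    my ($best_len, $best_end) = (0, 0);
    for my $i (1 .. length $s) {
        my @curr = (0) x (length($t) + 1);
        for my $j (1 .. length $t) {
            next unless substr($s, $i - 1, 1) eq substr($t, $j - 1, 1);
            $curr[$j] = $prev[$j - 1] + 1;
            if ($curr[$j] > $best_len) {
                ($best_len, $best_end) = ($curr[$j], $i);
            }
        }
        @prev = @curr;
    }
    return substr($s, $best_end - $best_len, $best_len);
}

print lcss_pp('results-apr-25', 'results-apr-251.csv'), "\n";    # results-apr-25
```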
Borodin
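use strict;
use warnings;
use List::Util qw(max);

my $var = "home test results results-apr-25 results-apr-251.csv";
my @str = split " ", $var;
my %h;
my $last = pop @str;

# Walk the words right to left, comparing each with its right-hand
# neighbour. When one word is a prefix of the other, the shorter word
# is the repeated string, so store it keyed by its length.
# \Q...\E stops characters such as '.' from acting as regex metacharacters.
while (defined(my $curr = pop @str)) {
        if ($curr =~ /^\Q$last\E/ || $last =~ /^\Q$curr\E/) {
                my $rep = length($curr) < length($last) ? $curr : $last;
                $h{length $rep} = $rep;
        }
        $last = $curr;
}

my $max_key = max(keys %h);
print $h{$max_key}, "\n";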
David Michael Gang

If you want to make it without a loop, you will need the /g regex modifier.

This will get you all the repeating string:

my @str = $string =~ /(\S+)(?=\s\1)/ig;

I have replaced \w with \S (in your example, \w doesn't match -) and used a look-ahead: (?=\s\1) means "match something that is followed by \s\1, without consuming \s\1 itself". This is required so that the next match attempt starts after the first string, not after the second.
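To see why both changes matter, the three patterns can be run side by side on the question's string. This is a small demonstration script; apart from @str, the variable names are made up for the comparison.

```perl
use strict;
use warnings;

my $string = "home test results results-apr-25 results-apr-251.csv";

# \w stops at '-', so "results-apr-25" can never be captured
my @with_w = $string =~ /(\w+)(?=\s\1)/ig;

# Without the look-ahead, the second copy is consumed by the match,
# so the scan resumes inside "results-apr-25" and misses that pair
my @no_lookahead = $string =~ /(\S+)\s\1/ig;

# \S plus the look-ahead finds both repeats
my @str = $string =~ /(\S+)(?=\s\1)/ig;

print "@with_w\n";          # results
print "@no_lookahead\n";    # results
print "@str\n";             # results results-apr-25
```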

Then, it is simply a matter of extracting the longest string from @str:

my $longest = (sort { length $b <=> length $a } @str)[0];

(Note that this is legible but far from the most efficient way of finding the longest value; that is the subject of a different question.)
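One linear-time alternative, offered here as a sketch rather than as part of the original answer, is `reduce` from the core List::Util module. It walks the list once (O(n)) instead of sorting it (O(n log n)); the hard-coded @str below stands in for the match list from the previous step.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Stand-in for the matches produced by /(\S+)(?=\s\1)/ig
my @str = ('results', 'results-apr-25');

# Single pass: keep whichever candidate is longer
# (the earlier one wins on ties); undef for an empty list
my $longest = reduce { length($b) > length($a) ? $b : $a } @str;

print "$longest\n";    # results-apr-25
```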

scozy

How about:

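use strict;
use warnings;
use feature 'say';

my $var = "home test results results-apr-25 results-apr-251.csv";
my $l = length $var;
# Try every length from half the string down to 1; the first length for
# which a run of non-space characters appears again later is the longest
for (my $i = int($l/2); $i; $i--) {
    if ($var =~ /(\S{$i}).*\1/) {
        say "found: $1";
        last;
    }
}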

output:

found: results-apr-25
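The loop above makes up to int(length/2) regex calls. Because any prefix of a repeated run of non-space characters is itself repeated, "a repeat of length $i exists" is monotone in $i, so a binary search over the length needs only O(log n) calls. This is a hypothetical variant (the `longest_repeat_bs` name is made up), not part of the original answer; it uses the same regex and returns the same leftmost match at the maximal length.

```perl
use strict;
use warnings;

# Hypothetical helper: binary search for the largest $mid such that some
# run of $mid non-space characters occurs again later in the string.
sub longest_repeat_bs {
    my ($var) = @_;
    my ($lo, $hi) = (1, int(length($var) / 2));
    my $found;
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($var =~ /(\S{$mid}).*\1/) {
            $found = $1;        # a repeat this long exists; try longer
            $lo = $mid + 1;
        }
        else {
            $hi = $mid - 1;     # no repeat this long; try shorter
        }
    }
    return $found;
}

print "found: ", longest_repeat_bs("home test results results-apr-25 results-apr-251.csv"), "\n";
```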
Toto