I want to accumulate the possibly overlapping matches of a regex (the final goal being to do further searches in the resulting substrings).
I want to skip the matches that have already been "accumulated", while avoiding making copies with `substr` (I might be wrong about avoiding `substr`), but the condition that I wrote for it, with `pos($...) = ...` and a `next if $... =~ /.../`, doesn't work:
#!/usr/bin/env perl
# user inputs
$regexp = "abc|cba|b";
$string = "_abcbabc_bacba";
$length = length($string);
$result = "0" x $length;
while ( pos($string) < $length and $string =~ /$regexp/go ) {
    pos($string) = $-[0] + 1;
    next unless ($len = $+[0] - $-[0]);
    # The failing condition is here:
    # pos($result) = $-[0];
    # next if $result =~ /1{$len}/;
    substr($result, $-[0], $len) = "1" x $len;
    printf "%s\n", $string;
    printf "%".$-[0]."s%s\n", "", "^" x $len;
}
printf "%s\n", $result;
By commenting out those lines I get the desired result, which is `01111111010111`:
_abcbabc_bacba
 ^^^
_abcbabc_bacba
  ^
_abcbabc_bacba
   ^^^
_abcbabc_bacba
    ^
_abcbabc_bacba
     ^^^
_abcbabc_bacba
      ^
_abcbabc_bacba
         ^
_abcbabc_bacba
           ^^^
_abcbabc_bacba
            ^
01111111010111
But my expected output (with a working condition) would be:
_abcbabc_bacba
 ^^^
_abcbabc_bacba
   ^^^
_abcbabc_bacba
     ^^^
_abcbabc_bacba
         ^
_abcbabc_bacba
           ^^^
01111111010111
Notes:
- For each iteration I print the original string; the `^` line just below it marks the characters that have been matched in the current iteration.
- The `0` and `1` at the end represent the overall result. The characters that have been matched at least once during the process are set to `1`.
- My commented-out condition is meant to skip the current match when its corresponding characters are already set to `1` in the result.
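
For reference, here is a minimal sketch of what that skip condition is meant to do. It is one possible approach, not necessarily the best one. It rests on two points about Perl: a match without `\G` is not tied to `pos()` and can succeed anywhere in `$result`, and any successful match overwrites `@-` and `@+`, so the offsets have to be copied before `$result` is tested. The helper name `accumulate_matches` and its return values are hypothetical, introduced only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the 0/1 mask and the list of
# [start, length] pairs for the matches that were not skipped.
sub accumulate_matches {
    my ($string, $regexp) = @_;
    my $length = length $string;
    my $result = "0" x $length;
    my @kept;

    pos($string) = 0;
    while ( pos($string) < $length and $string =~ /$regexp/g ) {
        # Copy the offsets now: the test on $result below would
        # overwrite @- and @+ whenever it succeeds.
        my ($start, $len) = ($-[0], $+[0] - $-[0]);
        pos($string) = $start + 1;    # allow overlapping matches
        next unless $len;

        # \G anchors the test at pos($result), so only the
        # characters covered by the current match are checked.
        pos($result) = $start;
        next if $result =~ /\G1{$len}/;

        substr($result, $start, $len) = "1" x $len;
        push @kept, [$start, $len];
    }
    return ($result, \@kept);
}

my ($mask, $kept) = accumulate_matches("_abcbabc_bacba", "abc|cba|b");
for my $m (@$kept) {
    printf "%s\n", "_abcbabc_bacba";
    printf "%$m->[0]s%s\n", "", "^" x $m->[1];
}
printf "%s\n", $mask;
```

The four-argument or lvalue `substr` used to write into `$result` modifies the string in place, so it does not create the copies the question wants to avoid; only the read side of the check is replaced by the `\G` match.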