How can I perform both negative lookahead and negative lookbehind in a single Perl regex?

Question

In a multiline string, I want to delete, on each line, everything from the first unescaped percent sign to the end of the line, with one exception: if the unescaped percent sign occurs in the position \d\d:\d\d%:\d\d, I want to leave it alone.

(The string is LaTeX/TeX code, and the percent sign starts a comment. I want to treat a comment inside an HH:MM:SS string as a special case, where the seconds were commented out of a time string.)

The code below almost does it:

it uses one negative lookbehind to leave \% alone
it uses a non-greedy .*? to match the first, not the last, %
it uses another negative lookbehind to skip \d\d:\d\d%
BUT it fails to distinguish \d\d:\d\d%anything from \d\d:\d\d%:\d\d, and skips both.
My attempts at adding a negative lookahead (the last two blocks) do not help. Is there a way to do this?

#!/usr/bin/perl
use strict; use warnings;

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

print "original string:\n";
print "$string<<\n";

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\ndelete after any unescaped %\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\nexception for preceding HH:MM\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (!?:\d\d)
        (\%.*)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead (before the %)\n";
    print "$tochange<<\n";
}


{
    my $tochange = $string;
    # attempt to add negative lookahead
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        (!?:\d\d)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead (after the %)\n";
    print "$tochange<<\n";
}
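A note on the last two attempts: (!?:\d\d) is not a lookahead. It is an ordinary capturing group that matches an optional literal "!", then ":" and two digits, and it consumes that text; Perl writes a negative lookahead as (?!...). Even with the correct syntax, the placement matters: just before (\%.*) the next character is always %, and after .* the position is the end of the line, so a (?!:\d\d) in either spot always succeeds. The minimal sketch below (the variable names are illustrative only) contrasts the two constructs on the question's own time strings.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $s = '02:59%:02';

# (!?:\d\d) is a capturing group: optional literal "!", then ":" and two
# digits. It consumes text, and here it first matches ":59" (the minutes).
my ($captured) = $s =~ /(!?:\d\d)/;
print "capturing group matched: '$captured'\n";

# (?!:\d\d) is a negative lookahead: it consumes nothing and succeeds only
# if the text right after the current position is NOT ":" plus two digits.
# It must sit directly after the "%" to inspect what follows the "%".
my $kept_time = ($s =~ /%(?!:\d\d)/) ? 1 : 0;                # "%" is followed by ":02"
my $other_pct = ('04:09%anything' =~ /%(?!:\d\d)/) ? 1 : 0;  # "%" is followed by "a"
print "lookahead matches in $s? $kept_time\n";
print "lookahead matches in 04:09%anything? $other_pct\n";
```

The capturing group reports ':59', which shows why the third attempt ends up deleting part of the time instead of protecting it.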

The fourth bird · Accepted Answer · 2021-03-09T16:51:34.427

You can use the (*SKIP)(*FAIL) approach:

\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*

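\d\d:\d\d%:\d\d(*SKIP)(*FAIL)| Match the pattern that you want to avoid; (*FAIL) makes this attempt fail, and (*SKIP) makes the engine resume after the matched text, so no part of it can be matched by the other alternative
(?<!\\)%.* Negative lookbehind: assert that there is no \ directly to the left, then match % followed by the rest of the line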
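To see why (*SKIP) is needed and not just (*FAIL), the sketch below runs the substitution with and without it on one of the question's lines (backtracking control verbs require Perl 5.10 or later). Without (*SKIP), the engine only rejects the attempt that starts at "02", then retries at each following position and eventually matches the protected % through the second alternative.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $line = '02:59%:02 NO DELETE %delete-me';

# (*FAIL) alone: the attempt starting at "02" fails, but the engine retries
# at the next positions and later matches "%:02 ..." via the 2nd alternative.
(my $without_skip = $line) =~ s/\d\d:\d\d%:\d\d(*FAIL)|(?<!\\)%.*//g;

# (*SKIP)(*FAIL): backtracking past (*SKIP) moves the next attempt to just
# after ":02", so the "%" inside the time can never be matched.
(my $with_skip = $line) =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;

print "without (*SKIP): '$without_skip'\n";
print "with (*SKIP):    '$with_skip'\n";
```

Without (*SKIP) only '02:59' survives; with it, the line keeps '02:59%:02 NO DELETE ' (including the trailing space).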

For example, in place of the substitutions in the question:

$tochange =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;
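The title also asks whether the exception can be expressed with lookarounds alone. One possibility, offered here as an alternative and not as part of the answer, is to put a positive lookbehind inside a negative lookahead: (?!(?<=\d\d:\d\d)%:\d\d) rejects a % only when it is both preceded by HH:MM and followed by :SS. Two separate assertions such as (?<!\d\d:\d\d)%(?!:\d\d) would instead reject the % when either condition holds, which would still keep 04:09%anything. The sketch below applies both substitutions to the question's sample string and checks that, on this sample, they agree.

```perl
#!/usr/bin/perl
use strict; use warnings;

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

# Accepted answer: skip every HH:MM%:SS, delete from any other unescaped %
# to the end of its line ("." does not match "\n", so /m is not needed).
(my $skip_fail = $string) =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;

# Alternative (not from the answer): negative lookbehind for "\%", plus a
# negative lookahead containing a positive lookbehind, so the "%" is
# rejected only if it is preceded by HH:MM AND followed by :SS.
(my $lookaround = $string) =~ s/(?<!\\)(?!(?<=\d\d:\d\d)%:\d\d)%.*//g;

print "$skip_fail<<\n";
print "same result: ", ($skip_fail eq $lookaround ? "yes" : "no"), "\n";
```

Both versions shorten the second line to "for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE " and the third to "for 10\% and 2021-03-09 Tue 04:09". Note that both treat any % preceded by a backslash as escaped, including the TeX sequence \\% (a line break followed by a comment).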

edited Mar 09 '21 at 16:51

answered Mar 09 '21 at 16:41
