In a multiline string, I want to delete, on each line, everything from the first unescaped percent sign to the end of the line, with one exception: if the unescaped percent sign occurs in the position `\d\d:\d\d%:\d\d`, I want to leave it alone.
(The string is LaTeX / TeX code and the percent sign denotes a comment. I want to treat a comment inside an HH:MM:SS string as a special case, where seconds were commented out of a time string.)
The code below almost manages to do it:
- it uses one negative lookbehind to leave `\%` alone
- it uses "ungreedy" matching to match the first, not the last, `%`
- it uses another negative lookbehind to skip `\d\d:\d\d%`
- BUT it fails to differentiate between `\d\d:\d\d%anything` and `\d\d:\d\d%:\d\d`, skipping both
- my attempts at adding a negative lookahead do not help

Is there a way to do this?
#!/usr/bin/perl
use strict; use warnings;
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything %delete-me
for 10 percent%delete-me';
print "original string:\n";
print "$string<<\n";
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\\)
)
(\%.*)
$/${1}/mgx;
print "\ndelete after any unescaped %\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(\%.*)
$/${1}/mgx;
print "\nexception for preceding HH:MM\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(!?:\d\d)
(\%.*)
$/${1}/mgx;
print "\nattempt to add negative lookahead (group placed before the %)\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
# attempt to add negative lookahead
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(\%.*)
(!?:\d\d)
$/${1}/mgx;
print "\nattempt to add negative lookahead (group placed after the %)\n";
print "$tochange<<\n";
}
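For reference, here is a minimal sketch of one way the intended rule could be expressed. It is offered as an illustration, not a definitive solution. Two observations motivate it. First, Perl spells a negative lookahead `(?!...)`; the group `(!?:\d\d)` used in the attempts above is an ordinary capturing group that matches an optional `!` followed by `:\d\d`. Second, the HH:MM test before the `%` and the :SS test after it need to be combined into a single condition, so that a `%` is skipped only when both hold. The helper name `strip_comments` is hypothetical, and the sketch keeps the question's assumption that any `%` directly preceded by a backslash counts as escaped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_comments is a hypothetical helper name, not from the original post.
# It deletes from the first unescaped % to the end of each line, but skips
# a % that sits inside an HH:MM%:SS time string.
sub strip_comments {
    my ($text) = @_;
    $text =~ s{
        (?<!\\)               # the percent sign is not escaped with a backslash
        %                     # candidate comment character
        (?!                   # reject this candidate only if both hold:
            (?<=\d\d:\d\d%)   #   it is immediately preceded by HH:MM
            :\d\d             #   and immediately followed by :SS
        )
        .*                    # delete through the end of the line
    }{}gx;
    return $text;
}

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything %delete-me
for 10 percent%delete-me';

print strip_comments($string), "<<\n";
```

Because `.` does not match a newline without the `/s` modifier, `.*` stops at the end of each line, so no `^`, `$`, or `/m` is needed. On the sample string this keeps `02:59%:02 NO DELETE ` on the second line and trims the third line back to `04:09`.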