Need explanation of below Negative LookBehind

Question

$string2 = '<tag id="123">123</tag>';

$string2 =~ s/123(?![^><]*>)/456/cg;

I need an explanation of the negative lookbehind pattern (?![^><]*>) in the above code.

I like https://www.regular-expressions.info/lookaround.html for explanations of lookarounds. — brian d foy, Jul 29 '23 at 21:31

score 6 · Answer 1 · edited Jul 29 '23 at 21:30

This is a negative lookahead, not a lookbehind. It asserts that the following characters do not match the pattern inside the lookahead.

In this case, (?![^><]*>) means "only match 123 if it is not followed by >, optionally with other characters except > or < in between".

So this regex will match:

123 alone
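123 followed by other characters, as long as no > appears before the next < or the end of the string
123 followed by <, whatever comes after it, because [^><]* cannot cross a <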

But it will NOT match 123 followed by >, even with other characters (anything except < or >) in between. So for example:

123a -> will match and replace
123< -> will match and replace
123b> -> will NOT match

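The [^><]* part matches any number of characters except > or <. The > part then requires a literal >. Because the whole group is wrapped in (?!...), the result is negated: if that sequence can be matched right after 123, this 123 is rejected; otherwise it is matched and replaced. In the string from the question, the 123 inside id="123" is followed by "> and is left alone, while the 123 between the tags is followed by < and is replaced.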
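As a rough check of the cases above, here is a small sketch that runs the same substitution over a few strings. The helper name replace_outside_tags and the sample list are made up for illustration, and the /x layout is only there to annotate the pattern from the question. The /c flag is left out because it has no effect on s///.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from the question to a copy
# of its argument and returns the result.
sub replace_outside_tags {
    my ($text) = @_;
    $text =~ s{
        123          # the literal text to replace
        (?!          # negative lookahead: reject this 123 if, from here,
            [^><]*   #   a run of characters other than > and <
            >        #   leads straight to a closing >
        )
    }{456}gx;
    return $text;
}

# Print each sample next to its result.
for my $sample ('123a', '123<', '123b>', '<tag id="123">123</tag>') {
    printf "%-26s => %s\n", $sample, replace_outside_tags($sample);
}
```

Here 123a and 123< come back as 456a and 456<, 123b> is returned unchanged, and the string from the question becomes <tag id="123">456</tag>.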

score 1 · Accepted Answer · answered Jul 29 '23 at 21:41

The code you have is trying to replace the text inside a tag without interfering with the tag itself. There are better ways to do this, and I typically reach for Mojo::DOM:

use v5.10;
use Mojo::DOM;

# Parse the markup, then replace the first child node of <tag> (its text) with 456.
my $dom = Mojo::DOM->new('<tag id="123">123</tag>');
$dom->at( 'tag' )->child_nodes->[0]->replace( '456' );

say $dom;

This way, you don't have to think about any of the complexity of HTML or XML when you want to modify it. See https://stackoverflow.com/a/4234491/2766176 for fun.
