Here's a small but functional snippet of Perl code:
my $content = qq{<img src='h};
if ($content =~ m{src=(?!('*)http://)}) {
    print "Match '$1'\n";
}
else {
    print "No match\n";
}
It prints
Match '''
That is, the group ('*) inside the negative lookahead has indeed captured something, and $1 contains '.
However, if I replace the first line with
my $content = qq{<img src='i};
the script prints
Match ''
meaning the ' has not been captured, even though the entire regex matched.
Can anybody explain what the difference is, and how I can make sure that the ' is always captured? (This is, of course, a simplification of a real case.)
Thanks in advance
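For illustration, one way to make the quote reliably available is to capture it outside the lookahead. A negative lookahead succeeds only when its subpattern fails to match, so a group inside it has no successful match to report. The sketch below is an assumption-laden illustration, not a verified fix for the real case: leading_quote is a hypothetical helper, and the possessive quantifier '*+ needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the quote(s) found right after "src=" when the
# value does not start with http://, or undef when the pattern does not match.
# The group sits outside the lookahead, so it is set whenever the match succeeds.
# The possessive '*+ stops the engine from giving the quote back in order to
# let the lookahead pass.
sub leading_quote {
    my ($text) = @_;
    return $text =~ m{src=('*+)(?!http://)} ? $1 : undef;
}

for my $content (qq{<img src='h}, qq{<img src='i}, qq{<img src='http://x}) {
    my $quote = leading_quote($content);
    print defined($quote) ? "Match '$quote'\n" : "No match\n";
}
```

With a plain greedy ('*) in the same position, the engine could backtrack to an empty capture, after which the lookahead would see 'http:// and no longer reject the input; that is the reason for the possessive form here.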
Addendum
Now this is the whole story for raina77ow. The idea is to replace the contents of the src attribute in the img tag. The following rules apply:
- If the contents start with ', they must end with '.
- If the contents start with ", they must end with ".
- The contents can be unquoted.
- If the contents (after a possible quote) start with http://, they should be left intact; otherwise the last component of the URL (the image file name) must be kept and the preceding part must be replaced with something else.
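As a rough sketch of these rules, and deliberately using a different technique from a single regex with a lookahead, the attribute value can be matched first and the http:// decision made in code. Everything named here is hypothetical: rewrite_img_src is an illustrative helper, $prefix stands in for the replacement text from the last rule, and the pattern assumes reasonably simple tags. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper, not the original code: rewrites the src attribute of
# every img tag in $html according to the rules listed above.
sub rewrite_img_src {
    my ($html, $prefix) = @_;
    $html =~ s{
        ( <\s*img\b [^>]*? \bsrc\s*=\s* )   # group 1: tag text up to the value
        (?:
            (["']) (.*?) \2                 # group 2: quote, group 3: quoted value
          |
            ( [^\s"'>]+ )                   # group 4: unquoted value
        )
    }{
        # Copy the captures before running any further regex.
        my ($head, $quote, $value) = ($1, $2 // '', $3 // $4);
        if ($value !~ m{\Ahttp://}i) {
            $value =~ s{\A.*/}{}s;          # keep only the last URL component
            $value = $prefix . $value;
        }
        $head . $quote . $value . $quote;
    }xgise;
    return $html;
}

print rewrite_img_src(
    q{<img src='pics/a.gif'> <img src='http://qq.com/img.gif' />}, 'SMTH'
), "\n";
```

Unlike the substitution attempts in this post, the sketch keeps the original quoting style instead of always emitting double quotes; that is a design choice of the example, not a requirement taken from the rules.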
Originally I wanted to use the following regex (which is practically the same as the one you suggested)
$content =~ s{<\s*img\s+(.*?)src\s*=\s*(["']*)(?!http://).*?([^/"']+)\2(\s+[^>]+)*>}
{'<img ' . $1 . 'src="' . 'SMTH' . $3 . '"' . $4 . '>'}sgie;
but for some reason it matches the string
[img src='http://qq.com/img.gif' /]
(angle brackets are replaced with square ones),
although it should not, because the ' is followed by http://. Using
$content =~ s{<\s*img\s+(.*?)src\s*=\s*(["'])*(?!http://).*?([^/"']+)\2(\s+[^>]+)*>}
{'<img ' . $1 . 'src="' . 'SMTH' . $3 . '"' . $4 . '>'}sgie;
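is also inappropriate, because in this case \2 will not match the empty string: when the contents are unquoted, group 2 never takes part in the match, so the backreference fails.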
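The \2 behaviour can be reproduced in isolation. The sketch below uses simplified stand-in patterns (assumptions for illustration, not the full substitution, and with \1 in place of \2 because there is only one group before it): starred_accepts mirrors the (["'])* form, while optional_accepts is a hypothetical variant that puts an optional quote inside the group.

```perl
use strict;
use warnings;

# Starred group: with no quote present, group 1 never participates in the
# match, so the backreference \1 has nothing to refer to and fails.
sub starred_accepts {
    my ($text) = @_;
    return $text =~ m{\Asrc=(["'])*([^/"']+)\1\z} ? 1 : 0;
}

# Optional quote inside the group: group 1 always participates and captures
# the empty string when there is no quote, so \1 can match "nothing".
sub optional_accepts {
    my ($text) = @_;
    return $text =~ m{\Asrc=(["']?)([^/"']+)\1\z} ? 1 : 0;
}

for my $text (q{src='a.gif'}, q{src=a.gif}) {
    printf "%-12s starred=%d optional=%d\n",
        $text, starred_accepts($text), optional_accepts($text);
}
```

Both forms should accept the quoted input; only the optional form should accept the unquoted one, which matches the observation about \2 above.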
Not being able to fix that, I decided to look for a workaround. Alas...