I had to make two minor adjustments to your regex to get the desired output:
$document =~ s{(<jdgdt\s+mdy\=[^>]*>\s*)(?!\s*<jdg>)}{$1<jdg>Opinion by Marvel,<e>J.</e></jdg>\n<taxyr></taxyr>\n<disp></disp>}isg;
Also, to clean up the code, I switched from using / to using {} to delimit the regex; that way, you don't need to backslash all the slashes that you actually want there in your replacement.
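To illustrate the delimiter point, here is a minimal sketch. The sample strings and variable names are made up for the demonstration; only the s/// versus s{}{} contrast matters.

```perl
use strict;
use warnings;

# Hypothetical sample text; only the closing tag matters here.
my $with_slashes = "<jdg>x</jdg>";
my $with_braces  = "<jdg>x</jdg>";

# With / as the delimiter, every literal slash has to be backslashed.
$with_slashes =~ s/<\/jdg>/<\/jdg>\n<disp><\/disp>/;

# With {} as the delimiter, slashes can be written as-is.
$with_braces =~ s{</jdg>}{</jdg>\n<disp></disp>};

print $with_slashes eq $with_braces ? "same result\n" : "different result\n";
```

Both substitutions should produce the same string; the braces version is just easier to read.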
Explanation of what I changed:
First off, negative lookahead is tricky. What you have to remember is that Perl will try every way it can to make your expression as a whole match, backtracking into earlier parts of the pattern if it has to. Because you had this initially:
/(<jdgdt\s+mdy\=.*?>\s*)(?!<jdg>)/
What would happen is that in that first clause you'd get this match:
<jdgdt mdy='02/25/2014'>\n<jdg>Opinion by Marvel, <e>J.</e></jdg>
^^^^^^^^^^^^^^^^^^^^^^^^
(this part matched by paren. Note the \n is not matched!)
Perl would consider this a match because after the first parenthesized expression, you have "\n<jdg>". Well, that doesn't match the expression "<jdg>" (because of the initial newline), so yay! found a match.
In other words, initially, Perl would have the \s* that you end your parenthesized expression with match the empty string, and therefore it would find a match and you'd end up stuffing things into the first clause that you didn't want. Another way to put it is that because of the freedom to choose what went into \s*, Perl would choose the amount that allowed the expression as a whole to match (and would fill \s* with the empty string for the first docket record, and newline for the second docket record).
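If you want to see this for yourself, here is a small sketch that runs the original pattern against the first docket record. The record string is rebuilt from the example above, with a real newline where \n is shown; the variable names are mine.

```perl
use strict;
use warnings;

# First docket record, rebuilt from the example above.
my $record = "<jdgdt mdy='02/25/2014'>\n<jdg>Opinion by Marvel, <e>J.</e></jdg>";

my $matched = '';
if ($record =~ /(<jdgdt\s+mdy\=.*?>\s*)(?!<jdg>)/is) {
    # \s* first grabs the newline, the lookahead then fails,
    # so \s* gives the newline back and the lookahead passes.
    $matched = $1;
}

# Brackets make it visible that no trailing newline was captured.
print "[$matched]\n";
```

The capture should be just the jdgdt tag with nothing after it, which is the unwanted match on the first record.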
To get Perl to never find a match on the first docket record, I repeated the \s* in the negative lookahead as well. That way, no choice of what to put in \s* could make the expression as a whole match on the initial docket record, and Perl had to give up and move to the second docket record.
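The effect of repeating \s* inside the lookahead is easier to see on a stripped-down case. This is only a sketch with a toy string ("x", a newline, then "y") standing in for the two tags; the names are hypothetical.

```perl
use strict;
use warnings;

# Toy input: "x" plays the role of the jdgdt tag, "y" the role of <jdg>.
my $text = "x\ny";

# Lookahead without \s*: \s* in the group can shrink to empty,
# leaving "\ny" ahead, which is not "y", so the match succeeds.
my $without_ws = ($text =~ /(x\s*)(?!y)/) ? 1 : 0;

# Lookahead with \s*: whichever way the group's \s* splits the
# whitespace, the lookahead can still reach "y", so nothing matches.
my $with_ws = ($text =~ /(x\s*)(?!\s*y)/) ? 1 : 0;

print "without \\s* in lookahead: $without_ws\n";
print "with \\s* in lookahead:    $with_ws\n";
```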
But then there was a second problem! Remember how I said Perl was really aggressive about finding matches anywhere it could? Well, next Perl would expand your mdy\=.*?> bit to still find a result in the first docket record. After I added \s* to the negative lookahead, the first docket was still matching (but in a different spot) with:
<jdgdt mdy='02/25/2014'>\n<jdg>Opinion by Marvel, <e>J.</e></jdg>
^^^^^^^^^^^???????????????????^
(Underlined part matched by paren. ? denotes the bit matched by .*?)
See how Perl expanded your .*? way beyond what you had intended? You'd intended that bit to match only stuff up to the first > character, but Perl will stretch your non-greedy matches as far as necessary so that the whole pattern matches. This time, it stretched your .*? past the > that closed the <jdgdt> tag, all the way up to the > at the end of the opening <jdg> tag, so that it could find a spot where the negative lookahead didn't block the match.
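Here is a sketch of that intermediate pattern (\s* already added to the lookahead, .*? still in place) run against the first docket record. The record string is rebuilt from the example above; the variable names are mine.

```perl
use strict;
use warnings;

# First docket record, rebuilt from the example above.
my $record = "<jdgdt mdy='02/25/2014'>\n<jdg>Opinion by Marvel, <e>J.</e></jdg>";

my $matched = '';
if ($record =~ /(<jdgdt\s+mdy\=.*?>\s*)(?!\s*<jdg>)/is) {
    # Stopping at the first > is blocked by the lookahead, so .*?
    # (which can cross newlines under /s) grows to the next > instead.
    $matched = $1;
}

# Brackets show where the capture ends.
print "[$matched]\n";
```

The capture now runs through the newline and the whole opening jdg tag, which matches the diagram above.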
To keep Perl from stretching your .*? pattern that far, I replaced .*? with [^>]*, which is really what you meant.
After these two changes, we then only found a match in the second docket record, as initially desired.
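As a quick check, here is a sketch that runs the final substitution on a two-record document. I don't know what your second docket record actually looks like, so the second record here (a jdgdt tag followed by a made-up note element) is purely an assumption for the demo.

```perl
use strict;
use warnings;

# Record 1 already has a <jdg>; record 2 is a hypothetical stand-in without one.
my $document = "<jdgdt mdy='02/25/2014'>\n"
             . "<jdg>Opinion by Marvel, <e>J.</e></jdg>\n"
             . "<jdgdt mdy='03/01/2014'>\n"
             . "<note>hypothetical second record</note>\n";

# s///g returns the number of substitutions made (false if none).
my $count = ($document =~ s{(<jdgdt\s+mdy\=[^>]*>\s*)(?!\s*<jdg>)}{$1<jdg>Opinion by Marvel,<e>J.</e></jdg>\n<taxyr></taxyr>\n<disp></disp>}isg) || 0;

# Count the opening <jdg> tags after the substitution.
my $jdg_tags = () = $document =~ /<jdg>/g;

print "substitutions: $count, jdg tags: $jdg_tags\n";
print $document;
```

With this sample input only the second record should be changed, leaving one jdg element per record.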