Leaving out the optional whitespace and assuming only double-quotes around the attribute values, your first regex is equivalent to this:
'/<meta\s+name="keywords"\s+content="([^"]*?)/i'
If the attributes happen to be listed in that order, this should match everything up to the opening quote of the content attribute. Inside the capturing group, [^"]* is supposed to consume the attribute value, but because you used the U (ungreedy) flag, it initially consumes nothing, as if it were [^"]*?. And that's the end of the regex, so it reports a successful match with an empty capture group.
In other words, your immediate problem is that you left out the closing quote. If you want to match the whole tag, you need to add the closing > as well:
'/<meta\s+name="keywords"\s+content="([^"]*)">/i'
But as I said, that only works if there are only the two attributes and they're listed in that order, and it doesn't account for single-quoted or unquoted attribute values, or optional whitespace.
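To see the difference between the two patterns above, here is a minimal sketch. The sample tag in $html is made up for illustration; only the two regexes come from the discussion above.

```php
<?php
// Hypothetical sample tag, used only for illustration.
$html = '<meta name="keywords" content="php, regex, html">';

// The lazy quantifier is the last thing in the pattern, so nothing
// ever forces it to expand past zero characters.
$truncated = '/<meta\s+name="keywords"\s+content="([^"]*?)/i';

// With the closing quote and > added, the group has to span the whole value.
$complete = '/<meta\s+name="keywords"\s+content="([^"]*)">/i';

preg_match($truncated, $html, $m1);
preg_match($complete, $html, $m2);

var_dump($m1[1]); // string(0) ""
var_dump($m2[1]); // string(16) "php, regex, html"
```

Both calls report a match; the only difference is whether the capture group is forced to reach the closing quote.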
Your second regex deals with the ordering problem by using a lookahead to match the name attribute. But it assumes the tag is followed immediately by a line break, which is not something you can count on. You should use the closing > to mark the end of the match:
'/<meta\s+(?=[^>]*name="keywords")[^>]*content="([^"]*)"[^>]*>/i'
And if you want to allow optional whitespace:
'/<meta\s+(?=[^>]*name\s*=\s*"keywords")[^>]*content\s*=\s*"([^"]*)"[^>]*>/i'
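As a rough check of the whitespace-tolerant pattern, the sketch below runs it against a few tags with the attributes in different orders. The tags in $tags are hypothetical samples, not taken from your page.

```php
<?php
// Lookahead pattern from above, allowing optional whitespace around "=".
$pattern = '/<meta\s+(?=[^>]*name\s*=\s*"keywords")[^>]*content\s*=\s*"([^"]*)"[^>]*>/i';

// Hypothetical sample tags.
$tags = [
    '<meta name="keywords" content="a, b">',                 // name first
    '<meta content="a, b" name="keywords">',                 // content first
    '<meta name = "keywords" lang="en" content = "a, b" />', // spaces, extra attribute
    '<meta name="description" content="not this one">',      // different name
];

$results = [];
foreach ($tags as $tag) {
    // Store the captured content value, or null when the tag does not match.
    $results[] = preg_match($pattern, $tag, $m) ? $m[1] : null;
}

var_dump($results); // "a, b" for the first three tags, NULL for the last
```

The lookahead only asserts that name="keywords" appears somewhere before the closing >, which is why the attribute order stops mattering.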
I would emphasize that your problem is not one of excess greediness. This regex works without the U flag and with nothing but normal, greedy quantifiers.
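One way to convince yourself of this is to run the same pattern with and without the U flag and compare the captures. This is only a sketch; $html is a hypothetical sample and $base is just a helper variable holding the pattern body.

```php
<?php
// Hypothetical sample tag.
$html = '<meta content="php, regex" name="keywords">';

// Pattern body without delimiters, so the flags can be varied.
$base = '<meta\s+(?=[^>]*name\s*=\s*"keywords")[^>]*content\s*=\s*"([^"]*)"[^>]*>';

preg_match('/' . $base . '/i', $html, $greedy);    // normal, greedy quantifiers
preg_match('/' . $base . '/iU', $html, $ungreedy); // every quantifier made lazy

var_dump($greedy[1] === $ungreedy[1]); // bool(true)
```

Because every quantifier is followed by something that pins down where it has to stop (a quote or the closing >), greedy and lazy matching end up at the same place.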