I have the following text:
<!--:en-->
<!--:-->
I want to build a regular expression (in PHP) to extract it from a string. I tried:
<!--:[a-z]{2}-->( \r\n\s)<!--:-->
But it does not work. Does anybody know why, or could someone help me?
You probably don't want to use regex to parse XML/HTML.
There are many reasons for that.
Instead, you would usually prefer a parser made for this specific task.
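As an illustration of the parser route, here is a minimal sketch using PHP's DOM extension (DOMDocument) to find language blocks that contain only whitespace. The function name emptyLanguageBlocks is hypothetical, and the sketch assumes the blocks are siblings inside one HTML fragment; it skips whitespace-only text nodes so it does not depend on whether libxml keeps them.

```php
<?php
// Hypothetical helper (not part of any library): returns the language codes of
// blocks such as "<!--:en-->\n<!--:-->" that hold only whitespace.
function emptyLanguageBlocks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so the comments share a known parent element.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $div = $doc->getElementsByTagName('div')->item(0);

    // Keep only significant children: whitespace-only text nodes are skipped,
    // whether or not the parser preserved them.
    $nodes = [];
    foreach ($div->childNodes as $child) {
        if ($child instanceof DOMText && trim($child->nodeValue) === '') {
            continue;
        }
        $nodes[] = $child;
    }

    // An empty block is a ":xx" comment directly followed by a ":" comment.
    $codes = [];
    for ($i = 0; $i + 1 < count($nodes); $i++) {
        $open = $nodes[$i];
        $close = $nodes[$i + 1];
        if ($open instanceof DOMComment
            && $close instanceof DOMComment
            && preg_match('/^:([a-z]{2})$/', $open->nodeValue, $m)
            && $close->nodeValue === ':') {
            $codes[] = $m[1];
        }
    }
    return $codes;
}

var_dump(emptyLanguageBlocks("<!--:en-->\n<!--:-->"));
```

The small preg_match call only inspects the text of a single comment node, which is a plain string, so it does not amount to parsing HTML with a regex.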
Anyway, what you need here is something more like:
( |\s)*
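The reason the original attempt fails: ( \r\n\s) is a group that matches one exact sequence (a space, a carriage return, a line feed, then one more whitespace character), so it only matches if the text contains exactly those four characters. A quantified group like ( |\s)* instead matches any run of whitespace, including an empty one. A minimal sketch, assuming the two comments are separated by a plain "\n":

```php
<?php
// Sample input, assuming a single "\n" line break between the two comments.
$text = "<!--:en-->\n<!--:-->";

// The original pattern: the group must match the exact sequence
// space, CR, LF, whitespace, so this input is rejected.
$original = '/<!--:[a-z]{2}-->( \r\n\s)<!--:-->/';

// The suggested fix: the quantified group matches any run of spaces
// or other whitespace characters.
$suggested = '/<!--:[a-z]{2}-->( |\s)*<!--:-->/';

var_dump(preg_match($original, $text));  // int(0)
var_dump(preg_match($suggested, $text)); // int(1)
```

Since \s already includes the ordinary space, \s* on its own behaves the same for this input.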
Escaping the punctuation, such as the hyphen, does no harm, even though outside a character class it is not actually special in PCRE. Try this:
/<\!\-{2}\:[a-z]{2}\-\->(( |\s)*)<\!\-{2}\:\-{2}>/
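The escaped pattern above works, but the backslashes are optional: outside a character class, "!", "-" and ":" are ordinary literals in PCRE. A quick sketch comparing it with an unescaped version on the same sample text (the sample string is an assumption):

```php
<?php
$text = "<!--:en-->\n<!--:-->";

// The escaped pattern from the answer above.
$escaped = '/<\!\-{2}\:[a-z]{2}\-\->(( |\s)*)<\!\-{2}\:\-{2}>/';
// The same pattern with the unnecessary escapes removed.
$plain = '/<!-{2}:[a-z]{2}-->(( |\s)*)<!-{2}:-{2}>/';

preg_match($escaped, $text, $m1);
preg_match($plain, $text, $m2);

// Group 1 holds the whitespace between the two comments.
var_dump($m1[1] === "\n"); // bool(true)
var_dump($m1 === $m2);     // bool(true)
```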
If I understood your question correctly, you need to match the entire text, comments included.
So, sticking strictly to your specific problem, I would use something like this:
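$s = "<!--:en-->
<!--:-->";
$a = array();
// \s* matches any run of whitespace (spaces, \r, \n, tabs) between the comments
preg_match('/<!--:[a-z]{2}-->\s*<!--:-->/', $s, $a);
// htmlentities() keeps the comment markers visible when echoed into a page
foreach ($a as $match) {
    var_dump(htmlentities($match));
}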
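To pull out every such block at once, and see which language each belongs to, preg_match_all can capture the two-letter code, and preg_replace with the same pattern removes the empty blocks. A minimal sketch, assuming the blocks sit back to back in one string (the sample string is made up for illustration):

```php
<?php
// Sample text: two whitespace-only blocks and one block with content.
$s = "<!--:en-->\n<!--:--><!--:fr-->Bonjour<!--:--><!--:de--> \r\n<!--:-->";

// Capture the language code of every block that holds only whitespace.
$pattern = '/<!--:([a-z]{2})-->\s*<!--:-->/';
$count = preg_match_all($pattern, $s, $matches);

var_dump($count);      // int(2)
var_dump($matches[1]); // the codes "en" and "de"

// Strip the empty blocks, leaving the non-empty French block in place.
$cleaned = preg_replace($pattern, '', $s);
var_dump($cleaned);    // string(25) "<!--:fr-->Bonjour<!--:-->"
```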
In general, I won't argue about whether you should parse HTML with regular expressions, but note that Colin is right: realistically, parsing HTML with regular expressions can be extremely hard (read "nearly impossible"), as the posts he linked explain.