I expected to find this on SO already, but haven't so far.
I'm talking about a regex that picks out the entities in an HTML-encoded string, e.g. something like
blip ♦ trout’s mouth
Have I covered all the bases with &\w+; and &#[0-9]+; ?
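// Encode the user-supplied search terms for safe output (empty string if the parameter is missing).
$encoded_string = htmlspecialchars($_GET["searchterms"] ?? "");
echo "<b>Search results for submitted string: \"$encoded_string\"</b><br><br>";
// Split on named (&name;) and decimal (&#123;) references; the matched references are kept as tokens too.
$html_special_chars_pattern = "!(&\\w+;|&#[0-9]+;)!";
$non_html_tokens = preg_split( $html_special_chars_pattern, $encoded_string, -1, PREG_SPLIT_DELIM_CAPTURE );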
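
For comparison, here is a minimal sketch of a slightly wider pattern. Two things the patterns above leave open: hexadecimal references such as &#x2019; are not matched by &#[0-9]+;, and \w also accepts underscores and leading digits, so something like &_; would be treated as an entity. The function name split_on_character_references is hypothetical and not part of the original code, and the sketch assumes every reference ends with a semicolon.

```php
<?php
// Hypothetical helper (an assumption for illustration, not from the post):
// splits an HTML-encoded string into plain-text tokens and character references.
function split_on_character_references(string $encoded): array
{
    // Three alternatives, all requiring a trailing semicolon:
    //   &name;    named reference: a letter followed by letters or digits
    //   &#1234;   decimal numeric reference
    //   &#x1F4;   hexadecimal numeric reference (x or X)
    $pattern = '/(&[A-Za-z][A-Za-z0-9]*;|&#[0-9]+;|&#[xX][0-9A-Fa-f]+;)/';

    // PREG_SPLIT_DELIM_CAPTURE keeps the matched references in the result.
    return preg_split($pattern, $encoded, -1, PREG_SPLIT_DELIM_CAPTURE);
}

$tokens = split_on_character_references('blip &diams; trout&#x2019;s mouth');
print_r($tokens);
```

The odd-indexed elements of the result are the references and the even-indexed ones are the text in between, which is the same layout the original preg_split call produces.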