I have a regex that seems to stop working altogether when it is given the wrong input.
My code:
function dbStr($string)
{
$tag = "(script|embed)";//As it turns out, embeds have the exact same syntax as scripts, so we can use the same regexes against those :)
$tvnc = "(\\\\'|\\\\\"|[^<>\"'/])*?";//Tag Valid No Close
$quoteseq = "['\"](\\\\'|\\\\\"|[^\"'])*?['\"]";
$tvncq = "(".$tvnc.$quoteseq.$tvnc.")*?";//Tag Valid No Close Quotes
$string = preg_replace_callback
(
"#<".$tvnc.$tag."(".$tvncq."(src=".$quoteseq.")".$tvncq.")/>#imsSX",//Pattern
"dbStr_FilterSinglematch",//Callback
$string//Subject
);
return $string;
}
function dbStr_FilterSinglematch($m)
{
print_r($m);
return "";
}
Now, let's say I call it with this input:
echo "\n" . dbStr
("
<script type='textjavascript' src='asdf'/>
<script type='textjavascript' src='asdf'>
asdfasfasdf
uyoiyoiuyoiuy
");
It works fine! It finds a match and removes it. Here is the output from that call:
Array
(
[0] => <script type='textjavascript' src='asdf'/>
[1] =>
[2] => script
[3] => type='textjavascript' src='asdf'
[4] => type='textjavascript'
[5] => =
[6] => t
[7] =>
[8] => src='asdf'
[9] => f
)
<script type='textjavascript' src='asdf'>
asdfasfasdf
uyoiyoiuyoiuy
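The lone characters in [5], [6] and [9] (=, t and f) are expected: when a capturing group sits under a quantifier, PCRE keeps only the text from its last iteration, so [9] holds the final character matched inside 'asdf'. A minimal illustration with a toy pattern, separate from dbStr():

```php
<?php
// A capturing group under a quantifier keeps only its LAST iteration.
preg_match('/([a-z])*/', 'asdf', $m);
echo $m[0], "\n"; // whole match: asdf
echo $m[1], "\n"; // last iteration of group 1: f

// A non-capturing group (?:...) does not add entries to the match array.
preg_match('/(?:[a-z])*/', 'asdf', $n);
echo count($n), "\n"; // 1 (only the whole match)
```

Switching the helper groups in the pattern to (?:...) would leave only the captures that are actually used.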
However, if I give it THIS input instead:
echo "test" . dbStr
(
'
<embed type="application/x-shockwave-flash" src="http://picasaweb.google.com/s/c/bin/slideshow.swf" width="288" height="192" flashvars="host=picasaweb.google.com&hl=en_US&feat=flashalbum&RGB=0x000000&feed=http%3A%2F%2Fpicasaweb.google.com%2Fdata%2Ffeed%2Fapi%2Fuser%2F109941697484668010012%2Falbumid%2F5561383933745906193%3Falt%3Drss%26kind%3Dphoto%26authkey%3DGv1sRgCN2H88H41qeT6AE%26hl%3Den_US" pluginspage="http://www.macromedia.com/go/getflashplayer"></embed>
'.
"
<script type='textjavascript' src='asdf'/>
<script fubar=\"d\\\\\'erp\" derp=\"dlerp\">
//<script type='text/javascript' src='asdf'/>
asdfasfasdf
</script>
<script>
uyoiyoiuyoiuy
</script>
");
Nothing. Nothing at all. No matches are found, and the text I get back from the regex is completely blank!
I mean, seriously... What the heck? This is the output I get from running the above code:
test
Yes, that's it.
If the regex had found any matches (say, one spanning the entire document), wouldn't it have output something from my print_r() call? I don't think it's even calling the callback; the regex is failing altogether.
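That hunch is easy to check in isolation: preg_replace_callback() returns null, not the original string, when PCRE aborts a match, and "test" . null is just "test", which is exactly the output shown. The sketch below uses a deliberately pathological toy pattern, /^(a+)+$/ applied to forty a's followed by an X, as a stand-in for the real regex (an assumption for illustration, not the pattern from dbStr()); the $calls counter shows that the callback never runs.

```php
<?php
$calls = 0;

// Toy stand-in for a regex that backtracks heavily on a subject it cannot match.
$subject = str_repeat('a', 40) . 'X';

$result = preg_replace_callback(
    '/^(a+)+$/',
    function ($m) use (&$calls) {
        $calls++;      // would count matches, if any were ever found
        return '';
    },
    $subject
);

var_dump($result);                              // NULL: PCRE gave up, nothing was replaced
var_dump($calls);                               // int(0): the callback never ran
var_dump("test" . $result);                     // string(4) "test"
var_dump(preg_last_error() !== PREG_NO_ERROR);  // bool(true): an error code is set
```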
What's worse, I have the following header and ini settings in place:
header('Content-type: text/plain');
error_reporting(E_ALL);
ini_set("display_errors", 1);
But I am not seeing ANY errors, either in my log OR in the output itself!
So there you have it, my regex predicament. Does anyone have any idea why this is failing?
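PCRE failures such as an exhausted backtrack limit are reported through preg_last_error() rather than as PHP warnings, so error_reporting(E_ALL) and display_errors have nothing to show; the only signals are the null return value and the error code. A minimal wrapper that turns that into an exception is sketched here. The function name checkedReplaceCallback() is hypothetical, /^(a+)+$/ is a toy pattern used only to provoke a failure, and preg_last_error_msg() and the fn arrow syntax assume PHP 8.0 or later.

```php
<?php
// Hypothetical helper: like preg_replace_callback(), but throws instead of
// silently returning null when PCRE aborts (e.g. backtrack limit exhausted).
function checkedReplaceCallback(string $pattern, callable $callback, string $subject): string
{
    $result = preg_replace_callback($pattern, $callback, $subject);
    if ($result === null) {
        // preg_last_error_msg() is PHP 8.0+; on older versions use preg_last_error() alone.
        throw new RuntimeException(
            'preg_replace_callback failed: ' . preg_last_error_msg(),
            preg_last_error()
        );
    }
    return $result;
}

// With a harmless pattern it behaves like preg_replace_callback().
echo checkedReplaceCallback('/b/', fn($m) => 'B', 'abc'), "\n"; // aBc

// With a pathological toy pattern, the failure is now visible.
try {
    checkedReplaceCallback('/^(a+)+$/', fn($m) => '', str_repeat('a', 40) . 'X');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```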
EDIT:
I have narrowed down the source of my problems:
echo "test " . dbStr
('<embed tests="abc" tests="abc" flashvars="AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"></embed>');
It seems that when I have two attributes about that long followed by a very long attribute, the system crashes. However, THIS input does not crash (there are more A's, but no preceding attributes):
echo "test " . dbStr
('<embed
flashvarsembed>');
That said, with the added A's, the preceding attributes now only need to be THIS long to crash it:
echo "test " . dbStr
('<embed a="b" c="d"
flashvarsembed>');
It seems that this is a memory-related issue... Is there a fix? The code this will be parsing could be extremely long.
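The pattern in the EDIT (fine with short attributes, a silent failure once several long ones are combined) most likely points to excessive backtracking rather than a memory problem. Inside $tvncq, the trailing $tvnc of one *? iteration sits right next to the leading $tvnc of the next, and both can match the same unquoted characters, so on input that ultimately cannot match (the reduced <embed ...></embed> test never contains /> or src=), PCRE may try a number of splits that multiplies with each extra attribute and with the length of the values before it hits pcre.backtrack_limit. Raising that limit with ini_set() moves the threshold but does not remove it, which matters if the input can be extremely long. A common remedy is to remove the ambiguity with atomic groups (?>...) or possessive quantifiers (*+, ++), which stop the engine from re-splitting text it has already matched. The sketch shows this only on the toy pattern /^(a+)+$/; it is not a drop-in rewrite of dbStr(), since making $tvnc possessive as written would also let it swallow the tag name.

```php
<?php
$long = str_repeat('a', 40) . 'X'; // cannot match either pattern

// Nested, ambiguous quantifiers: (a+)+ can split the run of a's in ~2^n ways.
$bad = preg_match('/^(a+)+$/', $long);
$badError = preg_last_error();

// Atomic group: once (?>a+) has matched, the engine may not give characters back,
// so there is only one split to try and the failure is reported immediately.
$good = preg_match('/^(?>a+)+$/', $long);
$goodError = preg_last_error();

var_dump($bad, $badError !== PREG_NO_ERROR);   // bool(false), bool(true)
var_dump($good, $goodError === PREG_NO_ERROR); // int(0), bool(true)

// The atomic version still matches what it should.
var_dump(preg_match('/^(?>a+)+$/', str_repeat('a', 40))); // int(1)
```

The possessive form /^(?:a++)+$/ expresses the same restriction more compactly.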