Given a webpage, I want to extract every occurrence of a string enclosed between two delimiters. I use a regex to achieve that, like this:
Regex Rx = new Regex(before + "(.*?)" + after);
if (o is string)
{
string s = o as string;
List<string> results = new List<string>();
foreach (Match match in Rx.Matches(s))
{
results.Add(match.ToString().Replace(before, "").Replace(after, ""));
}
return results.ToArray();
}
My input is an HTML string containing this text:
<script type="text/javascript">
var s1 = new SWFObject("http://hornbunny.com/player.swf","ply","610","480","10","#000000");
s1.addParam("allowfullscreen","true");
s1.addParam("allowscriptaccess","always");
s1.addParam("flashvars","overlay=http://cdn1.image.somesite.com/thumbs/0/9/e/1/2/09e12f7aeec382bc63a620622ff535b6/09e12f7aeec382bc63a620622ff535b6.flv-3b.jpg&settings=http://somesite.com/playerConfig.php?09e12f7aeec382bc63a620622ff535b6.flv|0");
s1.write("myAlternativeContent");
</script>
The result I get is a string[] with 0 elements, because foreach (Match match in Rx.Matches(s)) loops 0 times.
The regex matches nothing, even though there is at least 1 occurrence in my document.
I tried to extract the strings using var s1 = new SWFObject
and </script>
as delimiters, so there are no special characters involved, even though I didn't escape my strings.
What seems to be wrong with that regex?
Working:
if (o is string)
{
string s = o as string;
List<string> results = new List<string>();
foreach (Match match in Rx.Matches(s))
{
results.Add(match.Groups[1].Value);
}
return results.ToArray();
}
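A likely cause of the empty result is that in .NET the `.` in `(.*?)` does not match a newline by default, while the text between the two delimiters here spans several lines. The sketch below is one way to put this together, not necessarily the exact change that was made: it assumes the regex is built with `RegexOptions.Singleline` and wraps the delimiters in `Regex.Escape` as a precaution. The names `DelimitedExtractor` and `ExtractBetween`, and the shortened sample input, are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedExtractor
{
    // Hypothetical helper: returns the text found between each before/after pair.
    public static string[] ExtractBetween(string input, string before, string after)
    {
        // Regex.Escape guards against delimiters that contain regex metacharacters.
        // RegexOptions.Singleline lets '.' match '\n', so a match can span lines.
        var rx = new Regex(
            Regex.Escape(before) + "(.*?)" + Regex.Escape(after),
            RegexOptions.Singleline);

        var results = new List<string>();
        foreach (Match match in rx.Matches(input))
        {
            // Group 1 is the lazy capture between the two delimiters.
            results.Add(match.Groups[1].Value);
        }
        return results.ToArray();
    }

    public static void Main()
    {
        // Shortened, made-up stand-in for the multi-line script block.
        string html = "<script>\nvar s1 = new SWFObject(\"a\");\ns1.write(\"x\");\n</script>";
        string[] found = ExtractBetween(html, "var s1 = new SWFObject", "</script>");
        Console.WriteLine(found.Length); // prints 1
    }
}
```

Reading `match.Groups[1].Value` also avoids the `Replace` calls from the first version, which would strip any further copies of the delimiter text that happen to appear inside the captured string.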