Regex for delimited string not working C#

Question

Given a webpage, I want to extract every occurrence of a delimited string. I use a regex to achieve that, like this:

Regex Rx = new Regex(before + "(.*?)" + after);

if (o is string)
{
    string s = o as string;
    List<string> results = new List<string>();
    foreach (Match match in Rx.Matches(s))
    {
        results.Add(match.ToString().Replace(before, "").Replace(after, ""));
    }
    return results.ToArray();
}

My input is an HTML string containing this text:

<script type="text/javascript">
            var s1 = new SWFObject("http://hornbunny.com/player.swf","ply","610","480","10","#000000");
            s1.addParam("allowfullscreen","true");
            s1.addParam("allowscriptaccess","always");
                            s1.addParam("flashvars","overlay=http://cdn1.image.somesite.com/thumbs/0/9/e/1/2/09e12f7aeec382bc63a620622ff535b6/09e12f7aeec382bc63a620622ff535b6.flv-3b.jpg&settings=http://somesite.com/playerConfig.php?09e12f7aeec382bc63a620622ff535b6.flv|0");
            s1.write("myAlternativeContent");
        </script>

The result I get is a string[] with 0 elements, because foreach (Match match in Rx.Matches(s)) loops 0 times.

It matches exactly 0 times, though there is at least 1 occurrence in my document. I tried to extract the text between var s1 = new SWFObject and </script> as delimiters, so there are no special characters, even though I didn't escape my strings.

What seems to be wrong with that regex?

Working:

if (o is string)
{
    string s = o as string;
    List<string> results = new List<string>();
    foreach (Match match in Rx.Matches(s))
    {
        results.Add(match.Groups[1].Value);
    }
    return results.ToArray();
}

You need to give the contents of `before` and `after` with an example. `.*?` is lazy: it consumes as little as possible before terminating, so it is happy to match 0 characters. – rock321987, Jun 18 '16 at 13:58
OK, so without the question mark? But I want the regex to be lazy so that it gets the smallest possible matches; I don't want to find my delimiters inside the matched text. – JDE, Jun 18 '16 at 14:08
Use `RegexOptions.Singleline` if your string contains newline characters. – Lucas Trzesniewski, Jun 18 '16 at 14:10
Also, from what I am seeing, you can use lookahead and lookbehind instead of using Replace. – rock321987, Jun 18 '16 at 14:16
Don't forget that your regex is case sensitive by default. If the delimiters in your regex are in a different case than the search text, you won't find a match. You can make the regex case insensitive by setting the options on the Regex object. Also, since you have a capture group in your regex, you can simplify your results.Add line to this: `results.Add(match.Groups[1].Value);` – Francis Gagnon, Jun 18 '16 at 14:18
Now comes the classic question: **[`Don't parse HTML with regex`](http://stackoverflow.com/a/1732454/1996394)** – rock321987, Jun 18 '16 at 14:21
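
The lookaround and case-insensitivity suggestions in the comments above could be combined roughly as follows. This is only a sketch: the class name LookaroundSketch and the helper Between are hypothetical names that do not appear in the thread, and Regex.Escape is added on the assumption that the delimiters may contain regex metacharacters.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundSketch
{
    // Hypothetical helper: returns every piece of text found between
    // `before` and `after`, ignoring case and spanning line breaks.
    public static string[] Between(string input, string before, string after)
    {
        // (?<=...) and (?=...) assert the delimiters without consuming them,
        // so match.Value is already the inner text and no Replace is needed.
        string pattern = "(?<=" + Regex.Escape(before) + ").*?(?=" + Regex.Escape(after) + ")";

        // Singleline lets '.' match '\n'; IgnoreCase relaxes delimiter casing.
        var rx = new Regex(pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var results = new List<string>();
        foreach (Match match in rx.Matches(input))
        {
            results.Add(match.Value);
        }
        return results.ToArray();
    }
}
```

Because the closing delimiter is only asserted and not consumed, this variant can behave differently from the capture-group version when the two delimiters overlap, so the plain match.Groups[1].Value approach may still be the simpler choice.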

Nikola Sivkov · Accepted Answer · 2016-06-18T14:24:55.400

0

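Without the RegexOptions.Singleline option, the dot in .*? matches any character except a newline (\n). So unless the text between the delimiters is all on one line, the pattern cannot match across the line breaks.

So we arrive at ((.|\s)*), which matches any character or whitespace character (including newlines) zero or more times. Alternatively, with RegexOptions.Singleline the dot also matches newlines, and the regex reduces to (.*). Note that the greedy (.*) runs to the last occurrence of the closing delimiter; keep the lazy (.*?) if the input can contain more than one delimited block.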

Edit: Working example.

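var before = "var s1 = new SWFObject";
var after = "</script>";
// Multi-line test input: the text between the delimiters spans several lines.
var o = @"var s1 = new SWFObject(d
aw
da
wd
awd
aw
d
aw
d
awd
        </script> ";
// Singleline makes '.' match newlines too, so the group can span lines.
Regex Rx = new Regex(before + "(.*)" + after, RegexOptions.Singleline);

if (o is string)
{
    string s = o as string;
    List<string> results = new List<string>();
    foreach (Match match in Rx.Matches(s))
    {
        // Groups[1] is the text captured between the delimiters.
        results.Add(match.Groups[1].Value);
    }
    // Dump() is a LINQPad extension method; use Console.WriteLine elsewhere.
    results.ToArray().Dump();
}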
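
The example above contains a single delimited block, so the greedy and lazy forms give the same result there. The following sketch shows how they might differ once the input holds two blocks. The class LazyVersusGreedy and its Extract method are hypothetical names introduced for illustration, and the use of Regex.Escape is an assumption about how the delimiters should be treated.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    // Hypothetical helper: collects capture group 1 for every match,
    // using either the lazy (.*?) or the greedy (.*) middle part.
    public static string[] Extract(string input, string before, string after, bool lazy)
    {
        string middle = lazy ? "(.*?)" : "(.*)";

        // Regex.Escape keeps delimiters such as "[" from being read as syntax.
        var rx = new Regex(Regex.Escape(before) + middle + Regex.Escape(after),
                           RegexOptions.Singleline);

        return rx.Matches(input)
                 .Cast<Match>()
                 .Select(m => m.Groups[1].Value)
                 .ToArray();
    }
}
```

For the input "[a\nb] x [c]" with "[" and "]" as delimiters, the lazy form yields two results ("a\nb" and "c"), while the greedy form yields the single result "a\nb] x [c", because it extends to the last closing delimiter.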

edited Jun 18 '16 at 14:24

answered Jun 18 '16 at 14:01

Nikola Sivkov

`/` is not a regex metacharacter, there's no need to escape it. – Lucas Trzesniewski Jun 18 '16 at 14:04
That, plus I fixed that with Regex.Escape(); it still doesn't work, of course. – JDE Jun 18 '16 at 14:10
I believe you are right. However, in PHP regex `/` is a special character. I know this is .NET regex, so in this case it's no use; I guess I'm just used to escaping `/`. Ref: http://php.net/manual/en/regexp.reference.delimiters.php – Nikola Sivkov Jun 18 '16 at 14:11
In PHP, you can choose the delimiter you want. If you choose anything other than `/` (say `#`, which is a common choice), then you won't need to escape `/`. – Lucas Trzesniewski Jun 18 '16 at 14:12
why not use `(?s)`? – rock321987 Jun 18 '16 at 14:20
@rock321987 updated my answer, thanks! – Nikola Sivkov Jun 18 '16 at 14:25
Still matching nothing – JDE Jun 18 '16 at 14:33
Finally, this works, but the lazy one with the question mark (.*?) – JDE Jun 18 '16 at 14:47
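
The `(?s)` suggestion from the comments can be sketched as below: the inline option turns on single-line mode inside the pattern itself, so no RegexOptions argument is needed. The class InlineSinglelineSketch and the method FirstBetween are hypothetical names, and the lazy `(.*?)` group is used here because that is the form the asker reported as working.

```csharp
using System.Text.RegularExpressions;

public static class InlineSinglelineSketch
{
    // Hypothetical helper: returns the first delimited block,
    // or null when the delimiters are not found.
    public static string FirstBetween(string input, string before, string after)
    {
        // (?s) at the start of the pattern enables single-line mode,
        // so '.' also matches '\n' for the rest of the pattern.
        string pattern = "(?s)" + Regex.Escape(before) + "(.*?)" + Regex.Escape(after);

        Match match = Regex.Match(input, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```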
