I am having trouble getting every <script> tag and its matching closing </script> tag from an HTML string using regular expressions in C#.
I created a sample HTML document that looks like this:
<html>
<head>
<title>
</title>
<script src="adasdsadsda.js"></script>
</head>
<body>
<script type='javascript'>
var a = 1 + 2;
alert('a');
</script>
</body>
<script></script>
</html>
The regular expression I am using is:
<script.*>[^>]*<\/script>
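To make the setup concrete, here is a minimal, self-contained C# sketch that runs the pattern above against the sample document with its line breaks kept. It assumes a .NET 5+ console project (top-level statements). The variable names are illustrative, and the question does not show the original calling code. In .NET, . does not match \n by default, so the greedy .* cannot run past the end of a line. With the newlines intact, this version reports three separate matches, the same result regexr shows.

```csharp
using System;
using System.Text.RegularExpressions;

// The sample document from the question, with its line breaks preserved.
// A verbatim string (@"...") keeps the newlines; embedded double quotes are doubled.
string html = @"<html>
<head>
<title>
</title>
<script src=""adasdsadsda.js""></script>
</head>
<body>
<script type='javascript'>
var a = 1 + 2;
alert('a');
</script>
</body>
<script></script>
</html>";

// The pattern from the question, unchanged. .NET accepts \/ as an escaped '/'.
var regex = new Regex(@"<script.*>[^>]*<\/script>");

// Without RegexOptions.Singleline, '.' stops at '\n', so each .* stays on one line.
MatchCollection matches = regex.Matches(html);
Console.WriteLine($"Match count: {matches.Count}");
foreach (Match m in matches)
{
    Console.WriteLine($"[{m.Index}] {m.Value}");
}
```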
I often use regexr to test and validate my regular expressions (I highly recommend it!). It shows that this pattern finds 3 matches, just as I expect.
But C#'s regex.Matches does not return 3 matches; instead, it returns a single match that contains all of the occurrences. Is this the expected behavior of the Matches method? I have used it a lot and have always gotten each occurrence as a separate match.
Why is this happening in my case?
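One plausible explanation, offered only as an assumption because the C# code that builds the input string is not shown: if the HTML reaches Matches without its line breaks (or with RegexOptions.Singleline set), then . can match across what used to be separate lines. The greedy .* then stretches from the first <script to the last </script>, producing one match. The sketch below reproduces that with a hypothetical single-line copy of the sample. For comparison, it also runs a lazy variant, <script[^>]*>.*?</script>, which is a commonly suggested alternative and not the pattern from the question. It assumes a .NET 5+ console project (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Same sample as the question, collapsed onto a single line.
// ASSUMPTION: this mimics input whose line breaks were lost; it is a guess,
// not a confirmed diagnosis of the asker's code.
string oneLine =
    "<html><head><title></title><script src=\"adasdsadsda.js\"></script></head>" +
    "<body><script type='javascript'>var a = 1 + 2; alert('a');</script></body>" +
    "<script></script></html>";

const string greedy = @"<script.*>[^>]*<\/script>";

// With no newline to stop it, the greedy .* first runs to the end of the input and
// then backtracks only as far as needed, so one match spans the first <script
// through the last </script>.
MatchCollection greedyMatches = Regex.Matches(oneLine, greedy);
Console.WriteLine($"Greedy pattern: {greedyMatches.Count} match(es)");

// Alternative pattern (not from the question): [^>]* keeps the opening tag within
// its own angle brackets, .*? is lazy so it stops at the nearest </script>, and
// Singleline lets '.' also match '\n' so multi-line script bodies still work.
const string lazy = @"<script[^>]*>.*?</script>";
MatchCollection lazyMatches =
    Regex.Matches(oneLine, lazy, RegexOptions.Singleline | RegexOptions.IgnoreCase);
Console.WriteLine($"Lazy pattern: {lazyMatches.Count} match(es)");
foreach (Match m in lazyMatches)
{
    Console.WriteLine(m.Value);
}
```

If this guess is right, regexr and .NET would agree once they are given the same input string: both treat . as "any character except a line break" by default, so the difference would come from the input rather than from the engines.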
P.S.: If your answer points out that regular expressions are not suited to parsing HTML, please also explain why regexr and .NET's Regex give different results. Do they use different regex implementations?