I am returning HTML content (a page layout) in a variable, but I want to remove the <script>blabla</script> tags along with everything inside them.

How can I do this?

Justin
SPBeginer
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – flq Sep 27 '11 at 12:31
  • It would appear that there is a view in the community that this is something which should not be attempted. – David Sep 27 '11 at 12:42
  • @flq The question doesn't ask "how do I do this using regex?", you could have just re-tagged the question and answered it. – Justin Sep 27 '11 at 12:46
  • Correct Dave, using regex for something like this may give unreliable results. – keithwill Sep 27 '11 at 12:48
  • @justin it did, but it was edited away from me... – flq Sep 27 '11 at 12:56

1 Answer

You really need to parse the HTML.

Try using the Html Agility Pack, which should make this pretty straightforward. For example:

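// requires: using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load("HTMLPage1.htm");
// SelectNodes returns null when no node matches, so guard before iterating.
var scripts = doc.DocumentNode.SelectNodes("//script");
if (scripts != null)
{
    foreach (var node in scripts)
    {
        node.Remove();
    }
}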
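
Since the question mentions that the markup is already held in a variable rather than in a file, here is a minimal sketch of the same idea using LoadHtml on a string. The class name ScriptStripper and the method RemoveScripts are hypothetical names chosen for illustration; only HtmlDocument, LoadHtml, SelectNodes, Remove, and OuterHtml come from the Html Agility Pack.

```csharp
using HtmlAgilityPack;

public static class ScriptStripper
{
    // Hypothetical helper: returns the markup with every <script> element removed.
    public static string RemoveScripts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        // SelectNodes returns null when nothing matches, so guard before iterating.
        var scripts = doc.DocumentNode.SelectNodes("//script");
        if (scripts != null)
        {
            foreach (var node in scripts)
            {
                node.Remove(); // removes the element together with its contents
            }
        }

        // Serialize the modified tree back to a string.
        return doc.DocumentNode.OuterHtml;
    }
}
```

It could be called as `string cleaned = ScriptStripper.RemoveScripts(pageHtml);`, where pageHtml is whatever variable holds the page layout.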
Justin
  • By the way, parsing HTML in general is a rather dicey proposition. If you are working with a legacy web application (or a buggy one), then you might not get HTML back in a form that Html Agility Pack can parse properly. – keithwill Sep 27 '11 at 12:50