I am having trouble removing all JavaScript from an HTML page with C#. I have three regular expressions that remove a lot of it but also miss a lot. Parsing the page with the MSHTML DOM parser causes the JavaScript to actually run, which is exactly what I am trying to avoid by using regex. The three expressions are:
"<script.*/>"
"<script[^>]*>.*</script>"
"<script.*?>[\\s\\S]*?</.*?script>"
Does anyone know what I am missing that causes these three regular expressions to miss blocks of JavaScript?
An example of what I am trying to remove:
<script src="do_files/page.js" type="text/javascript"></script>
<script src="do_files/page.js" type="text/javascript" />
<script type="text/javascript">
<!--
var Time=new Application('Time')
//-->
</script>
<script type="text/javascript">
if(window['com.actions']) {
window['com.actions'].approvalStatement = "",
window['com.actions'].hasApprovalStatement = false
}
</script>
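For reference, here is a minimal sketch of how patterns like these might be applied in C#. It rests on several assumptions: the post does not say which `RegexOptions` are in use, so `IgnoreCase` and `Singleline` are chosen here so that `<SCRIPT>` and multi-line blocks are candidates for matching. The class name `ScriptStripper` and the method `RemoveScripts` are hypothetical, and the two patterns are tightened variants of the ones above, not the originals. It only targets `<script>` elements. Inline event handlers and `javascript:` URLs are not covered, and a script body that itself contains the text `</script>` would end the match early.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not from the original post.
public static class ScriptStripper
{
    // Self-closing form: <script ... />
    // [^>]* keeps the match inside a single tag (assumes no '>' in attribute values).
    private static readonly Regex SelfClosingScript = new Regex(
        @"<script\b[^>]*/>",
        RegexOptions.IgnoreCase);

    // Paired form: <script ...> ... </script>
    // Singleline lets '.' match newlines; the lazy .*? stops at the first closing tag.
    private static readonly Regex PairedScript = new Regex(
        @"<script\b[^>]*>.*?</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string RemoveScripts(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Remove self-closing tags first; otherwise the paired pattern could
        // treat "<script ... />" as an opening tag and swallow everything up
        // to the next </script>.
        string withoutSelfClosing = SelfClosingScript.Replace(html, string.Empty);
        return PairedScript.Replace(withoutSelfClosing, string.Empty);
    }
}
```

A call such as `ScriptStripper.RemoveScripts(pageSource)` (where `pageSource` is the page's HTML as a string) would return the markup with the matched script elements removed. The order of the two replacements is a design choice made for this sketch, not something the original patterns specify.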