2

I use webBrowser.DocumentText to get the HTML source of a page. Using Regex, I manage to extract the script tag part:
<script type="text/javascript">functions here..</script>

I need to get the functions inside those tags, e.g.:

<script type="text/javascript">
 function function1 () { code here;}
 function function2 () { code here;} 
</script>

I need a regex pattern to get the 2 functions,
or to list them like this:
1. function function1() { code here; }
2. function function2() { code here; }

The purpose of the program is to identify whether there are duplicate JavaScript functions between 2 pages.
It's for WinForms and the language is C#.

Jepe d Hepe
  • ... these – Richard JP Le Guen Jan 29 '10 at 03:28
  • You need to be a bit more specific: do you want the entire function, or just the function names? Also, your example... those aren't even valid JavaScript functions, they have no names... – Nick Craver Jan 29 '10 at 03:29

2 Answers

1

You cannot do this in any general way with regexes alone, since JavaScript scopes can be nested arbitrarily deeply and the language is therefore not regular. The .NET flavour's balancing groups can count nested braces, but braces inside strings, comments, and regex literals still defeat a pure-regex approach. If you need this for a few particular pages, you might be able to craft a regex that handles the common cases, but not all of them.
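One alternative to a pure regex is to count braces by hand. The following is only a minimal sketch, not part of the original answer: `FunctionScanner` and `ExtractFunctions` are hypothetical names, and it assumes that braces never appear inside strings, comments, or regex literals and that the word "function" only appears as the keyword.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class FunctionScanner
{
    // Returns the text of each top-level function declaration in a script body.
    // Assumption: no braces inside strings, comments, or regex literals.
    public static List<string> ExtractFunctions(string script)
    {
        var result = new List<string>();
        int pos = 0;
        while (true)
        {
            // Find the next "function" keyword and the brace that opens its body.
            int start = script.IndexOf("function", pos, StringComparison.Ordinal);
            if (start < 0) break;
            int open = script.IndexOf('{', start);
            if (open < 0) break;

            // Walk forward, tracking nesting depth, until the body closes.
            int depth = 0;
            int i = open;
            for (; i < script.Length; i++)
            {
                if (script[i] == '{') depth++;
                else if (script[i] == '}' && --depth == 0) break;
            }
            if (i >= script.Length) break; // unbalanced braces: give up

            result.Add(script.Substring(start, i - start + 1));
            pos = i + 1; // nested functions stay inside their parent's text
        }
        return result;
    }
}
```

Calling `FunctionScanner.ExtractFunctions(scriptBody)` on the text between the script tags would then return one string per top-level function, with nested blocks kept intact.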

Max Shawabkeh
0
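// C#: keep only group 1 of each match; non-greedy, so nested braces are not handled
string e = @".*?(function.+?\{.*?\}|\z)";
string repl = "$1"; // .NET replacement strings use $1 rather than \1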

I believe that's it.
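For the listing the question asks for, the same idea could be used with `Regex.Matches` instead of a replacement. This is a sketch under assumptions: `RegexFunctionLister` and `List` are hypothetical names, the pattern is a simplified form of the one above, and `RegexOptions.Singleline` is added so that `.` also matches newlines.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class RegexFunctionLister
{
    // Non-greedy: each match ends at the FIRST '}' after the opening '{',
    // so a function containing nested blocks is cut short.
    private static readonly Regex FunctionPattern =
        new Regex(@"function.+?\{.*?\}", RegexOptions.Singleline);

    public static List<string> List(string script)
    {
        var found = new List<string>();
        foreach (Match m in FunctionPattern.Matches(script))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

On the example from the question this yields the two functions. On a body such as `function f() { if (a) { b(); } }` the match stops at the inner closing brace.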

Mark
  • But is it possible to have nested "}"? – Jepe d Hepe Jan 29 '10 at 03:48
  • In JavaScript, yes. I was assuming the standard syntax. – Mark Jan 29 '10 at 03:53
  • But you can't have nested – Mark Jan 29 '10 at 03:57
  • e = ".*?((.*?)|\\z)"; repl = "\\2"; But that will leave in any var statements so you could make a second pass to remove anything starting with var and ending with \n. Also I didn't account for spaces in between the starting '<' and script (etc.) for readability – Mark Jan 29 '10 at 04:19
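
On the nested "}" point raised in the comments: .NET regexes support balancing groups, which can keep count of open braces. The sketch below is not from the thread; `DuplicateFunctionFinder`, `FunctionNames`, and `Duplicates` are hypothetical names. It assumes named functions with `\w+` identifiers and no braces inside strings or comments, and it treats two functions as duplicates when their names match, which is one possible reading of the question's goal.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class DuplicateFunctionFinder
{
    // "depth" is pushed on every '{' and popped on every '}' inside the body;
    // (?(depth)(?!)) fails the match if any '{' is still unclosed.
    private static readonly Regex FunctionPattern = new Regex(
        @"function\s+(?<name>\w+)\s*\([^)]*\)\s*\{(?>[^{}]+|\{(?<depth>)|\}(?<-depth>))*(?(depth)(?!))\}");

    // Names of the top-level functions found in a script body.
    public static List<string> FunctionNames(string script)
    {
        return FunctionPattern.Matches(script)
            .Cast<Match>()
            .Select(m => m.Groups["name"].Value)
            .ToList();
    }

    // Function names that occur in both script bodies.
    public static List<string> Duplicates(string scriptA, string scriptB)
    {
        return FunctionNames(scriptA).Intersect(FunctionNames(scriptB)).ToList();
    }
}
```

Passing the script text of the two pages to `Duplicates` returns the shared function names. Comparing `m.Value` instead of the name would compare whole function bodies.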