Context: I have some dynamically generated HTML that can contain embedded JavaScript function calls. I'm trying to extract those function calls with a regular expression.

Sample HTML string:

 <dynamic html>

   <script language="javascript">
       funcA();
   </script>

 <a little more dynamic html>

   <script language="javascript">
       funcB();
   </script>

My goal is to extract the text "funcA();" and "funcB();" from the above snippet (either as a single string or an array with two elements would be fine). The regular expression I have so far is:
var regexp = /[\s\S]*<script .*>([\s\S]*)<\/script>[\s\S]*/gm;

Using html_str.replace(regexp, "$1") only returns "funcB();".

This regexp works fine when the HTML contains only ONE set of <script> tags, but when there are several, replace() returns only the LAST one. Removing the 'g' modifier makes no difference: it still matches only the last function call. I'm still a novice with regular expressions, so I know I'm missing something fundamental here. Any help pointing me in the right direction would be greatly appreciated. I've done a bit of research already but haven't been able to resolve this.

user3923442
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – progrenhard Aug 08 '14 at 20:07
  • Parsing HTML with regex is generally a bad idea, but this can work if you never have any `</script>` in strings within your JS code... – dee-see Aug 08 '14 at 20:16

1 Answer

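Your wildcard matches are all greedy. This means each one matches as much text as it possibly can, not just the part you expect. The leading [\s\S]* therefore consumes everything up to the last <script> tag, which is why you only ever get "funcB();".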

Make them all non-greedy (.*?) and it should work.
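
As a rough illustration of the non-greedy approach, here is a minimal sketch that collects every script body into an array instead of using replace(). The helper name extractScriptBodies and the exact pattern are my own assumptions, not something from the question; html_str mirrors the question's sample. As the comments point out, this only holds up as long as the JS code itself never contains the closing script tag inside a string.

```javascript
// Sample input modelled on the HTML in the question.
const html_str = [
  "<dynamic html>",
  '   <script language="javascript">',
  "       funcA();",
  "   </script>",
  "<a little more dynamic html>",
  '   <script language="javascript">',
  "       funcB();",
  "   </script>",
].join("\n");

// Hypothetical helper: returns the trimmed body of every <script> block.
function extractScriptBodies(html) {
  // [^>]* keeps the opening tag from running past its own ">".
  // ([\s\S]*?) is lazy, so it stops at the FIRST closing tag it can reach.
  const scriptRegexp = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  const bodies = [];
  let match;
  // With the g flag, each exec() call resumes where the previous match ended.
  while ((match = scriptRegexp.exec(html)) !== null) {
    bodies.push(match[1].trim());
  }
  return bodies;
}

const calls = extractScriptBodies(html_str);
console.log(calls); // [ 'funcA();', 'funcB();' ]
```

Dropping the leading and trailing [\s\S]* entirely means the pattern only ever matches one script block at a time, so there is nothing left to swallow the earlier blocks.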

Julian