
Think of a string like this:

```
public function test(data)
{
    if (1 == 2)
    {
        alert("Wtf?");
    }

    switch (data)
    {
        case 'test':
            alert("Test");
        break;
    }
}
```

I need to parse that string so that I get the body (content) of that function.

I already have my regex working so that it captures the contents of the function, but as soon as it reaches a closing `}`, the regular expression stops and the content looks like this:

```
if (1 == 2)
{
   alert("Wtf?");
```

I really hope somebody can help me out.

This is my regular expression for splitting this string:

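```javascript
// Callback arguments: a = whole match, b = visibility, c = method name,
// d = parameter list, e = body captured by ([^}]+), which stops at the first }.
var test = classContent.replace(/(?:(private|public)\s*)function\s*([a-zA-Z0-9_]+)\s*\(([a-zA-Z0-9_\,\s]*)\s*\)\s*{([^}]+)\}/gi, function(a, b, c, d, e) {

    classMethods[c] = {
        visibility : b.trim(),
        params : d.trim(),
        content : e.trim()
    };
});
```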
Steffen Brem
  • Strictly speaking, you cannot parse a grammar like that with regular expressions. You can do it if you impose certain constraints on the text to be matched, but if you really want to match arbitrary JavaScript constructs you need a full-blown JavaScript parser. – Pointy Oct 27 '12 at 15:10
  • Additional info on what @Pointy said: http://en.wikipedia.org/wiki/Chomsky_hierarchy – Prinzhorn Oct 27 '12 at 15:11
  • You only have a chance with a regex if the final closing `}` is always at the same indentation level as the opening `{`, and if there is no intervening brace at the same level. Can you guarantee that? – Tim Pietzcker Oct 27 '12 at 15:16
  • This is impossible to do reliably with regular expressions (it would be at least *virtually* impossible -- and quite probably actually impossible -- even without edge cases like curly braces in single-line comments, in multi-line comments, in quotes, etc., etc.). On the other hand, the problem as described is *trivial* to do *without* regular expressions, just by spinning through the string yourself and keeping track of the number of `{` and `}` encountered (that aren't within comments or quotes). – T.J. Crowder Oct 27 '12 at 15:18
  • When answering @Tim, keep in mind the possibility of someone commenting out the end of a function with a multi-line comment and then coding *new* end of a function, and leaving the commented-out version in place. E.g., the answer is almost certainly "no, I can't guarantee that." :-) – T.J. Crowder Oct 27 '12 at 15:21
  • I think keeping track of the curly braces is the best way to go. I will give it a try and let you guys know if it worked! :) – Steffen Brem Oct 27 '12 at 15:53
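Tim Pietzcker's constraint can be put to work with a backreference. Assuming each method looks like the question's example (the `{` on the line after the declaration, the closing `}` on its own line, both at exactly the declaration's indentation, and no other brace at that indentation in between), the captured indentation `\1` pins down the matching brace. A minimal sketch under those assumptions; `extractMethods` is a hypothetical helper that builds a `classMethods`-style object like the question's code:

```javascript
// Hypothetical helper. ASSUMES Tim Pietzcker's constraint: the opening "{" and
// the closing "}" of each method sit on their own lines at the same indentation
// as the declaration, with no other brace at that indentation in between.
function extractMethods(classContent) {
  const classMethods = {};
  // Group 1: indentation of the declaration line
  // Group 2: visibility, group 3: name, group 4: parameter list
  // Group 5: body, ending at the first "}" line indented exactly like group 1
  const re = /^([ \t]*)(private|public)\s+function\s+(\w+)\s*\(([\w,\s]*)\)\s*\n\1\{([\s\S]*?)\n\1\}/gm;
  let m;
  while ((m = re.exec(classContent)) !== null) {
    classMethods[m[3]] = {
      visibility: m[2],
      params: m[4].trim(),
      content: m[5].trim()
    };
  }
  return classMethods;
}
```

As T.J. Crowder points out, a commented-out method end at that same indentation would be matched first, so this only holds for code you control.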

2 Answers


This is generally too tough a problem for regular expressions: they cannot really handle nested structures. Some flavors support recursive patterns, but even those would be overkill in this case. A quick fix for your given problem would be this:

```javascript
/(?:(private|public)\s*)function\s*([a-zA-Z0-9_]+)\s*\(([a-zA-Z0-9_\,\s]*)\s*\)\s*{(.+)\}/gis
```

This allows any characters between the curly brackets (including curly brackets). The `s` flag lets `.` match newlines as well, and since `+` is greedy, the match runs all the way to the last `}`.

However, if your string can contain multiple functions, this will get you everything from the first function name to the very last closing `}`. I have a feeling that this is the case for you, because you used the global modifier `g`.
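A quick way to see both the fix and the pitfall is to run the pattern from this answer against one function and then against two. A minimal sketch; the helper `bodies` and the sample strings are hypothetical, and the `s` (dotAll) flag needs an ES2018 or later engine:

```javascript
// The greedy pattern from this answer: `s` lets `.` match newlines, and the
// greedy `(.+)` backtracks only as far as the LAST "}" in the input.
const greedy = /(?:(private|public)\s*)function\s*([a-zA-Z0-9_]+)\s*\(([a-zA-Z0-9_\,\s]*)\s*\)\s*{(.+)\}/gis;

// Hypothetical helper: returns the trimmed body captured by each match.
function bodies(source) {
  return Array.from(source.matchAll(greedy), (m) => m[4].trim());
}

const one = "public function test(data)\n{\n    if (1 == 2)\n    {\n        alert('Wtf?');\n    }\n}";
const two = "public function a()\n{\n    x();\n}\nprivate function b()\n{\n    y();\n}";

// One function: a single match containing the whole body, nested braces included.
console.log(bodies(one));
// Two functions: still a single match, whose "body" swallows function b as well.
console.log(bodies(two));
```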

If this is the case (or in any case), consider a different approach: a JavaScript parser, or analyzing the string yourself and counting curly brackets. Maybe this question will help you there.
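Counting curly brackets yourself, as suggested above and in T.J. Crowder's comment on the question, can be sketched as a small scanner that ignores braces inside quotes and comments. This is a minimal sketch under stated assumptions: `extractBody` is a hypothetical helper, only single- and double-quoted strings plus `//` and `/* */` comments are recognized, and template literals and regex literals are not handled. A regex is still used, but only to locate each method header:

```javascript
// Hypothetical helper: given the index of an opening "{", return the text
// between it and its matching "}", ignoring braces inside '...' / "..."
// strings, // line comments and /* */ block comments.
// Template literals and regex literals are NOT handled.
function extractBody(source, openIndex) {
  if (source[openIndex] !== "{") {
    throw new Error("openIndex must point at '{'");
  }
  let depth = 0;
  let i = openIndex;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: jump to the end of the line.
      const end = source.indexOf("\n", i);
      i = end === -1 ? source.length : end;
      continue;
    }
    if (ch === "/" && next === "*") {
      // Block comment: jump past the closing "*/".
      const end = source.indexOf("*/", i + 2);
      if (end === -1) throw new Error("Unterminated block comment");
      i = end + 2;
      continue;
    }
    if (ch === "'" || ch === '"') {
      // String literal: skip to the matching quote, honoring backslash escapes.
      i++;
      while (i < source.length && source[i] !== ch) {
        i += source[i] === "\\" ? 2 : 1;
      }
      if (i >= source.length) throw new Error("Unterminated string");
      i++; // step past the closing quote
      continue;
    }
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        // Matching brace found: return everything strictly between the pair.
        return source.slice(openIndex + 1, i);
      }
    }
    i++;
  }
  throw new Error("No matching '}' found");
}

// Usage: the regex finds each header; the scanner finds the matching "}".
const classContent = "public function test(data)\n{\n    if (1 == 2)\n    {\n        alert('}'); // not a real }\n    }\n}";
const header = /(private|public)\s+function\s+(\w+)\s*\(([\w,\s]*)\)\s*\{/g;
const classMethods = {};
let m;
while ((m = header.exec(classContent)) !== null) {
  const openIndex = header.lastIndex - 1; // position of the "{" just matched
  const body = extractBody(classContent, openIndex);
  classMethods[m[2]] = { visibility: m[1], params: m[3].trim(), content: body.trim() };
  header.lastIndex = openIndex + body.length + 2; // resume after the closing "}"
}
console.log(classMethods.test.content);
```

The header regex can still be fooled by a commented-out method declaration, so for arbitrary input the parser route mentioned above remains the safer choice.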

Martin Ender

JavaScript does not support PCRE's recursive pattern `(?R)`.

Check out Steve Levithan's blog. He wrote XRegExp, which adds most of the PCRE features that JavaScript regexes are missing. There is also a Match Recursive plugin.
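Without `(?R)` or a library, one plain-JavaScript workaround is to unroll the recursion by hand, provided you can put an upper bound on how deeply braces nest: each level wraps the previous pattern in another `\{...\}`. A minimal sketch under that assumption; the helper `balancedBraces` and the depth limit are hypothetical (this is not XRegExp's API), and like any regex it still treats braces inside strings and comments as real braces:

```javascript
// Hypothetical helper: builds a regex source string matching a "{...}" block
// whose braces nest at most `maxDepth` levels deep (outer pair included).
// JavaScript has no (?R), so each extra level is spliced in by hand.
function balancedBraces(maxDepth) {
  let inner = "[^{}]*"; // innermost level: no braces at all
  for (let level = 1; level < maxDepth; level++) {
    inner = "(?:[^{}]|\\{" + inner + "\\})*";
  }
  return "\\{" + inner + "\\}";
}

// Usage: match methods whose bodies nest braces up to 3 levels deep.
const method = new RegExp(
  "(private|public)\\s+function\\s+(\\w+)\\s*\\(([\\w,\\s]*)\\)\\s*(" + balancedBraces(3) + ")",
  "g"
);
const src = "public function test(data)\n{\n    if (1 == 2)\n    {\n        alert('Wtf?');\n    }\n}\nprivate function other()\n{\n}";
for (const m of src.matchAll(method)) {
  // m[4] is the whole "{...}" block; strip the outer braces to get the body.
  console.log(m[2], JSON.stringify(m[4].slice(1, -1).trim()));
}
```

The pattern grows with every extra level, and input nested deeper than the chosen limit simply fails to match, so this is a stopgap rather than a replacement for a parser.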

Ωmega
  • And yet, I suspect that even with a full PCRE, handling all the edge cases will be difficult or impossible (see comment on the question). – T.J. Crowder Oct 27 '12 at 15:30
  • @T.J.Crowder - My answer is the **best possible regular expression solution**, which certainly is not better than using a parser; however, as the OP asks for a regular expression, this is it... – Ωmega Oct 27 '12 at 15:33
  • @Ωmega: Sometimes you have to change the question. :-) – T.J. Crowder Oct 27 '12 at 15:35