I know this question may look similar to other regex questions out there. I believe mine is different because I'm using Java to parse some JavaScript, which can contain brackets nested within brackets (anonymous functions, etc.). Consider the following example:

describe('a jasmine describe', function (){
    it('login', function(){
        //some function stuff
    });

    it('another it statement', function() {
        //some additional stuff
    });
});

What I ultimately want is:

Group 1: "a jasmine describe"

Group 2: all of the content between open/close brackets of the describe

I believe I have the regex to get Group 1, which is:

Pattern r = Pattern.compile("(?:describe\\s*\\(\\s*')(.*?)(?=')", Pattern.CASE_INSENSITIVE);

But I have no idea how to capture the contents between the opening and closing brackets of that specific describe.

  • `function(){}` is not java. Are you really using javascript? Or are you using java to get information out of a javascript file? – Teepeemm Sep 18 '15 at 02:12
  • I'm using java to parse some javascript, clarified in question as well – user3645197 Sep 18 '15 at 02:20
  • If you have potential for infinite nested braces, you won't be able to do this conveniently with regex. – Mad Physicist Sep 18 '15 at 02:25
  • I won't have infinite nested braces for sure, nothing to the extent that I would need to be concerned with performance. – user3645197 Sep 18 '15 at 02:27
  • Does the target always follow the text "describe"? – Bohemian Sep 18 '15 at 02:31
  • Yes it does, describe is the parent of a collection of it see http://jasmine.github.io/edge/introduction.html – user3645197 Sep 18 '15 at 02:33
  • The regex lib in java does not allow any kind of recursion. Alternatives: **1.** Use a parser like the ones described in http://stackoverflow.com/questions/6511556/javascript-parser-for-java **2.** Use regex to split each branch (something like http://stackoverflow.com/a/31948674/5290909) and then recurse/loop in your code. **3.** Create a pattern to handle a finite number of nested constructs (crazy indeed) – Mariano Sep 18 '15 at 05:26
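
One way to act on the "loop in your code" suggestion is to let a regex find only the describe header and then count braces by hand. This is a minimal sketch, not something proposed verbatim in the comments: the class name `DescribeBraceScanner` and the `parse` helper are hypothetical, and it assumes a single-quoted title followed by a plain `function (...) {` callback. It does not skip braces that appear inside string literals or comments, so a real JavaScript parser remains the safer option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DescribeBraceScanner {
    // Matches describe('title', function (...) { and ends just past the opening brace.
    // Assumption: the title is single-quoted and contains no escaped quotes.
    private static final Pattern HEADER = Pattern.compile(
        "describe\\s*\\(\\s*'([^']*)'\\s*,\\s*function\\s*\\([^)]*\\)\\s*\\{");

    // Returns {title, body}, or null if no describe is found or the braces are unbalanced.
    static String[] parse(String source) {
        Matcher m = HEADER.matcher(source);
        if (!m.find()) {
            return null;
        }
        int bodyStart = m.end();
        int depth = 1; // the describe's own opening brace is already consumed
        for (int i = bodyStart; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                // Found the brace that closes the describe body.
                return new String[] { m.group(1), source.substring(bodyStart, i) };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String js = "describe('a jasmine describe', function (){\n"
                  + "    it('login', function(){\n"
                  + "        if (x) { y(); }\n"
                  + "    });\n"
                  + "});\n";
        String[] parts = parse(js);
        System.out.println("Group 1: " + parts[0]);
        System.out.println("Group 2: " + parts[1]);
    }
}
```

Because the depth counter is a plain integer, the nesting level is not limited the way a fixed-depth regex would be.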

2 Answers

Regex may not be the best tool for this, but you can try the following regex:

^(?m)(?<indent>\s*)describe\('([^']+)'[^{]+\{([\s\S]+?)\n\k<indent>\}\);

  • ^(?m) - beginning of a line, in multiline mode (could be replaced by using Pattern.MULTILINE),
  • (?<indent>\s*) - capture the indentation before the method,
  • describe\( - describe followed by an opening parenthesis,
  • '([^']+)' - match the text between single quotes; needs to be modified if the text can contain ',
  • [^{]+\{ - match text up to the first {,
  • ([\s\S]+?) - match anything, with a reluctant quantifier,
  • \n\k<indent>\}\); - a new line, followed by the captured indentation, followed by the closing of the method body.

This captures 'a jasmine describe' in the 2nd group and the describe content in the 3rd group, because the named group indent takes the 1st group slot. The indent group captures the indentation before the describe call and is then used as a boundary for where to stop matching (at a } preceded by the same indentation), which should ensure that the regex matches the whole content of {...}. This is a kind of workaround for matching nested brackets, but the code needs to be well formatted.

Of course, in Java code you need to double the \ backslashes.
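
For illustration, here is roughly how the pattern could look once the backslashes are doubled for a Java string literal. It is a sketch under assumptions: the class name `DescribeIndentRegex` and the `parse` helper are made up for the example, and `Pattern.MULTILINE` is passed as a flag in place of the inline `(?m)`, which the explanation above mentions as an option.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DescribeIndentRegex {
    // The answer's pattern with backslashes doubled; Pattern.MULTILINE replaces the inline (?m).
    private static final Pattern DESCRIBE = Pattern.compile(
        "^(?<indent>\\s*)describe\\('([^']+)'[^{]+\\{([\\s\\S]+?)\\n\\k<indent>\\}\\);",
        Pattern.MULTILINE);

    // Returns {title, body}, or null when no describe block is found.
    static String[] parse(String source) {
        Matcher m = DESCRIBE.matcher(source);
        if (!m.find()) {
            return null;
        }
        // Group 1 is the named indent group, so the title is group 2 and the body is group 3.
        return new String[] { m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String js = "describe('a jasmine describe', function (){\n"
                  + "    it('login', function(){\n"
                  + "        //some function stuff\n"
                  + "    });\n"
                  + "});\n";
        String[] parts = parse(js);
        System.out.println("Group 2: " + parts[0]);
        System.out.println("Group 3: " + parts[1]);
    }
}
```

The inner `    });` lines do not end the match because their indentation differs from the captured one, which is why this only works on consistently indented source.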

m.cekiera

This regex matches your target, capturing groups 1 and 2 as required:

describe\('([^']*).*?function\s*\(\)\s*\{(([^{]*\{[^}]*\})*[^}]*)\}

This handles any number of non-nested curly-bracketed blocks in the body of the function.
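
A rough sketch of using this pattern from Java, with the backslashes doubled for the string literal. The class name `DescribeFlatRegex` and the `parse` helper are hypothetical names for the example. As stated above, it only copes with one level of braces inside the describe body, so an `if` block inside an `it` callback would not be handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DescribeFlatRegex {
    // The answer's pattern, escaped for a Java string literal.
    private static final Pattern DESCRIBE = Pattern.compile(
        "describe\\('([^']*).*?function\\s*\\(\\)\\s*\\{(([^{]*\\{[^}]*\\})*[^}]*)\\}");

    // Returns {title, body}, or null when the pattern does not match.
    static String[] parse(String source) {
        Matcher m = DESCRIBE.matcher(source);
        if (!m.find()) {
            return null;
        }
        // Group 1 is the describe title, group 2 is everything between the outer braces.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String js = "describe('a jasmine describe', function (){\n"
                  + "    it('login', function(){\n"
                  + "        //some function stuff\n"
                  + "    });\n"
                  + "});\n";
        String[] parts = parse(js);
        System.out.println("Group 1: " + parts[0]);
        System.out.println("Group 2: " + parts[1]);
    }
}
```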

Bohemian