10

Possible Duplicate:
Split a String based on regex

I've never been a regular expression guru, so I need your help! I have a string like this:

String s = "a [b c] d [e f g]";

I want to split this string using spaces as delimiters -- but I don't want to split on spaces that appear within the [] brackets. So, from the example above, I would like this array:

{"a", "[b c]", "d", "[e f g]"}

Any advice on what regex could be used in conjunction with split in order to achieve this?


Here's another example:

"[a b] c [[d e] f g]"

becomes

{"[a b]", "c", "[[d e] f g]"}
arshajii
  • [Lesson: Regular Expressions](http://docs.oracle.com/javase/tutorial/essential/regex/) – user1329572 Oct 14 '12 at 17:17
  • @artbristol Yes they can, I'd like no splitting to happen within any set of brackets. I edited to include another example. – arshajii Oct 14 '12 at 17:20
  • @A.R.S, then you can't do it with regular expressions. Time to write a parser. – Carl Norum Oct 14 '12 at 17:21
  • This is the third exact duplicate question: [this](http://stackoverflow.com/questions/12756651/split-a-string-based-on-regex/12756722#12756722) and [this](http://stackoverflow.com/questions/12305182/parse-commas-not-surrounded-by-brackets/12305883#12305883) – Anirudha Oct 14 '12 at 17:32
  • @CarlNorum You can; check out the similar question above that I answered. – Anirudha Oct 14 '12 at 18:08

5 Answers

10

I think this should work, using a negative lookahead: it refuses to match whitespace that is followed by a closing bracket with no opening bracket in between:

"a [b c] d [e f g]".split("\\s+(?![^\\[]*\\])");

For nested brackets you will need to write a parser; regexes cannot handle an arbitrary nesting depth, and they become too complicated beyond one or two levels. My expression, for example, fails for

"[a b [c d] e] f g"
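To see both behaviours side by side, here is a small sketch that runs the same expression on the flat example from the question and on the nested string above. The class and method names are made up for illustration; only the regex comes from this answer.

```java
import java.util.Arrays;

final class LookaheadSplitDemo {
    // Splits on whitespace unless a ']' follows with no '[' in between.
    static String[] split(String s) {
        return s.split("\\s+(?![^\\[]*\\])");
    }

    public static void main(String[] args) {
        // Flat brackets: spaces inside [] are protected by the lookahead.
        System.out.println(Arrays.toString(split("a [b c] d [e f g]")));
        // prints [a, [b c], d, [e f g]]

        // Nested brackets: the spaces before the inner '[' are split on by mistake.
        System.out.println(Arrays.toString(split("[a b [c d] e] f g")));
        // prints [[a, b, [c d] e], f, g]
    }
}
```

In the nested case the lookahead runs into the inner `[` before it finds a `]`, so it wrongly concludes that the space is outside any bracket.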
Bergi
3

You cannot do that with a single regex, simply because a regex cannot pair opening brackets with closing brackets or handle nested brackets.

Regular expressions cannot count nesting depth (balanced brackets are not a regular language), so even if a pattern looks like it works, there will be a case where it fails.

So I'd rather suggest writing a few lines of your own code, which can handle all cases.

You could create a very simple grammar for JavaCC or ANTLR, or use a simple stack-based parser.
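As a rough idea of what the stack-based option could look like, here is a minimal sketch. The class name, the method name, and the choice to reject unbalanced input with an exception are all assumptions made for illustration; the question does not say what should happen with unbalanced brackets.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BracketAwareSplitter {
    // Hypothetical helper: splits on spaces that lie outside every [] pair.
    static List<String> split(String s) {
        List<String> tokens = new ArrayList<>();
        Deque<Integer> open = new ArrayDeque<>(); // indices of unmatched '['
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '[') {
                open.push(i);
            } else if (c == ']') {
                if (open.isEmpty()) {
                    throw new IllegalArgumentException("Unmatched ']' at index " + i);
                }
                open.pop();
            } else if (c == ' ' && open.isEmpty()) {
                // Top-level space: finish the current token, skipping empty ones.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            current.append(c);
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unmatched '[' at index " + open.peek());
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("[a b] c [[d e] f g]"));
        // prints [[a b], c, [[d e] f g]]
    }
}
```

The stack holds the positions of the brackets that are still open, so a space is a delimiter only when the stack is empty. A plain counter would do for the splitting itself; the stack is used here so that the error message can point at the offending bracket.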

jdevelop
3

As said in other answers, you need a parser for that. Here is a string that fails with the previous regex solutions:

"[a b] c [a [d e] f g]"

EDIT:

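import java.util.LinkedList;
import java.util.List;

// Splits on spaces that are not enclosed in square brackets, at any nesting depth.
public static List<String> split(String s){
    List<String> l = new LinkedList<String>();
    int depth=0; // number of '[' brackets currently open
    StringBuilder sb = new StringBuilder();
    for(int i=0; i<s.length(); i++){
        char c = s.charAt(i);
        if(c=='['){
            depth++;
        }else if(c==']'){
            depth--;
        }else if(c==' ' && depth==0){
            // top-level space: emit the token collected so far
            l.add(sb.toString());
            sb = new StringBuilder();
            continue;
        }
        sb.append(c);
    }
    l.add(sb.toString()); // emit the last token

    return l;
}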
Marco Martinelli
0

If I understood your question correctly, then maybe the answer is rule4 below.

rule1 -> ((a-z).(\w))*.(a-z)

rule2 -> ([).rule1.(])

rule3 -> ([).(rule1.(\w))*.rule2.((\w).rule1)*.(])

rule4 -> rule1 | rule3
taufique
-1

For non-nested brackets:

\\s+(?![^\\[]*\\])

For nested brackets ([] inside []):

(?<!\\[[^\\]]*)\\s+(?![^\\[]*\\])
Anirudha