I would like to split a string in the following manner:

String s = "dotimes [sum 1 2] [dotimes [sum 1 2] [sum 1 3]]";

Outcome:

{"dotimes", "[sum 1 2]", "[dotimes [sum 1 2] [sum 1 3]]"}

I tried using this regex:

s.split("\\s(?=\\[)|(?<=\\])\\s")

But that results in the following:

dotimes

[sum 1 2]

[dotimes

[sum 1 2]

[sum 1 3]]

Is there any way to split the string in the way I want using regex?

Rohit Jain
Bec
  • Can the brackets be arbitrarily nested? – arshajii Oct 11 '13 at 14:14
  • Check [this blog post](http://rjcodeblog.wordpress.com/2013/09/05/regex-to-split-a-string-on-comma-outside-double-quotes/). The idea is to split on whitespace, that is followed by balanced opening and closing brackets. – Rohit Jain Oct 11 '13 at 14:14
  • @RohitJain: Unfortunately that trick won't work here since OP has nested square brackets. – anubhava Oct 11 '13 at 14:16
  • @anubhava Oh! right, missed that one. – Rohit Jain Oct 11 '13 at 14:17
  • @Bec maybe give an example of what the regex output/match should be, it isn't clear to me what you expect – gwillie Oct 11 '13 at 14:28
  • @gwillie Essentially I'd like to split the string by its outer brackets while ignoring the inner brackets. So I'd like "[sum 10 10] [sum 3 3]" to become {[sum 10 10], [sum 3 3]}. But this becomes a lot trickier when there are nested brackets involved. – Bec Oct 11 '13 at 14:32
  • @Bec Perhaps you would be better with writing your own parser for this. That would be much easier to tweak with, than relying on regex. Regex is not good at parsing nested structures. – Rohit Jain Oct 11 '13 at 14:34
  • If you think about it, any calculator application in java that supports nested parens would be able to do this. You could reverse engineer something of that nature and change parens to brackets and ignore the missing operands. – Engineer2021 Oct 11 '13 at 14:53
  • Uhmm, now I see... nested structures with regex... [check this](http://stackoverflow.com/questions/18703187/regexp-count-brackets). I haven't dealt with nested structures for a long time, but if you need accuracy then you'll need to count opening/closing brackets at some point. Regex is rather inflexible when it comes to unbounded nesting, if you know what I mean – gwillie Oct 11 '13 at 15:28

2 Answers

Is there any way to split the string in the way I want using regex?

No, not with a regex alone. A regex (if it matches) returns the matched string and the sub-strings captured by the groups you surround with (), or a flat list of all complete matches if you search globally. You don't get a nested list of items that are the children of other matches.

Combining a regex with some plain Java code would do the trick. I don't know Java, but I'll try to explain with this Java-like pseudocode:

Array match_children (Array input) {
    Array output;

    foreach (match in input) {
        // The most important part!
        // The string starts with "[", so it is the beginning of a new nest
        if (match.startsWith("[")) {
            // Strip the outer brackets, then use the same regex as below
            String inner = match.substring(1, match.length() - 1);
            Array children = inner.match(matches 'dotimes' and all between '[' and ']');

            // Now, call this same function again with the children
            match = match_children(children);
            // This stores an array in `match`
        }

        // Store match in output list
        output.push(match);

    }

    return output;
}

String string = "dotimes [sum 1 2] [dotimes [sum 1 2] [sum 1 3]]";
// "dotimes [sum 1 2] [dotimes [sum 1 2] [sum 1 3]]"

Array parents = string.match(matches 'dotimes' and all between '[' and ']');
// "dotimes", "[sum 1 2]", "[dotimes [sum 1 2] [sum 1 3]]"
// Make sure to use a global flag

Array result = match_children(parents);
// dotimes
// [
//      sum 1 2
// ]
// [
//  dotimes
//  [
//      sum 1 2
//  ]
//  [
//      sum 1 3
//  ]
// ]

Again, I don't know Java, so if this needs more clarification, just comment. :) Hope this helps.
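
For reference, here is a minimal runnable Java sketch of the same recursive idea. It is a recursive-descent variant rather than the regex-plus-recursion pseudocode above: it walks the string once and recurses whenever it meets a `[`. The class and method names (`NestedParser`, `parse`, `parseItems`) are hypothetical, and it assumes tokens are separated by spaces and that only `[` and `]` nest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
public class NestedParser {
    private final String src;
    private int pos = 0; // cursor into src

    private NestedParser(String src) {
        this.src = src;
    }

    // Returns a list whose elements are either String tokens or nested lists.
    static List<Object> parse(String s) {
        NestedParser p = new NestedParser(s);
        List<Object> result = p.parseItems();
        if (p.pos < s.length()) {
            // parseItems stopped on a ']' that nobody opened
            throw new IllegalArgumentException("Unmatched ']' at index " + p.pos);
        }
        return result;
    }

    // Reads items until the end of input or an unconsumed ']'.
    private List<Object> parseItems() {
        List<Object> items = new ArrayList<>();
        while (pos < src.length() && src.charAt(pos) != ']') {
            char ch = src.charAt(pos);
            if (ch == ' ') {
                pos++; // skip separators
            } else if (ch == '[') {
                pos++; // consume '['
                items.add(parseItems()); // recurse into the nested group
                if (pos >= src.length()) {
                    throw new IllegalArgumentException("Missing ']'");
                }
                pos++; // consume the matching ']'
            } else {
                int start = pos;
                while (pos < src.length() && " []".indexOf(src.charAt(pos)) < 0) {
                    pos++;
                }
                items.add(src.substring(start, pos)); // plain word
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String s = "dotimes [sum 1 2] [dotimes [sum 1 2] [sum 1 3]]";
        // prints [dotimes, [sum, 1, 2], [dotimes, [sum, 1, 2], [sum, 1, 3]]]
        System.out.println(parse(s));
    }
}
```

Unlike the tree sketched in the comments above, this version also splits the words inside each bracket pair; keep the inner text as one string instead if that is what you need.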

Broxzier

This works, although it is not particularly pretty, and in the absence of a formal grammar from the OP it may not generalise well.

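public class BracketSplit {
    public static void main(String[] args) {
        //String s = "sum 1 2";
        String s = "dotimes [sum 1 2] [dotimes [sum 1 2] [sum 1 3]]";
        int depth = 0; // current bracket nesting level
        int pos = 0;   // start index of the token being scanned
        // Run one step past the end and treat it as a space so the last token is flushed
        for (int c = 0; c <= s.length(); ++c) {
            switch (c == s.length() ? ' ' : s.charAt(c)) {
            case '[':
                if (++depth == 1) {
                    pos = c; // an outermost '[' starts a new token
                }
                break;
            case ' ':
                if (depth == 0) {
                    // Exclude the delimiting space itself from the token
                    String token = s.substring(pos, c);
                    if (!token.matches("\\s*")) { /* ignore white space */
                        System.out.println(token);
                    }
                    pos = c + 1;
                }
                break;
            case ']':
                if (--depth == 0) {
                    // The outermost ']' closes the token; keep both brackets
                    String token = s.substring(pos, c + 1);
                    if (!token.matches("\\s*")) { /* ignore white space */
                        System.out.println(token);
                    }
                    pos = c + 1;
                }
                break;
            }
        }
    }
}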

It writes the split string to standard output; add to your favourite container as you please.
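
As a sketch of the "add to your favourite container" step, the same depth-counting loop can be wrapped in a method that returns a `List<String>`. The names `TopLevelSplitter` and `split` are hypothetical, and the exceptions on unbalanced brackets are an added assumption, not part of the code above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
public class TopLevelSplitter {

    // Splits on spaces outside all brackets; each outermost [...] group is one token.
    static List<String> split(String s) {
        List<String> tokens = new ArrayList<>();
        int depth = 0; // current bracket nesting level
        int start = 0; // start index of the pending token
        for (int i = 0; i <= s.length(); i++) {
            // Treat the end of the string as a final space to flush the last token
            char ch = (i == s.length()) ? ' ' : s.charAt(i);
            if (ch == '[') {
                if (depth == 0) {
                    if (i > start) {
                        tokens.add(s.substring(start, i)); // word glued to the '['
                    }
                    start = i;
                }
                depth++;
            } else if (ch == ']') {
                if (--depth < 0) {
                    throw new IllegalArgumentException("Unmatched ']' at index " + i);
                }
                if (depth == 0) {
                    tokens.add(s.substring(start, i + 1)); // keep both brackets
                    start = i + 1;
                }
            } else if (ch == ' ' && depth == 0) {
                if (i > start) {
                    tokens.add(s.substring(start, i));
                }
                start = i + 1;
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("Missing ']'");
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [dotimes, [sum 1 2], [dotimes [sum 1 2] [sum 1 3]]]
        System.out.println(split("dotimes [sum 1 2] [dotimes [sum 1 2] [sum 1 3]]"));
        // prints [sum, 1, 2]
        System.out.println(split("sum 1 2"));
    }
}
```

Returning a list rather than printing makes it easy to feed each bracketed token back into `split` (after stripping its outer brackets) if the inner levels need splitting too.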

Bathsheba
  • Thanks Bathsheba! Is there a reason the string "sum 1 2" returns just a list with "sum" instead of {"sum", "1", "2"}? – Bec Oct 11 '13 at 15:14
  • @Bec: yes; that's how I interpreted the question. [sum 1 2] would not separate it. – Bathsheba Oct 11 '13 at 15:32
  • Sorry, I mean that the 1 and 2 seem to get lost in the code? If I run the code on "sum 1 2" it returns just a list containing "sum." – Bec Oct 11 '13 at 15:34
  • I've amended. Getting a little messy now though. – Bathsheba Oct 11 '13 at 15:51