
I have been trying to parse a command with a regular expression in Java for a while, without success. The delimiter is a space, and everything inside double quotes should be treated as a single argument. The main problem is that a quoted argument can itself contain (escaped) double quotes. Here is the command syntax and a few examples:

my_command "regex or text" <"regex or text"|NA> <"text or regex"|NA> integer integer 

Example1: my_command "Simple case" NA NA 2 3 

Example2: my_command "This is it!" "[\",;']" "Really?" 3 5

Example3: my_command "Not so fast" NA "Another regex int the mix [\"a-zA-Z123]" 1 1

parseCommand(String str) should take any of the examples above and return a List with the following values:

Example1: list[0] = "Simple case", list[1] = NA, list[2] = NA, list[3] = "2", list[4] = "3"

Example2: list[0] = "This is it!", list[1] = "[\",;']", list[2] = "Really?", list[3] = "3", list[4] = "5"

Example3: list[0] = "Not so fast", list[1] = NA, list[2] = "Another regex int the mix [\"a-zA-Z123]", list[3] = "1", list[4] = "1"

Thank you for your help in advance.

Rex

1 Answer


Trying to do this with a regex is a mistake: this is a parsing problem, not a pattern-matching one.

Start with something like this instead; a regex will fail you here:

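import java.util.ArrayList;
import java.util.List;

public class CommandParser {

    public static void main(String[] args) {
        System.out.println(parse("\"This is it!\" \"[\\\",;']\" \"Really?\" 3 5"));
    }

    // Splits on spaces that are neither inside quotes nor escaped.
    // Quotes and backslashes are kept in the tokens exactly as in the input.
    static List<String> parse(String s) {
        List<String> parsed = new ArrayList<String>();
        boolean inQuotes = false; // currently inside a "..." section?
        boolean escape = false;   // was the previous char an unescaped backslash?
        int from = 0;             // start index of the current token
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case ' ':
                    if (!inQuotes && !escape) {
                        // Skip the empty tokens produced by runs of spaces.
                        if (i > from) {
                            parsed.add(s.substring(from, i));
                        }
                        from = i + 1;
                    }
                    escape = false; // an escape only applies to the next char
                    break;
                case '"':
                    // An escaped quote (\") neither opens nor closes a quoted section.
                    if (!escape) {
                        inQuotes = !inQuotes;
                    }
                    escape = false;
                    break;
                case '\\':
                    // Toggle so that \\ is an escaped backslash, not an escape.
                    escape = !escape;
                    break;
                default:
                    escape = false;
                    break;
            }
        }

        if (from < s.length()) {
            parsed.add(s.substring(from));
        }
        return parsed;
    }
}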
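The method above returns tokens with their quotes and backslashes still in place (for example `"[\",;']"`), whereas the question asks for the bare values. Below is a minimal sketch of a variant that splits on unquoted spaces but drops the surrounding quotes and unescapes `\"` to `"` and `\\` to `\`. The method name `parseCommand` comes from the question; the class name `UnquotingParser` is hypothetical. Keeping every other backslash sequence verbatim (so regex escapes such as `\d` survive) is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;

public class UnquotingParser {

    // Hypothetical variant: splits on unquoted spaces, removes the quote
    // characters, and unescapes \" and \\. Other backslash sequences
    // (e.g. \d inside a regex argument) are copied through unchanged.
    static List<String> parseCommand(String s) {
        List<String> args = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false; // lets "" produce an empty argument
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '\\' && i + 1 < s.length()
                    && (s.charAt(i + 1) == '"' || s.charAt(i + 1) == '\\')) {
                current.append(s.charAt(i + 1)); // keep only the escaped char
                hasToken = true;
                i++;                             // skip the escaped char
            } else if (ch == '"') {
                inQuotes = !inQuotes;            // quotes delimit, they are not kept
                hasToken = true;
            } else if (ch == ' ' && !inQuotes) {
                if (hasToken) {                  // end of an argument
                    args.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(ch);
                hasToken = true;
            }
        }
        if (hasToken) {
            args.add(current.toString());
        }
        return args;
    }

    public static void main(String[] args) {
        String example2 = "my_command \"This is it!\" \"[\\\",;']\" \"Really?\" 3 5";
        System.out.println(parseCommand(example2));
    }
}
```

For Example 2 this prints `[my_command, This is it!, [",;'], Really?, 3, 5]`; dropping the first element leaves one value per argument of my_command. Unescaping `\\` is what allows a quoted argument to end in a backslash (`"foo\\"`), at the cost of collapsing `\\` inside a regex argument.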

Added

With the specific string in question, here is my interpretation:

String str = "my_command \"Something [\"abc']\" \"text\" NA 1 1";
//                         ............        ..       .......
//                        ^            ^      ^  ^     ^

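I have used ^ to mark each quote character the parser actually sees and . for every character that is therefore inside quotes. After the first quote there are no unquoted spaces, so there are no further splits. Note that in Java source `\"` is just a quote character: the string above contains no backslashes at all, so the quote before `abc` is not escaped as far as the parser is concerned. To put a backslash-escaped quote into the test string you have to write `\\\"`.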
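A quick way to see what the parser actually receives is to print the test string. In this sketch the class `EscapeDemo` is hypothetical, and the second string assumes the intent was a literal backslash before the inner quote, as in Rex's later `"Hello number \"2\""` comment:

```java
public class EscapeDemo {
    // The test string from the comments: each \" in Java source is a bare
    // quote at runtime, so this string contains no backslash at all.
    static final String AS_WRITTEN =
            "my_command \"Something [\"abc']\" \"text\" NA 1 1";

    // Assumed intent: \\ gives a literal backslash, \" gives the quote after it.
    static final String WITH_BACKSLASH =
            "my_command \"Something [\\\"abc']\" \"text\" NA 1 1";

    public static void main(String[] args) {
        System.out.println(AS_WRITTEN);     // my_command "Something ["abc']" "text" NA 1 1
        System.out.println(WITH_BACKSLASH); // my_command "Something [\"abc']" "text" NA 1 1
    }
}
```

Passed to the `parse` method above, the second string is split into `my_command`, `"Something [\"abc']"`, `"text"`, `NA`, `1` and `1`, because the escaped quote no longer ends the first quoted section early.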

OldCurmudgeon
  • Thank you for the quick reply, but that doesn't quite work. I tried it with the following: String str = "my_command \"Something [\"abc']\" \"text\" NA 1 1"; – Rex Sep 17 '14 at 16:45
  • @user2182414 - I now get [my_command, "Something ["abc']" "text" NA 1 1] which kind of makes sense because all spaces are in quotes after the first one - what did you expect? PS: I tweaked the `\"` case a little. – OldCurmudgeon Sep 17 '14 at 18:20
  • I was expecting: list[0]=my_command, list[1]=Something ["abc], list[2]=text, list[3]=1, list[4]=1. With the above method I am getting list[0] = my_command, list[1] = "Something ["abc']" "text" NA 1 1 – Rex Sep 17 '14 at 18:27
  • But the space after abc] is in quotes so should not be a break. – OldCurmudgeon Sep 17 '14 at 18:39
  • Spaces that are within quotes should not cause a break. For example, "Hello number 2" would count as one argument/item, and "Hello number \"2\"" would also be one argument. – Rex Sep 18 '14 at 22:12
  • Unfortunately not. If you try it with the example I provided, you only get two items in the list where you should get four. You can try the following with the method: String str = "my_command \"Something [\"abc']\" \"text\" NA 1 1"; – Rex Sep 20 '14 at 01:00
  • @Rex - Please see my additions concerning your test string. – OldCurmudgeon Sep 20 '14 at 15:39
  • I really appreciate your help. However, the parser does not do what is needed. Think of it like the Linux grep command: you pass arguments separated by spaces, some of them quoted, and inside quotes you can use any character, including spaces and double quotes. – Rex Sep 23 '14 at 02:46
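Since the goal is grep-style argument splitting, it may be worth knowing that the JDK already ships a small tokenizer that understands double-quoted strings with backslash escapes: `java.io.StreamTokenizer`. This is a swapped-in technique, not the answer's approach, and the class name `ShellLikeSplit` is hypothetical. The sketch assumes arguments are separated by ASCII whitespace and that no quoted argument spans a line break:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ShellLikeSplit {

    // Words are runs of non-whitespace characters; "..." is a single token
    // whose quotes are removed and whose backslash escapes are decoded.
    static List<String> split(String command) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(command));
        st.resetSyntax();           // drop the default number and comment rules
        st.wordChars(33, 255);      // every non-space char can be part of a word
        st.whitespaceChars(0, ' '); // control chars and space separate tokens
        st.quoteChar('"');          // overrides the word setting for '"'
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                // Plain words and quoted strings both end up in sval.
                tokens.add(st.sval);
            }
        } catch (IOException e) {
            // Not expected from a StringReader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split(
                "my_command \"Not so fast\" NA \"Another regex int the mix [\\\"a-zA-Z123]\" 1 1"));
    }
}
```

Inside quotes, `StreamTokenizer` decodes backslash sequences C-style: `\"` becomes `"`, `\\` becomes `\`, `\n` and octal escapes are decoded, and an unknown sequence such as `\d` loses its backslash (so `"\d+"` yields `d+`). That is fine for the question's examples but can silently change a regex argument, which is a reason to prefer a hand-written loop when arguments are regexes.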