
I would really appreciate some help with Java code to split the following inputs:

word1 key="value with space" word3 -> [ "word1", "key=\"value with space\"", "word3" ]
word1 "word2 with space" word3 -> [ "word1", "word2 with space", "word3" ]
word1 word2 word3 -> [ "word1" , "word2", "word3" ]

The first sample input is the tough one: its second word has quotes in the middle of the token rather than at the beginning. I found several ways of dealing with the second sample, such as the ones described in "Split string on spaces in Java, except if between quotes (i.e. treat "hello world" as one token)", but none of them handle the first.

Mike Cooper

3 Answers

Rather than using regex at all, you can do a simple iteration over the string:

import java.util.ArrayList;
import java.util.List;

public static String[] splitWords(String str) {
    List<String> words = new ArrayList<>();
    boolean inQuote = false; // true while we are between a pair of quotes
    int previousStart = -1;  // index where the current word began, or -1 if none
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        if (Character.isWhitespace(c)) {
            // whitespace ends a word only when it is outside quotes
            if (previousStart != -1 && !inQuote) {
                words.add(str.substring(previousStart, i));
                previousStart = -1;
            }
        } else {
            // possibly the start of a new word
            if (previousStart == -1) {
                previousStart = i;
            }
            // toggle the quote state on every double quote
            if (c == '"') {
                inQuote = !inQuote;
            }
        }
    }
    // add the last segment if there is one
    if (previousStart != -1) {
        words.add(str.substring(previousStart));
    }
    return words.toArray(new String[0]);
}

This method has the advantage of handling quotes that are nowhere near a space, and any number of quoted sections within one word. For example, the following is returned as a single token:

a"b c"d"e f"g
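Note that the method above keeps the quote characters in each token, so the second sample comes back as "word2 with space" with its quotes still attached. If the expected output in the question is read as wanting those outer quotes dropped (which is an assumption on my part), a small post-processing step could do it. This is only a sketch: OuterQuotes and stripOuterQuotes are hypothetical names, and it only strips when a single pair of quotes wraps the entire token.

```java
class OuterQuotes {
    // Hypothetical helper: removes the quotes only when one pair wraps the whole token.
    static String stripOuterQuotes(String token) {
        int last = token.length() - 1;
        // The token must start with a quote, and the next quote must be its final character.
        if (last >= 1 && token.charAt(0) == '"' && token.indexOf('"', 1) == last) {
            return token.substring(1, last);
        }
        // Anything else (no quotes, quotes in the middle, several quoted parts) is left alone.
        return token;
    }

    public static void main(String[] args) {
        // Tokens written out by hand, in the shape the splitting method returns them.
        String[] tokens = { "word1", "\"word2 with space\"", "key=\"value with space\"" };
        for (String token : tokens) {
            System.out.println(stripOuterQuotes(token));
        }
        // prints:
        // word1
        // word2 with space
        // key="value with space"
    }
}
```

Tokens like a"b c"d"e f"g pass through unchanged, because the quotes there do not wrap the whole token.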
Mad Physicist

This can be done with a mix of regex and replace. First find the text surrounded by quotes and replace the spaces inside it with a placeholder (|| here, which is assumed not to occur in the input). Then split the string on spaces and put the spaces back in each piece.

    String s1 = "word1 key=\"value with space\" word3";

    List<String> list = new ArrayList<String>();
    // Mask the spaces inside each quoted section only, so that identical text
    // outside quotes is left alone. Assumes "||" never occurs in the input.
    Matcher m = Pattern.compile("\"[^\"]*\"").matcher(s1);
    StringBuffer masked = new StringBuffer();
    while (m.find())
        m.appendReplacement(masked, Matcher.quoteReplacement(m.group().replace(" ", "||")));
    m.appendTail(masked);

    for (String s : masked.toString().split(" ")) {
        list.add(s.replace("||", " ")); // switch the placeholder back to a space
        System.out.println(s.replace("||", " ")); // just to see output
    }
Tah

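The split can be done with a lookahead in the regex: split on one or more spaces, but only where an even number of quotes follows, which (for balanced quotes) means the space is outside any quoted section: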

String[] words = input.split(" +(?=(([^\"]*\"){2})*[^\"]*$)");

Here's some test code:

String[] inputs = { "word1 key=\"value with space\" word3", "word1 \"word2 with space\" word3", "word1 word2 word3" };
for (String input : inputs) {
    String[] words = input.split(" +(?=(([^\"]*\"){2})*[^\"]*$)");
    System.out.println(Arrays.toString(words));
}

Output:

[word1, key="value with space", word3]
[word1, "word2 with space", word3]
[word1, word2, word3]
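To see what the lookahead is checking, the same condition can be written out as a plain loop. The following is only an illustrative sketch, not a replacement for the regex: QuoteParitySplit, quotesFrom and split are names I made up, and edge cases such as leading or trailing spaces are not guaranteed to match what String.split returns.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteParitySplit {
    // Counts the double quotes in s from index 'from' to the end of the string.
    static int quotesFrom(String s, int from) {
        int count = 0;
        for (int i = from; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        return count;
    }

    // Splits on a run of spaces only when an even number of quotes follows it.
    static List<String> split(String input) {
        List<String> words = new ArrayList<>();
        int start = 0; // start of the current word
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != ' ') {
                i++;
                continue;
            }
            // Find the end of this run of spaces (the " +" part of the regex).
            int end = i;
            while (end < input.length() && input.charAt(end) == ' ') {
                end++;
            }
            // Even quote count ahead: the run is outside quotes, so split here.
            if (quotesFrom(input, end) % 2 == 0) {
                words.add(input.substring(start, i));
                start = end;
            }
            i = end;
        }
        words.add(input.substring(start));
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("word1 key=\"value with space\" word3"));
        // prints [word1, key="value with space", word3]
    }
}
```

Both versions rescan the rest of the string at every run of spaces, so they are quadratic in the worst case; for short command-line style inputs that should not matter.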
Bohemian