
I want to split a phrase on spaces, but not on spaces within a quoted string (i.e., a string within a pair of double quotation marks `"`).

For example:

software term "on the fly" and "synchrony"

Should be split into these 5 segments:

software  
term   
on the fly  
and  
synchrony

So how could I implement this in Java?

jww
Matt
  • You can't rewrite the rules of Java itself... Some languages are flexible enough to do that, but not Java. – Willem Van Onsem Aug 25 '14 at 00:36
  • I think the question is just how to write a function that will convert the string `"software term \"on the fly\" and \"synchron\""` to the list `["software", "term", "on the fly", "and", "synchron"]`. – Chris Martin Aug 25 '14 at 00:47
  • Yes, this is the result I expected. Does anyone know how to implement this with the Java API? – Matt Aug 25 '14 at 00:56
  • Note that the duplicate is the same basic problem, but the duplicate splits on comma and quotes on `[` and `]` - just replace the comma in that regex with a space, and replace the `\\[` with `\"` – Bohemian Aug 25 '14 at 00:58
  • @Bohemian It does not seem like so. Have you tested those changes you suggest? – acdcjunior Aug 25 '14 at 01:02
  • In case this never gets reopened, here's my answer. https://gist.github.com/chris-martin/097f3bfd966c915ac0b0 – Chris Martin Aug 25 '14 at 01:12
  • @acdcjunior replace the comma, remove the `(?=([^\\[]*?\\[[^\\]]*\\][^\\[\\]]*?)*$)` part and you'll be fine. @ChrisMartin your suggestion [doesn't work with multiple spaces](http://ideone.com/ZRCXLs). And here's [my suggestion](http://ideone.com/2GlriL) – Volune Aug 25 '14 at 01:21
  • @acdcjunior Yes - I just realised the other difference: The duplicate is the same basic problem, but the duplicate splits on comma **and quotes using `[` and `]`** - just replace the comma in that regex with a space, **and replace `\\[` and `\\]` with `\"`**. – Bohemian Aug 25 '14 at 01:50
  • possible duplicate of [Regex pattern for split](http://stackoverflow.com/questions/17985909/regex-pattern-for-split) –  Aug 25 '14 at 02:02
  • @JarrodRoberson already tried that as a dup, but wasn't clear to OP and others and there were several reopen requests. Also, that's a more complicated situation. I have posted a new solution as an answer here. – Bohemian Aug 25 '14 at 02:16

3 Answers


This regex achieves the split for you, and cleans up any delimiting quotes:

    String[] terms = input.split("\"?( |$)(?=(([^\"]*\"){2})*[^\"]*$)\"?");

It works by splitting on a space, but only if that space is followed by an even number of quotes, i.e. it is not inside a quoted section.
The quotes themselves are consumed by including them optionally in the split term, so they don't end up in the output.
The `( |$)` alternative is needed so that the trailing quote at the very end of the input is also matched and removed.
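To make the "even number of quotes" rule concrete, here is a small sketch that applies the same condition by hand: for each space it counts the double quotes after it and only splits when that count is even. `QuoteParityDemo`, `splitOutsideQuotes` and `quotesAfter` are hypothetical names for illustration only. One difference: this loop drops every quote character, while the regex only consumes quotes next to a split point, so the two agree on inputs like the one in the question but not necessarily on stray quotes inside a word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuoteParityDemo {
    // Hypothetical helper: splits on every space that has an even number of
    // double quotes after it (i.e. a space outside any quoted section), the same
    // condition the lookahead (?=(([^"]*"){2})*[^"]*$) checks. Unlike the regex,
    // it drops every quote character, not just the ones next to a split point.
    static List<String> splitOutsideQuotes(String input) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == ' ' && quotesAfter(input, i) % 2 == 0) {
                terms.add(current.toString()); // space outside quotes: end the term
                current.setLength(0);
            } else if (c != '"') {
                current.append(c); // keep everything except the quotes
            }
        }
        terms.add(current.toString());
        return terms;
    }

    // Counts the double quotes strictly after position i (rescans each time,
    // so this is quadratic and only meant as an illustration).
    static int quotesAfter(String s, int i) {
        int count = 0;
        for (int j = i + 1; j < s.length(); j++) {
            if (s.charAt(j) == '"') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "software term \"on the fly\" and \"synchron\"";
        String[] viaRegex = input.split("\"?( |$)(?=(([^\"]*\"){2})*[^\"]*$)\"?");
        System.out.println(Arrays.toString(viaRegex)); // [software, term, on the fly, and, synchron]
        System.out.println(splitOutsideQuotes(input)); // [software, term, on the fly, and, synchron]
    }
}
```

The space after `on` has three quotes after it (odd), so it is kept inside the phrase; the space after `term` has four (even), so it is a split point.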

Note that if the first term could be quoted, you'll need to clean up that leading quote first:

    String[] terms = input.replaceAll("^\"", "").split("\"?( |$)(?=(([^\"]*\"){2})*[^\"]*$)\"?");
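A quick way to see why that cleanup matters, using a hypothetical helper `splitTerms` that simply wraps the regex above: without `replaceAll("^\"", "")`, the opening quote of the first term is never adjacent to a split point, so it stays in the output.

```java
import java.util.Arrays;

public class LeadingQuoteDemo {
    // The split regex from the answer above.
    static final String SPLIT_REGEX = "\"?( |$)(?=(([^\"]*\"){2})*[^\"]*$)\"?";

    // Hypothetical helper: splits with or without first removing a leading quote.
    static String[] splitTerms(String input, boolean stripLeadingQuote) {
        String s = stripLeadingQuote ? input.replaceAll("^\"", "") : input;
        return s.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        String input = "\"on the fly\" term"; // the first term is quoted
        System.out.println(Arrays.toString(splitTerms(input, false))); // ["on the fly, term]
        System.out.println(Arrays.toString(splitTerms(input, true)));  // [on the fly, term]
    }
}
```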

Test code:

    String input = "software term \"on the fly\" and \"synchron\"";
    String[] terms = input.split("\"?( |$)(?=(([^\"]*\"){2})*[^\"]*$)\"?");
    System.out.println(Arrays.toString(terms));

Output:

    [software, term, on the fly, and, synchron]
Bohemian

An alternative to the previous answer, which splits on the quotes first and then splits only the unquoted parts on spaces:

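    String str = "software term \"on the fly\" and \"synchron\"";
    boolean quoted = false; // pieces of split("\"") alternate: outside, inside, outside, ...
    for (String q : str.split("\"")) {
        if (quoted) {
            System.out.println(q.trim()); // quoted phrase: print it as one term
        } else {
            for (String s : q.split(" ")) {
                if (!s.trim().isEmpty()) { // skip empty pieces from repeated spaces
                    System.out.println(s.trim());
                }
            }
        }
        quoted = !quoted;
    }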
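The loop above only prints the terms. If you need them as a collection, a minimal sketch that collects them into a `List` instead might look like this (`QuoteSplitter.split` is a hypothetical name). It relies on `str.split("\"")` alternating between text outside and inside quotes, and it assumes the quotes in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplitter {
    // Hypothetical helper based on the loop above: collects the terms instead of
    // printing them. Assumes the double quotes in the input are balanced.
    static List<String> split(String str) {
        List<String> terms = new ArrayList<>();
        boolean quoted = false; // pieces of split("\"") alternate: outside, inside, outside, ...
        for (String q : str.split("\"")) {
            if (quoted) {
                terms.add(q.trim()); // a quoted phrase is kept as one term
            } else {
                for (String s : q.split(" ")) {
                    if (!s.trim().isEmpty()) { // skip empty pieces from repeated spaces
                        terms.add(s.trim());
                    }
                }
            }
            quoted = !quoted;
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(split("software term \"on the fly\" and \"synchron\""));
        System.out.println(split("\"on the fly\"   software"));
    }
}
```

Because empty pieces are filtered out, runs of spaces and a leading quoted term are handled: `split("\"on the fly\"   software")` gives `[on the fly, software]`.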
Grk58

    // needs: import java.util.ArrayList; import java.util.List;
    String str = "software term \"on the fly\" and \"synchron\"";
    String[] arr = str.split("\""); // split on quote first: even indexes are outside quotes, odd indexes inside
    List<String> res = new ArrayList<>();
    for (int i = 0; i < arr.length; i++) {
        arr[i] = arr[i].trim();
        if (arr[i].isEmpty()) {
            continue; // e.g. the empty piece before a leading quote
        }
        if (i % 2 == 0) {
            String[] tmp = arr[i].split("\\s+"); // second, split on spaces (when needed)
            for (String t : tmp) {
                res.add(t);
            }
        } else {
            res.add("\"" + arr[i] + "\""); // put the quotes back; use res.add(arr[i]) to drop them
        }
    }
    System.out.println(res);

Output:

    [software, term, "on the fly", and, "synchron"]
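A different technique from the split-based answers above is to match the tokens directly with `java.util.regex.Matcher` instead of splitting. This is a sketch under the assumption that a quoted phrase is always delimited by whitespace (text such as `abc"def ghi"` is not recognised as a quoted phrase); `QuotedTokenizer` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedTokenizer {
    // Either a quoted phrase (group 1 captures the text between the quotes)
    // or a run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|\\S+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the \S+ alternative matched instead.
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("software term \"on the fly\" and \"synchron\""));
    }
}
```

This returns the quote-free terms the question asks for; to keep the quotes as this answer does, add `m.group()` for both alternatives.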
Nir Alfasi