3

How can I parse the string

String str = "abc, \"def,ghi\"";

so that I get the output

String[] strs = {"abc", "\"def,ghi\""};

i.e. an array of length 2?

Should I use a regular expression, or is there a method in the Java API or an open-source project that lets me do this?

Edited

To give some context: I am reading a text file that contains a list of records, one per line. Each record has a list of fields separated by a delimiter (comma or semicolon). I now have a requirement to support a text qualifier, something like what Excel or OpenOffice supports. Suppose I have the record

abc, "def,ghi"

Here `,` is my delimiter and `"` is my text qualifier, so when I parse this string I should get two fields, abc and def,ghi, not three fields {abc, def, ghi}.

I hope this clarifies my requirement.

Thanks

Shekhar

David Hedlund
Shekhar
  • the edit by @Burkhard actually changes the requirements of the expected outcome. do you want `{"abc", "def,ghi"}` or `{"abc", "\"def,ghi\""}`? – David Hedlund Jul 05 '10 at 08:32
  • @David: actually, I just changed abc to be "abc", i.e. a String. Maybe I should also have changed "def,ghi" to "\"def,ghi\""? – Burkhard Jul 05 '10 at 08:36
  • @Burkhard: yeah, that was exactly my point. when the first string wasn't quoted and the second string was, we could still assume that the string quotations were consistently left out, and that all that was shown was the *values*. now we can't really assume anything =) – David Hedlund Jul 05 '10 at 08:38
  • I want the {"abc", "\"def,ghi\""} – Shekhar Jul 05 '10 at 08:38
  • what should be the result in the command line of System.out.println(strArray[1])? – Matt Mitchell Jul 05 '10 at 08:43

4 Answers

5

The basic algorithm is not too complicated:

    import java.util.ArrayList;
    import java.util.List;

    public static List<String> customSplit(String input) {
        List<String> elements = new ArrayList<String>();
        StringBuilder elementBuilder = new StringBuilder();

        boolean isQuoted = false;
        for (char c : input.toCharArray()) {
            if (c == '\"') {
                // Toggle the quoted state; the quote itself is kept in the element
                // (changed according to the OP comment: \" shall not be skipped).
                isQuoted = !isQuoted;
            }
            if (c == ',' && !isQuoted) {
                // An unquoted comma ends the current element.
                elements.add(elementBuilder.toString().trim());
                elementBuilder = new StringBuilder();
                continue;
            }
            elementBuilder.append(c);
        }
        // Add the last element (an empty string if the input ends with a comma).
        elements.add(elementBuilder.toString().trim());
        return elements;
    }
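
Since the question mentions that the delimiter may be a comma or a semicolon, the same loop can take the delimiter and the text qualifier as parameters. The following is only a sketch under that assumption; the class name `QualifiedSplit` and the method `split` are hypothetical names chosen for illustration, not something from the question or from any library.

```java
import java.util.ArrayList;
import java.util.List;

public class QualifiedSplit {

    // Hypothetical helper: the same state machine as above, with the
    // delimiter and the text qualifier passed in instead of hard-coded.
    public static List<String> split(String input, char delimiter, char qualifier) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQualifier = false;

        for (char c : input.toCharArray()) {
            if (c == qualifier) {
                // Toggle the state; the qualifier stays part of the field.
                inQualifier = !inQualifier;
            }
            if (c == delimiter && !inQualifier) {
                // A delimiter outside the qualifier ends the current field.
                fields.add(current.toString().trim());
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        // Add whatever is left after the last delimiter.
        fields.add(current.toString().trim());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("abc, \"def,ghi\"", ',', '"'));       // [abc, "def,ghi"]
        System.out.println(split("abc; \"def;ghi\"; jkl", ';', '"'));  // [abc, "def;ghi", jkl]
    }
}
```

The sketch assumes the delimiter and the qualifier are different characters and that qualifiers are balanced; an unbalanced qualifier simply makes the rest of the line one field.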
Andreas Dolk
  • Would that handle nested escaped quotes? – Matt Mitchell Jul 05 '10 at 08:41
  • that's really neat! i probably would have come up with something way more complicated for this :D – David Hedlund Jul 05 '10 at 08:42
  • Not yet, but (1) I haven't seen such a requirement and (2) it's a basic algorithm. You can easily add a 'nested quote' detection and change the 'isQuoted' test. – Andreas Dolk Jul 05 '10 at 08:44
  • @Graphain: there is no start-quote and end-quote, so you really can never tell whether four quotes are two quoted strings after one another, or one quoted string nested in another. the *world* doesn't support nested escaped quotes the way it does for, say `(`, `)`, where there are different signs for start and stop... unless i misunderstood your question...? – David Hedlund Jul 05 '10 at 08:44
  • @David - I mean does it support this: String a = "\"bbb\\\"\"\"ccc"; I have no idea what the OP would want displayed there but I was just pointing out that this kind of problem is almost always better solved by using an existing API. Having said that I'm *very* impressed by the succinctness of this approach. – Matt Mitchell Jul 05 '10 at 08:47
  • @David - one could introduce a grammar like `"one, \"two, \\\"three\\\"\""` to allow nested quotes, but this was not a requirement (yet) – Andreas Dolk Jul 05 '10 at 08:49
  • @Andreas_D: yeah, i guess that's true. something entirely different caught my eye, tho: wouldn't you need to do a second `elements.add` before you return it, to add what's currently in the builder, assuming the string doesn't end with a comma? – David Hedlund Jul 05 '10 at 08:55
  • @David - thanks!! Sure, the last add was missing - fixed the code. (It will append an empty string if the input ends with ", " or so.) – Andreas Dolk Jul 05 '10 at 08:59
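
The escaped-quote case discussed in the comments above could be handled by adding a second flag to the same loop. This is only a sketch, and it assumes a backslash escapes the character that follows it, as in the `"one, \"two, \\\"three\\\"\""` example; the OP never stated such a rule, and CSV-style files often double the qualifier instead. The class name `EscapeAwareSplit` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAwareSplit {

    // Assumption: a backslash escapes the next character, so an escaped
    // quote neither opens nor closes a quoted section.
    public static List<String> split(String input) {
        List<String> elements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean isQuoted = false;
        boolean isEscaped = false;

        for (char c : input.toCharArray()) {
            if (isEscaped) {
                // Keep the escaped character verbatim: no toggle, no split.
                current.append(c);
                isEscaped = false;
                continue;
            }
            if (c == '\\') {
                // Remember the escape and keep the backslash in the output.
                isEscaped = true;
                current.append(c);
                continue;
            }
            if (c == '"') {
                isQuoted = !isQuoted;
            }
            if (c == ',' && !isQuoted) {
                elements.add(current.toString().trim());
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        elements.add(current.toString().trim());
        return elements;
    }

    public static void main(String[] args) {
        // Raw input text: one, "two, \"three\""
        for (String element : split("one, \"two, \\\"three\\\"\"")) {
            System.out.println(element);
        }
    }
}
```

With this input the sketch yields two elements, `one` and `"two, \"three\""`, because the escaped quotes do not toggle the quoted state.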
2

This question seems appropriate: Split a string ignoring quoted sections

Along that line, http://opencsv.sourceforge.net/ seems appropriate.

Community
Matt Mitchell
  • i think the fact that the second string has no space in it is only incidental, and not really central to the question – David Hedlund Jul 05 '10 at 08:28
  • Will work with this example but fail on `"abc, \"def, ghi\""` (just my guess, that this is a possible valid input too) – Andreas Dolk Jul 05 '10 at 08:29
  • better! now none of our comments apply, because this is a different answer entirely. i would rather have seen the old answer as it were, deleted, and this posted as a new one. but that's just details. +1 for the answer – David Hedlund Jul 05 '10 at 08:34
  • @David Hedlund - yeah you're probably right but no matter now. – Matt Mitchell Jul 05 '10 at 08:43
0

Try this -

    String str = "abc, \"def,ghi\"";
    String regex = "([,]) | (^[\"\\w*,\\w*\"])";
    for (String s : str.split(regex)) {
        System.out.println(s);
    }
user381878
  • It will not work for String str = "abc, \"def,ghi\",jkl"; The expected output will be {abc,"def,ghi",jkl} – Shekhar Jul 05 '10 at 09:04
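
For the input in the comment above, one regex-based alternative is to split only on commas that are followed by an even number of quotes up to the end of the string. This is a sketch of that lookahead technique, not a fix of the regex in this answer; it assumes quotes are balanced and never escaped, and the class name `RegexSplit` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RegexSplit {

    // A comma is a separator only if the rest of the input contains an even
    // number of quotes, i.e. the comma is not inside a quoted section.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    public static List<String> split(String input) {
        List<String> fields = new ArrayList<>();
        // The limit -1 keeps trailing empty fields, e.g. for input ending in a comma.
        for (String field : input.split(COMMA_OUTSIDE_QUOTES, -1)) {
            fields.add(field.trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("abc, \"def,ghi\",jkl")); // [abc, "def,ghi", jkl]
    }
}
```

The lookahead rescans the remainder of the string for every comma, so on very long lines a single-pass loop is likely to be cheaper.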
0

Try:

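import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

List<String> res = new LinkedList<String>();

// The limit -1 keeps trailing empty chunks, so the parity check below is reliable.
String[] chunks = str.split("\\\"", -1);
if (chunks.length % 2 == 0) {
    // Mismatched quotes!
}
for (int i = 0; i < chunks.length; i++) {
    if (i % 2 == 0) {
        // Even chunks lie outside the quotes: split them on commas.
        res.addAll(Arrays.asList(chunks[i].split(",")));
    } else {
        // Odd chunks lie between quotes: keep them whole.
        res.add(chunks[i]);
    }
}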

This will only split up the portions that are not between escaped quotes.

Call trim() if you want to get rid of the whitespace.

Borealid