
Possible Duplicate:
Can you recommend a Java library for reading (and possibly writing) CSV files?

I need to split a String in Java. The separator is the space character. The string may contain paired quotation marks enclosing text that includes spaces; everything between a pair of quotation marks should be treated as a single token. Example:

Input:
       token1 "token 2"  token3

Output: an array of 3 elements:
         token1
         token 2
         token3  

How can I do this? Thanks!

user607573
    Post an example of the Input Text, and then what you expect to receive, that will leave less room for interpretation. – edwardsmatt Apr 21 '11 at 01:48
  • -1, see edwardTheGreat's comment on how to post a question with detailed information so we don't have to guess your exact requirement. – camickr Apr 21 '11 at 02:16

3 Answers


Split twice. First on quotes, then on spaces.

Adam
  • -1, Please explain how this works with (one two "three four" five) and I'll remove my downvote. I suspect the poster wants 4 tokens. The first "one", the second "two", the third "three four" and the fourth "five". – camickr Apr 21 '11 at 02:13
  • The odd numbered tokens will be inside quotes, so don't split those on spaces. The quote split yields "one two", "three four", "five". Split only the even tokens on spaces and you get "one","two","three four","five" – Adam Apr 21 '11 at 03:24
  • good point. But StringTokenizer takes a `returnDelims` flag. If set to true it will return the delimiters as tokens, i.e. '"'. So as you're iterating over the tokens from the quote split if you run into a quote token then you know the next token is a quoted string. The next token after that is going to be '"' again, and so forth. – Adam Apr 21 '11 at 04:18
  • The solution isn't as simple as your original answer would suggest. You need exception handling. However, in the simple case where you don't have to worry about embedded quotes it works reasonably well. I'll post some code I came up with. Feel free to change/delete the code. Downvote removed. – camickr Apr 21 '11 at 04:51
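A minimal Java sketch of the two-pass split described in this answer and its comments, assuming quotes are balanced and never escaped. After splitting on `"`, the pieces at odd (0-based) indices are the quoted text and are kept whole; the pieces at even indices are ordinary text and are split again on whitespace. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplitDemo {

    // Two-pass split: first on '"', then only the unquoted pieces on whitespace.
    // Assumes balanced, unescaped quotes.
    static List<String> splitQuoted(String input) {
        List<String> result = new ArrayList<>();
        // limit -1 keeps trailing empty pieces so odd indices always mean "inside quotes"
        String[] pieces = input.split("\"", -1);
        for (int i = 0; i < pieces.length; i++) {
            if (i % 2 == 1) {
                // Odd index: text that sat between a pair of quotes, kept as one token.
                result.add(pieces[i]);
            } else {
                // Even index: ordinary text, split on runs of whitespace.
                for (String token : pieces[i].trim().split("\\s+")) {
                    if (!token.isEmpty()) { // a blank piece splits into a single ""
                        result.add(token);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitQuoted("token1 \"token 2\"  token3"));
        System.out.println(splitQuoted("one two \"three four\" five"));
    }
}
```

With the two inputs above this prints `[token1, token 2, token3]` and `[one, two, three four, five]`, matching the four tokens camickr expected.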

Assuming that the other solutions will not work for you, because they do not properly detect matching quotes or ignore spaces within quoted text, try something like:

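import java.util.ArrayList;
import java.util.List;

class QuoteTokenizer {
    // Splits on runs of whitespace and appends the non-empty tokens to result.
    private static void addTokens(String tokenString, List<String> result) {
        String[] tokens = tokenString.trim().split("[\\r\\n\\t ]+");
        for (String token : tokens) {
            if (!token.isEmpty()) { // a blank tokenString yields a single "" token
                result.add(token);
            }
        }
    }

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<String>();
        while (input.contains("\"")) {
            // Text before the opening quote: ordinary space-separated tokens.
            String prefixTokens = input.substring(0, input.indexOf("\""));
            input = input.substring(input.indexOf("\"") + 1);
            // Text up to the closing quote: one literal token.
            String literalToken = input.substring(0, input.indexOf("\""));
            // Skip past the closing quote (the result must be assigned, or the loop never ends).
            input = input.substring(input.indexOf("\"") + 1);

            addTokens(prefixTokens, result);
            result.add(literalToken);
        }
        addTokens(input, result);
        return result;
    }
}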

Note that this won't handle unbalanced quotes, escaped quotes, or other cases of erroneous/malformed input.
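For comparison, a regular-expression tokenizer (a different technique, not something this answer proposes) does not throw when a quote is left unclosed, whereas the quote-scanning loop above calls `substring(0, -1)` in that case and throws a `StringIndexOutOfBoundsException`. With the regex, an unmatched quote simply stays attached to an ordinary token. This is a sketch assuming quotes only appear at token boundaries and are never escaped; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTokenizer {

    // Either a quoted run (group 1, quotes excluded) or a run of non-whitespace (group 2).
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the unquoted alternative matched
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("token1 \"token 2\"  token3"));
        System.out.println(tokenize("a \"b c"));
    }
}
```

The first call prints `[token1, token 2, token3]`; the second prints `[a, "b, c]`, keeping the stray quote instead of failing.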

aroth
import java.util.StringTokenizer;

class STDemo {
    static String in = "token1;token2;token3";

    public static void main(String[] args) {
        // Split "in" on ';' and print each token on its own line.
        StringTokenizer st = new StringTokenizer(in, ";");

        while (st.hasMoreTokens()) {
            String val = st.nextToken();
            System.out.println(val);
        }
    }
}

This is an easy way to tokenize a string.
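The StringTokenizer above splits on one delimiter and knows nothing about quotes. One comment on the first answer suggests StringTokenizer's `returnDelims` flag, which hands each delimiter character back as its own one-character token. A sketch of a variation on that idea, using both the space and `"` as delimiters in a single pass, follows; it assumes unescaped quotes, silently drops text after an unclosed quote, and uses hypothetical class and method names. Note that the StringTokenizer Javadoc calls it a legacy class whose use is discouraged in new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class QuoteAwareTokenizer {

    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        // Delimiters are ' ' and '"'; returnDelims=true returns each one as a token.
        StringTokenizer st = new StringTokenizer(input, " \"", true);
        boolean inQuotes = false;
        StringBuilder quoted = new StringBuilder();
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals("\"")) {
                if (inQuotes) {
                    // Closing quote: everything collected since the opening quote is one token.
                    result.add(quoted.toString());
                    quoted.setLength(0);
                }
                inQuotes = !inQuotes;
            } else if (inQuotes) {
                // Inside quotes, spaces are kept as part of the token.
                quoted.append(tok);
            } else if (!tok.equals(" ")) {
                // Outside quotes, spaces are separators and are dropped.
                result.add(tok);
            }
        }
        // If inQuotes is still true here, the quote was never closed and its text is discarded.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("token1 \"token 2\"  token3"));
    }
}
```

For the question's input this prints `[token1, token 2, token3]`.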

edwardsmatt
jayesh