
I am coding in Java and have a method that returns a string that looks something like this:

0, 2, 23131312,"This, is a message", 1212312

and I would like the string to be split like:

["0", "2", "23131312", "This, is a message", "1212312"]

When I use the String split method on the comma, it splits "This, is a message" as well, which I don't want. I would like it to ignore that particular comma and, if possible, get rid of the double quotes.

I looked up some answers and CSV seems to be the way to do it. However, I don't understand it properly.

Dale K
callMeJava
  • Strip off the square brackets and use a CSV parser. Otherwise you have to deal with escaped double quotes, for example... Or simply parse it as a [JSON array](http://stackoverflow.com/q/5293555/2071828). – Boris the Spider Dec 13 '15 at 23:13
  • You should be using a CSV library to parse the initial string, then output the resulting fields as strings, providing the quotes yourself. – Jim Garrison Dec 13 '15 at 23:26

4 Answers


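I think you can use the regex ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$) from here: Splitting on comma outside quotes. The lookahead only lets a comma match when it is followed by an even number of double quotes, i.e. when the comma sits outside any quoted section.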

You can test the pattern here: http://regexr.com/3cddl

Java code example:

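import java.util.Arrays;

// The class name is arbitrary; any name works
public class SplitOutsideQuotes {
    public static void main(String[] args) {
        String txt = "0, 2, 23131312,\"This, is a message\", 1212312";

        // Split only on commas that are followed by an even number of double quotes
        System.out.println(Arrays.toString(txt.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")));
    }
}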
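Note that split on its own leaves the leading spaces and the surrounding double quotes in the pieces. The following is a minimal sketch of one way to clean them up afterwards, reusing the same pattern; the class name QuoteAwareSplit and the method splitFields are hypothetical names chosen for illustration, and the sketch assumes quotes only ever appear as a pair around a whole field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer
class QuoteAwareSplit {
    // Same pattern as in the answer: a comma followed by an even number of double quotes
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        for (String raw : line.split(COMMA_OUTSIDE_QUOTES)) {
            String field = raw.trim();
            // Assumption: a quoted field is wrapped in exactly one pair of quotes
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String txt = "0, 2, 23131312,\"This, is a message\", 1212312";
        System.out.println(splitFields(txt));
    }
}
```

Escaped quotes inside a field are not handled here; for that kind of input a CSV library is probably the safer choice.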
Community
Bohus Andrei

A simpler way is to convert the master string to a JSON array, which automatically takes care of the individual elements and gives you an object array.


Another way of doing it would be to iterate through the string while saving an index: when you hit a " ", do String.substring, insert the result into the array, and update the index. When you hit a double quote ("), look for the next double quote, insert the substring into the array, and update the index.
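A rough sketch of that index-based idea might look as follows. It assumes the separator is a comma (as in the question's input), that quotes are never escaped, and that whitespace around fields should be dropped; IndexSplitter, split and clean are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class illustrating the "save an index and substring" approach
class IndexSplitter {
    static List<String> split(String s) {
        List<String> fields = new ArrayList<>();
        int start = 0; // saved index: where the current field begins

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') {
                // Jump to the matching quote so commas in between are ignored
                int close = s.indexOf('"', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote at index " + i);
                }
                i = close;
            } else if (c == ',') {
                fields.add(clean(s.substring(start, i)));
                start = i + 1; // update the saved index
            }
        }
        fields.add(clean(s.substring(start))); // last field has no trailing comma
        return fields;
    }

    // Trims whitespace and removes one pair of enclosing double quotes, if present
    private static String clean(String raw) {
        String field = raw.trim();
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            field = field.substring(1, field.length() - 1);
        }
        return field;
    }
}
```

Calling IndexSplitter.split on the string from the question should give the five fields without spaces or quotes.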

byInduction

I'll comment on solutions based on programming an algorithm from scratch without the help of any library. I'm not saying that this is better than using a library.

First, this problem has more quirks than it would seem at first glance. I mean:

  • Spaces around commas must be removed.
  • Syntax errors are possible, e.g. 0,1,"string"notcomma,hi
  • It is unclear how double quotes within a string would be escaped. I guess they would be doubled (e.g. "This, is a ""message"""). These should be parsed correctly too.
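To make the last point concrete, here is a small sketch of reading one quoted field when quotes are escaped by doubling. That escaping rule is only the guess stated above, not something the question confirms, and DoubledQuotes and readQuoted are hypothetical names.

```java
// Hypothetical helper; assumes a literal quote inside a field is written as ""
class DoubledQuotes {
    // Reads the quoted field whose opening quote is at index open, appends the
    // unescaped content to out, and returns the index just past the closing quote.
    static int readQuoted(String s, int open, StringBuilder out) {
        int i = open + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c != '"') {
                out.append(c);
                i++;
            } else if (i + 1 < s.length() && s.charAt(i + 1) == '"') {
                out.append('"'); // doubled quote stands for one literal quote
                i += 2;
            } else {
                return i + 1;    // a single quote closes the field
            }
        }
        throw new IllegalArgumentException("Parse error: Unterminated string");
    }

    public static void main(String[] args) {
        String s = "\"This, is a \"\"message\"\"\", 5";
        StringBuilder out = new StringBuilder();
        int end = readQuoted(s, 0, out);
        System.out.println(out + " | rest: " + s.substring(end));
    }
}
```

The returned index lets the caller continue scanning for the next comma after the field.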

If (as it seems) non-quoted values are always numbers (or, at least, whitespace-free), I'd go for a solution which scans the string:

class StringScanner
{
    private final String s;
    private int currentPosition;

    public StringScanner (String s)
    {
        this.s = s;
        this.currentPosition = 0;
        skipWhitespace ();
    }

    private void skipWhitespace ()
    {
        while (currentPosition < s.length() && s.charAt (currentPosition) == ' ')
            currentPosition++;
    }

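    private String nextNumber ()
    {
        final int start = currentPosition;

        // An unquoted value ends at a space, a comma, or the end of the input
        while (currentPosition < s.length()
                && s.charAt (currentPosition) != ' '
                && s.charAt (currentPosition) != ',')
            currentPosition++;

        return s.substring (start, currentPosition);
    }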

    private String nextString ()
    {
        if (s.charAt (currentPosition) != '\"')
            throw new Error ("You should NEVER see this error, no matter what the input string is");

        currentPosition++;
        final int start = currentPosition;

        // Modify the following loop to test for escaped quotes if necessary
        while (currentPosition < s.length() && s.charAt (currentPosition) != '\"')
            currentPosition++;

        if (currentPosition >= s.length() || s.charAt (currentPosition) != '\"')
            throw new Error ("Parse error: Unterminated string");

        final String r = s.substring (start, currentPosition);

        currentPosition++;

        return r;
    }

    public String nextField ()
    {
        // No input left: signal the end of the fields
        if (currentPosition >= s.length ())
            return null;

        final String r;

        if (s.charAt (currentPosition) == '\"')
            r = nextString ();
        else
            r = nextNumber ();

        skipWhitespace ();

        // The last field is not followed by a comma
        if (currentPosition >= s.length ())
            return r;

        if (s.charAt (currentPosition) != ',')
            throw new Error ("Parse error: no comma at end of field");

        currentPosition++;

        skipWhitespace ();

        if (currentPosition >= s.length ())
            throw new Error ("Parse error: string ends with comma");

        return r;
    }
}

Then, split the string with something like:

String s = "0, 1, \"Message, ok?\", 55";

StringScanner ss = new StringScanner (s);

String field = ss.nextField ();

while (field != null)
{
    System.out.println ("Field found: \"" + field + "\"");
    field = ss.nextField ();
}
Jojonete