
I'm trying to parse comma-separated values that are enclosed in quotes using only the standard Java libraries (I know this must be possible).

For example, file.txt contains one row per line:

"Foo","Bar","04042013","04102013","Stuff"
"Foo2","Bar2","04042013","04102013","Stuff2"

However, when I parse the file with the code I've written so far:

import java.io.*;
import java.util.Arrays;
public class ReadCSV {

    public static void main(String[] arg) throws Exception {

        BufferedReader myFile = new BufferedReader(new FileReader("file.txt"));

        String myRow = myFile.readLine(); 
        while (myRow != null){
            //split by comma separated quote enclosed values
            //BUG - first and last values get an extra quote
            String[] myArray = myRow.split("\",\""); //the problem

            for (String item:myArray) { System.out.print(item + "\t"); }
            System.out.println();
            myRow = myFile.readLine();
        }
        myFile.close();
    }
}

The output is:

"Foo    Bar     04042013        04102013        Stuff"

"Foo2   Bar2    04042013        04102013        Stuff2"

instead of:

Foo    Bar     04042013        04102013        Stuff

Foo2   Bar2    04042013        04102013        Stuff2

I know the split call is where I went wrong, but I'm not sure how to fix it.

Duncan Jones
sputn1ck
  • possible duplicate of [Java: splitting a comma-separated string but ignoring commas in quotes](http://stackoverflow.com/questions/1757065/java-splitting-a-comma-separated-string-but-ignoring-commas-in-quotes) – Till Helge Apr 22 '13 at 07:19
  • I read over that actually and this is not the same issue. That example had comma separated values with occasional quotes, my question is regarding comma separated quote enclosed for all values. – sputn1ck Apr 22 '13 at 07:20
  • not to mention the solution there was a regex which should not have to be the case here (I hope!). My desired output shows no quotes at all whereas the output in the other post retains quotes. – sputn1ck Apr 22 '13 at 07:22
  • Why don't you just remove the quotes from the first and last element? – Marco Forberg Apr 22 '13 at 07:23
  • To keep it simple: split first by comma, then each token by double quotes. – PeterMmm Apr 22 '13 at 07:23
  • @PeterMmm: I guess he wants to split that way to preserve commas within the fields. – Marco Forberg Apr 22 '13 at 07:24
  • @TillHelge - his question is different from the dup you posted. – user93353 Apr 22 '13 at 07:35
  • So I picked one of the questions that don't exactly match. My bad. How about [this one](http://stackoverflow.com/questions/9605773/java-regex-split-comma-separated-values-but-ignore-commas-in-quotes?rq=1). It's a common task and has been explained several times already. – Till Helge Apr 22 '13 at 07:38
  • How about removing all double quotes (string.replace("\"","")) and then splitting on commas? – benst Apr 22 '13 at 07:50
  • There is a reason for the quotes; see http://en.wikipedia.org/wiki/Comma-separated_values#Toward_standardization – Marco Forberg Apr 22 '13 at 07:54
  • possible duplicate of [Parsing CSV in java](http://stackoverflow.com/questions/3908012/parsing-csv-in-java) – Raedwald Apr 03 '14 at 13:07

6 Answers


Before splitting, remove the first and last double quotes from the myRow variable:

myRow = myRow.substring(1, myRow.length() - 1);

(UPDATE) Also check that myRow is not empty; otherwise the substring call throws a StringIndexOutOfBoundsException. For example, the code below removes the double quotes only when myRow is not empty:

if (!myRow.isEmpty()) {
    myRow = myRow.substring(1, myRow.length() - 1);
}
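A self-contained sketch of this approach applied to the question's rows, assuming every field is enclosed in quotes and no value contains the sequence ","; the class and method names (StripThenSplit, parseRow) are made up for illustration. It checks for a length of at least 2 rather than isEmpty(), because substring(1, length - 1) also throws for a one-character row.

```java
public class StripThenSplit {

    // Hypothetical helper: strips the outer quotes, then splits on the "," separator.
    static String[] parseRow(String row) {
        // substring(1, length - 1) throws for rows shorter than 2 characters
        if (row.length() >= 2) {
            row = row.substring(1, row.length() - 1);
        }
        // split takes a regex, but "," contains no special characters, so it matches literally.
        // A limit of -1 keeps trailing empty fields instead of silently dropping them.
        return row.split("\",\"", -1);
    }

    public static void main(String[] args) {
        String[] rows = {
            "\"Foo\",\"Bar\",\"04042013\",\"04102013\",\"Stuff\"",
            "\"Foo2\",\"Bar2\",\"04042013\",\"04102013\",\"Stuff2\""
        };
        for (String row : rows) {
            System.out.println(String.join("\t", parseRow(row)));
        }
    }
}
```

With the default split limit, a row such as "a","" would come back as a single field, because split discards trailing empty strings; passing -1 keeps it as two fields.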
Niraj Nawanit

I think you will probably have to go for a stateful approach, basically like the code below (another state would be necessary if you want to allow escaping of quotes within a value):

import java.util.ArrayList;
import java.util.List;

public class CSV {

    public static void main(String[] args) {
        String s = "\"hello, i am\",\"a string\"";
        String x = s;
        List<String> l = new ArrayList<String>();
        // state 0: looking for an opening quote
        // state 1: inside a quoted value, looking for the closing quote
        // state 2: after a closing quote, looking for the separating comma
        int state = 0;
        while (x.length() > 0) {
            if (state == 0) {
                if (x.indexOf("\"") > -1) {
                    // skip past the opening quote; no trim, so leading spaces inside the value survive
                    x = x.substring(x.indexOf("\"") + 1);
                    state = 1;
                } else {
                    break;
                }
            } else if (state == 1) {
                if (x.indexOf("\"") > -1) {
                    String found = x.substring(0, x.indexOf("\""));
                    System.err.println("found: " + found);
                    l.add(found);
                    x = x.substring(x.indexOf("\"") + 1).trim();
                    state = 2; // a comma must come next
                } else {
                    throw new RuntimeException("bad format");
                }
            } else if (state == 2) {
                if (x.indexOf(",") > -1) {
                    x = x.substring(x.indexOf(",") + 1).trim();
                    state = 0;
                } else {
                    break;
                }
            }
        }
        for (String f : l) {
            System.err.println(f);
        }
    }
}
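Escaping needs the extra state mentioned above. One common convention, used by RFC 4180, escapes a quote inside a value by doubling it (""). A minimal character-by-character sketch under that assumption follows; the class, method and state names are hypothetical, and backslash escapes are not handled.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvParser {

    // Hypothetical state names for this sketch.
    private enum State { BEFORE_FIELD, IN_QUOTES, QUOTE_IN_QUOTES, AFTER_FIELD }

    // Parses one line of quote-enclosed, comma-separated fields.
    // Inside a field, a doubled quote ("") stands for one literal quote.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.BEFORE_FIELD;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            switch (state) {
                case BEFORE_FIELD: // optional whitespace, then an opening quote
                    if (c == '"') {
                        state = State.IN_QUOTES;
                    } else if (!Character.isWhitespace(c)) {
                        throw new IllegalArgumentException("expected opening quote at index " + i);
                    }
                    break;
                case IN_QUOTES: // everything, including commas, belongs to the value
                    if (c == '"') {
                        state = State.QUOTE_IN_QUOTES;
                    } else {
                        current.append(c);
                    }
                    break;
                case QUOTE_IN_QUOTES: // the previous quote was either an escape or the closing quote
                    if (c == '"') {
                        current.append('"'); // "" means one literal quote
                        state = State.IN_QUOTES;
                    } else if (c == ',') {
                        fields.add(current.toString());
                        current.setLength(0);
                        state = State.BEFORE_FIELD;
                    } else if (Character.isWhitespace(c)) {
                        state = State.AFTER_FIELD;
                    } else {
                        throw new IllegalArgumentException("unexpected character at index " + i);
                    }
                    break;
                case AFTER_FIELD: // only whitespace may appear before the next comma
                    if (c == ',') {
                        fields.add(current.toString());
                        current.setLength(0);
                        state = State.BEFORE_FIELD;
                    } else if (!Character.isWhitespace(c)) {
                        throw new IllegalArgumentException("expected comma at index " + i);
                    }
                    break;
            }
        }
        if (state == State.IN_QUOTES) {
            throw new IllegalArgumentException("unterminated quoted value");
        }
        // The last field has no trailing comma, so emit it here.
        // (A line ending in a bare comma stays in BEFORE_FIELD and adds nothing.)
        if (state == State.QUOTE_IN_QUOTES || state == State.AFTER_FIELD) {
            fields.add(current.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"hello, i am\",\"a string\""));
        System.out.println(parseLine("\"say \"\"hi\"\"\" , \"x\""));
    }
}
```

Reading one character at a time means commas inside quotes never need special handling: they are only separators in the QUOTE_IN_QUOTES and AFTER_FIELD states.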
rmalchow

Instead, you can use replaceAll, which, for me, looks more suitable for this task:

myRow = myRow.replaceAll("\"", "").replaceAll(","," ");

This removes every " (replaces it with nothing), then replaces every , with a space (you can of course use more spaces).
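To see what this does on the question's data and on the embedded-comma case raised in the comments, here is a small sketch; the class and method names are hypothetical. replaceAll takes a regular expression, but " and , have no special meaning in one, so both are matched literally.

```java
public class ReplaceAllDemo {

    // Hypothetical helper: removes every double quote, then turns each comma into a space.
    static String flatten(String row) {
        return row.replaceAll("\"", "").replaceAll(",", " ");
    }

    public static void main(String[] args) {
        System.out.println(flatten("\"Foo\",\"Bar\",\"04042013\",\"04102013\",\"Stuff\""));
        // A comma inside a quoted value is replaced too, so the field boundary is lost.
        System.out.println(flatten("\"foo\",\"bar\",\"foo,bar\""));
    }
}
```

The first line prints Foo Bar 04042013 04102013 Stuff; the second prints foo bar foo bar, with nothing left to show that foo,bar was a single field.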

Maroun
  • Usually there is a reason for comma-separated values to be quoted: they might contain commas! In that case your solution would split the list where it should not be split. – Marco Forberg Apr 22 '13 at 07:28
  • What are you talking about? This solves his problem. BTW, my solution doesn't split anything. – Maroun Apr 22 '13 at 07:29
  • Only in the given case. Think about "foo","bar","foo,bar": your solution would return an array with 4 items, but there are only three. – Marco Forberg Apr 22 '13 at 07:30
  • He didn't mention this in the question. I assumed his format is fixed. – Maroun Apr 22 '13 at 07:32
  • True, but why ignore the obvious? – Marco Forberg Apr 22 '13 at 07:40
  • If it were obvious, I wouldn't have ignored it. – Maroun Apr 22 '13 at 07:41
  • Again, this does not account for commas within the values. The reason people use quotes AND commas is usually to have the option to HAVE commas in the values. See my stateful parser answer. – rmalchow Apr 22 '13 at 07:55

The problem in the code above is that you are splitting the string on ",". At the start of the line ("Foo",") and at the end (","Stuff") the opening and closing quotes are not part of a "," sequence, so the split does not remove them.

So this is definitely not a bug in Java; in your case you need to handle those quotes yourself.

You have multiple options, for example:
1. If you are sure there will always be a leading " and a trailing ", remove them from the string before splitting.
2. If the leading and trailing " are optional, first check for them with startsWith and endsWith, and remove them only if they exist before splitting.
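A sketch of option 2, assuming the quotes around the whole row may or may not be present while the separators between fields are always ","; the class and method names are hypothetical.

```java
public class OptionalQuotes {

    // Hypothetical helper: removes a leading and a trailing quote only if present,
    // then splits on the "," separator (limit -1 keeps trailing empty fields).
    static String[] parseRow(String row) {
        if (row.startsWith("\"")) {
            row = row.substring(1);
        }
        if (row.endsWith("\"")) {
            row = row.substring(0, row.length() - 1);
        }
        return row.split("\",\"", -1);
    }

    public static void main(String[] args) {
        System.out.println(String.join("\t", parseRow("\"Foo\",\"Bar\",\"Stuff\"")));
        System.out.println(String.join("\t", parseRow("Foo\",\"Bar")));
    }
}
```

Because each check uses its own substring call, a row consisting of a single " is reduced to an empty string instead of throwing.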

rahul maindargi

You can simply split the string on commas with a Scanner and then delete the first and last " of each token. Hope that's helpful; I don't have much time.

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerCSV {

    public static void main(String[] args) {
        String s = "\"Foo\",\"Bar\",\"04042013\",\"04102013\",\"Stuff\"";
        System.out.println(s);

        // a growable list instead of a fixed-size array, so rows may have any number of fields
        List<String> fields = new ArrayList<String>();
        Scanner scanner = new Scanner(s);
        scanner.useDelimiter(",");

        while (scanner.hasNext()) {
            String token = scanner.next();
            // drop the enclosing quotes: the first and last character of each token
            fields.add(token.substring(1, token.length() - 1));
        }
        scanner.close();

        for (String field : fields) {
            System.out.println(field);
        }
    }
}
datosh
  • What happens if there's a comma WITHIN the quotes? – rmalchow Apr 22 '13 at 07:35
  • Then the parser will fail to do its job. You cannot use the delimiter as a valid character in the string. If you want to use a comma in the string, you can use two commas as a delimiter, or something else. – datosh Apr 23 '13 at 12:50
  • @datosh: you **can** use commas as valid characters **and** as the delimiter. That is exactly what the quotes are for. Internal quotes are the things that need escaping, not commas. – Marco Forberg May 03 '13 at 06:07

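This solution is less elegant than a String.split() one-liner, but it avoids fragile string manipulation, i.e. the use of String.substring(). Every field is matched up to a closing quote followed by a comma and the next opening quote, so the input must end with ,"; the code appends it before matching.

This version tolerates whitespace around the delimiters. Commas inside quotes are not treated as delimiters, and in the example below the backslash-escaped quotes (\") do not end a field either, although the backslashes are kept in the output.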

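import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCSV {
    public static void main(String[] args) {
        String s = "\"F\\\",\\\"oo\"  ,    \"B,ar\",\"04042013\",\"04102013\",\"St,u\\\"ff\"";
        // opening quote, lazily matched value, closing quote, comma; the lookahead requires the next opening quote
        Pattern p = Pattern.compile("\"(.*?)\"\\s*,\\s*(?=\")");
        Matcher m = p.matcher(s + ",\""); // append ," so the last field also ends with ,"
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}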
Sundae