
I am writing a Java application which reads CSV from standard input. However, I am having trouble dealing with double quotes.

For example, if I read in the text:

"He said, ""What?"""

the output gives me:

field[0] = `He said, What?"""'

The last two quotes are what I don't want.

Here is my code:

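import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;

public class Csv{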
private BufferedReader fin;
private String fieldsep;
private ArrayList field;

public Csv(){
    this(System.in, ",");
}


public Csv(InputStream in, String sep){
    this.fin = new BufferedReader(new InputStreamReader(in));
    this.fieldsep = sep;
}


// getline: get one line, grow as needed
public String getline() throws IOException {
    String line;

    line = fin.readLine();
    if (line == null)
        return null;

    field = split(line, fieldsep);

    return line;
}

// split: split line into fields
private static ArrayList split(String line, String sep){
    ArrayList list = new ArrayList();
    int i, j;

    if (line.length() == 0)
        return list;

    i = 0;
    do {
        if (i < line.length() && line.charAt(i) == '"') {
            StringBuffer field = new StringBuffer();
            j = advquoted(line, ++i, sep, field);
            list.add(field.toString());
        } 

        else {
            j = line.indexOf(sep, i);
            if (j == -1)
                j = line.length();
            list.add(line.substring(i, j));
        }
        i = j + sep.length();
    } while (j < line.length());

    return list;
}

// advquoted: quoted field; return index of next separator
private static int advquoted(String s, int i, String sep, StringBuffer field){
    field.setLength(0);
    for ( ; i < s.length(); i++) {
        if (s.charAt(i) == '"' && ++i < s.length() && s.charAt(++i) != '"') {
            int j = s.indexOf(sep, i);
            if (j == -1)
                j = s.length();
            field.append(s.substring(i, j));
            i = j;
            break;
        }
        field.append(s.charAt(i));
    }

    return i;
}
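The symptom is consistent with the closing-quote test in advquoted: `++i < s.length() && s.charAt(++i) != '"'` increments i twice, so after a quote the loop inspects the character two positions ahead and skips the one in between. For the input above, the first quote of the `""` pair before What is therefore treated as the closing quote, and everything from What to the end of the line is copied verbatim, which gives exactly `He said, What?"""`. Below is a minimal sketch of the same split/advquoted logic with a single increment; the class name AdvquotedSketch is hypothetical, and StringBuilder stands in for StringBuffer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class: mirrors the question's split/advquoted, except that the
// closing-quote test advances i exactly once.
class AdvquotedSketch {

    // split: split line into fields
    static List<String> split(String line, String sep) {
        List<String> list = new ArrayList<>();
        if (line.isEmpty())
            return list;
        int i = 0, j;
        do {
            if (i < line.length() && line.charAt(i) == '"') {
                StringBuilder field = new StringBuilder();
                j = advquoted(line, i + 1, sep, field); // start after the opening quote
                list.add(field.toString());
            } else {
                j = line.indexOf(sep, i);
                if (j == -1)
                    j = line.length();
                list.add(line.substring(i, j));
            }
            i = j + sep.length();
        } while (j < line.length());
        return list;
    }

    // advquoted: copy a quoted field into 'field' (without the enclosing
    // quotes, "" turned into ") and return the index of the next separator
    static int advquoted(String s, int i, String sep, StringBuilder field) {
        field.setLength(0);
        for (; i < s.length(); i++) {
            // A quote is either the first half of an escaped "" pair or the
            // closing quote: step past it once and look at what follows.
            if (s.charAt(i) == '"' && (++i >= s.length() || s.charAt(i) != '"')) {
                int j = s.indexOf(sep, i);
                if (j == -1)
                    j = s.length();
                field.append(s, i, j); // text between closing quote and separator
                i = j;
                break;
            }
            field.append(s.charAt(i)); // ordinary char, or the 2nd quote of ""
        }
        return i;
    }

    public static void main(String[] args) {
        // prints [He said, "What?"]
        System.out.println(split("\"He said, \"\"What?\"\"\"", ","));
    }
}
```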
Paul Wasilewski
RunningPig
  • I am not sure, but to me your code is very hard to follow. It seems you have solved the problem in a very awkward way. To get your values from your CSV you can use, for example, StringTokenizer; see https://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html – Paul Wasilewski May 23 '16 at 04:30
  • Which last two commas do you mean? – Paul Wasilewski May 23 '16 at 04:34
  • `The last two commas are what I don't want.` ... do you mean the last two _quotes_ are not what you want? – Tim Biegeleisen May 23 '16 at 04:35
  • Are you aware that there are good CSV libraries already available? Is there a reason you need to write your own? – Jim Garrison May 23 '16 at 04:52
  • Yes, it should be quotes. This is part of my homework; I'm just trying to make my own CSV class. – RunningPig May 23 '16 at 05:15

4 Answers


Regex and streams to the rescue. You only need one line for the whole thing:

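// Split on commas followed by an even number of quotes (i.e. commas outside quotes),
// then strip each field's enclosing quotes and turn "" into "
String[] terms = Arrays.stream(csv.split(",(?=(([^\"]*\"){2})*[^\"]*$)"))
  .map(s -> s.replaceAll("^\"|\"$", "").replace("\"\"", "\"")).toArray(String[]::new);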
Bohemian

You have made it very complex; simply use StringTokenizer:

String testString = "He said, \"\"What?\"\"";
StringTokenizer st = new StringTokenizer(testString);
while (st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}

Output:

He
said,
""What?""

Now you can play with these strings.
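Note that `new StringTokenizer(testString)` uses the default delimiter set (space, tab, newline, carriage return, form feed), which is why the output above is split at spaces rather than commas. If the comma is passed as the delimiter instead, two behaviors matter for CSV: the tokenizer knows nothing about quotes, so a quoted field that contains a comma is cut in two, and runs of consecutive delimiters are treated as a single separator, so empty fields disappear. A small sketch to check this; TokenizerDemo and tokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper used only to show StringTokenizer's behavior.
class TokenizerDemo {

    // Collects the tokens produced for the given delimiter characters
    // (delimiters themselves are not returned).
    static List<String> tokens(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens())
            out.add(st.nextToken());
        return out;
    }

    public static void main(String[] args) {
        // The quoted field from the question yields two tokens,
        // because it is cut at the comma inside the quotes.
        System.out.println(tokens("\"He said, \"\"What?\"\"\"", ",").size());
        // Consecutive delimiters collapse, so the empty middle field is lost:
        // prints [a, b]
        System.out.println(tokens("a,,b", ","));
    }
}
```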

Sanjit Kumar Mishra

As others have suggested, you can make your life easier by using StringTokenizer. The delimiters should be the comma and the double quote, and you want the StringTokenizer to return the delimiters to you (pass true as the returnDelims argument of the three-argument constructor). When the delimiter is a comma, the field will be everything up to the next comma. When the delimiter is a double quote, the field will be everything up to the next double quote. You may want to trim the fields and remove the leading and trailing double quotes from them.
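Here is a minimal sketch of that idea, assuming RFC 4180-style input in which `""` inside a quoted field stands for one literal quote. Instead of reading "up to the next quote" directly, it buffers the tokens in a list so it can look one token ahead, and tracks an in-quotes flag: a comma ends a field only outside quotes, and a quote followed by another quote inside a quoted field is an escaped quote. The class name QuoteAwareSplit is hypothetical. It does not trim, since RFC 4180 treats spaces as part of a field; add trim() if your input needs it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class: tokenizer-based CSV line splitting, assuming "" inside
// quotes means one literal quote (RFC 4180 style).
class QuoteAwareSplit {

    static List<String> split(String line) {
        // returnDelims = true: every comma and quote comes back as its own token
        StringTokenizer st = new StringTokenizer(line, ",\"", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens())
            tokens.add(st.nextToken());

        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int k = 0; k < tokens.size(); k++) {
            String t = tokens.get(k);
            if (t.equals("\"")) {
                if (inQuotes && k + 1 < tokens.size() && tokens.get(k + 1).equals("\"")) {
                    field.append('"'); // "" inside quotes: one literal quote
                    k++;               // skip the second quote of the pair
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (t.equals(",") && !inQuotes) {
                fields.add(field.toString()); // comma outside quotes ends a field
                field.setLength(0);
            } else {
                field.append(t); // ordinary text, or a comma inside quotes
            }
        }
        fields.add(field.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // prints [He said, "What?"]
        System.out.println(split("\"He said, \"\"What?\"\"\""));
    }
}
```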

Ayman

I guess this should be fine.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class Csv{

   private BufferedReader fin;
   private String fieldsep;
   private List<String> field;

   public Csv(){
      this(System.in, ",");
   }


   public Csv(InputStream in, String sep){
      this.fin = new BufferedReader(new InputStreamReader(in));
      this.fieldsep = sep;
   }

   // getline: read one line and split it into fields
   public String getline() throws IOException {
      String line = fin.readLine();
      if (line == null)
         return null;
      field = split(line, fieldsep);
      return line;
   }

   // split: split line on any character in sep;
   // trim each field and turn every "" into a single "
   private static List<String> split(String line, String sep){
      List<String> list = new ArrayList<>();
      StringTokenizer tokens = new StringTokenizer(line, sep, false);
      while (tokens.hasMoreTokens()) {
         String next = tokens.nextToken();
         next = next.trim().replaceAll("\"\"", "\"");
         list.add(next);
      }
      return list;
   }
}

The result is field = [He said, "What?"].

You should consider that a field in CSV can be enclosed in double quotes. I don't know whether this causes the multiple quotes around "What?", but if so, you should know that a field containing double quotes should itself be enclosed in double quotes, and each double quote inside it must be escaped by preceding it with another double quote. For more information about the CSV format, see https://www.ietf.org/rfc/rfc4180.txt.
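The quoting rule from RFC 4180 (section 2, rules 6 and 7) is easiest to see from the writing side: a field containing a comma, a double quote, CR or LF is enclosed in double quotes, and every embedded quote is doubled. The sketch below applies that rule to the field `He said, "What?"` and produces exactly the line from the question, which suggests the input is well-formed CSV and the extra quotes come from the parser. Rfc4180Quote and quote are hypothetical names, not part of any library.

```java
// Hypothetical helper, not part of the question's Csv class.
final class Rfc4180Quote {

    // Encloses the field in double quotes when it contains a comma, a double
    // quote, CR or LF, and doubles every embedded quote.
    // Other fields are returned unchanged.
    static String quote(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0
                || field.indexOf('\n') >= 0;
        if (!needsQuotes)
            return field;
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // prints "He said, ""What?""" (the input line from the question)
        System.out.println(quote("He said, \"What?\""));
    }
}
```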

And be aware that you never close your input stream! This could cause a resource leak. For more information, see Closing BufferedReader and InputStreamReader.
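One way to make closing hard to forget is to let the reader class implement AutoCloseable, so callers can use try-with-resources; closing the BufferedReader also closes the InputStreamReader and InputStream it wraps. This is a minimal sketch; ClosableCsvReader is a hypothetical name, and it keeps only the line-reading part of Csv. If the stream is System.in, as in the default constructor, closing it means nothing more can be read from standard input later in the program, so you may prefer to close only streams you opened yourself.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

// Hypothetical variant of the reader part of Csv: implementing AutoCloseable
// lets callers release the underlying stream with try-with-resources.
class ClosableCsvReader implements AutoCloseable {
    private final BufferedReader fin;

    ClosableCsvReader(InputStream in) {
        this.fin = new BufferedReader(new InputStreamReader(in));
    }

    // Returns the next line, or null at end of input.
    String getline() throws IOException {
        return fin.readLine();
    }

    // Closing the BufferedReader also closes the wrapped reader and stream.
    @Override
    public void close() throws IOException {
        fin.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a,b\nc,d\n".getBytes();
        try (ClosableCsvReader csv = new ClosableCsvReader(new ByteArrayInputStream(data))) {
            String line;
            while ((line = csv.getline()) != null)
                System.out.println(line);
        } // csv.close() runs here, even if getline() throws
    }
}
```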

Paul Wasilewski