
Here is my code

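    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    public class CsvColumns {
      static void printColumns(InputStream sr) throws IOException {
        try (BufferedReader br = new BufferedReader(new InputStreamReader(sr))) {
          String splitBy = ",";
          br.readLine(); // discard the first line (presumably the header)
          String line;
          while ((line = br.readLine()) != null) {
            // Splits on every comma, including commas inside quoted fields
            String[] b = line.split(splitBy);
            System.out.println("\"" + b[0] + "\",\"" + b[4] + "\",\"" + b[6] + "\"");
          }
        }
      }
    }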

The columns from my CSV file should print out like this:

"John", "Smith", "Smith,John"  

but it treats the comma inside the column as a separator and splits that column into two, like this:

""Smith" John"", "John", "Smith"

How can I get it to ignore the comma that is inside the column, so it does not split it into two columns, AND stop it from adding double quotes?

Thanks in advance

MC Emperor
JoeyOC
    Use a proper CSV parser instead of trying to write your own. There are many parsers to choose from, e.g. [Jackson-databind-csv](https://github.com/FasterXML/jackson-dataformats-text/tree/2.14/csv), [Univocity-parsers](https://www.univocity.com/pages/about-parsers), [OpenCSV](http://opencsv.sourceforge.net/dependency-info.html) and [Apache Commons CSV](https://commons.apache.org/proper/commons-csv/). – k314159 Feb 22 '22 at 12:43
    Also, for your use case, I wouldn't bother writing a program. Just install the excellent [csvkit](https://csvkit.readthedocs.io/en/latest/) and use the `csvcut` command which will do what you want. – k314159 Feb 22 '22 at 12:54

3 Answers


If you have commas in your data rows, change your separator, e.g. to `;`. A plain `split(",")` has no way to know when a comma is a delimiter and when it is part of a value.

CSV can use any separator you find suitable (some use `:`, `@`, `;`, `|`, etc.).
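A minimal sketch of this suggestion, assuming the file can be re-exported with `;` as the separator (the class name and sample row are hypothetical). Once commas are no longer delimiters, a plain `split` leaves `Smith,John` in one field:

```java
import java.util.Arrays;

public class SemicolonSplit {
    // Splits one row of a file whose fields are separated by ';' instead of ','.
    // ';' is not a regex metacharacter, so it can be passed to split() directly.
    static String[] splitRow(String line) {
        return line.split(";", -1); // limit -1 keeps trailing empty fields
    }

    public static void main(String[] args) {
        // Hypothetical row: the third field contains a comma
        String line = "John;Smith;Smith,John";
        System.out.println(Arrays.toString(splitRow(line)));
    }
}
```

This only helps while no value contains the new separator; any field with a `;` in it brings the same problem back.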

Renis1235
  • ASCII and Unicode define specific characters for use as delimiters: field is code point 31, record is 30, group of records is 29, and table/file is 28. See [*What are the file/group/record/unit separator control characters and their usage?*](https://stackoverflow.com/q/8695118/642706). `String fieldDelimiter = Character.toString( 31 ) ;` – Basil Bourque Feb 22 '22 at 16:34
  • Would my answer count as correct though? Thank you for the comment. Very informative. @BasilBourque – Renis1235 Feb 22 '22 at 19:48
  • Yes, I would say this Answer is correct in a way, thus my up-vote. But it depends on how the problem is defined. Using a COMMA between field-enveloping QUOTATION MARK characters is the canonical delimiter, thus the name *Comma*-Separated Values (CSV). So really such input as given in the Question should be parsed as CSV, without the need for a substitute delimiter as you suggest. But as a shortcut, instead of building or using a proper CSV parser, your suggestion would work. – Basil Bourque Feb 22 '22 at 22:36
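A sketch of the control-character idea from Basil Bourque's comment, assuming Java 11+ (for `Character.toString(int)`) and that you control how the file is written; the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class UnitSeparatorSplit {
    // UNIT SEPARATOR, code point 31, used here as the field delimiter.
    // Character.toString(int) requires Java 11 or later.
    static final String FIELD_DELIMITER = Character.toString(31);

    // Joins fields with the unit separator; assumes no field contains code point 31.
    static String joinRow(String... fields) {
        return String.join(FIELD_DELIMITER, fields);
    }

    // Splits a row on the unit separator; Pattern.quote makes it a literal match.
    static String[] splitRow(String row) {
        return row.split(Pattern.quote(FIELD_DELIMITER), -1);
    }

    public static void main(String[] args) {
        String row = joinRow("John", "Smith", "Smith,John");
        System.out.println(Arrays.toString(splitRow(row)));
    }
}
```

Like any substitute delimiter, this assumes no value ever contains code point 31, which is far less likely than a comma or semicolon.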

Do not split by COMMA first. You must first find the pairs of QUOTATION MARK characters enclosing each field: a COMMA inside such a pair is part of the value, not a delimiter. See the CSV specification (RFC 4180).
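To illustrate the quote-first idea, here is a minimal sketch of a single-line splitter that tracks whether it is inside a quoted field. It assumes `,` delimiters, `"` as the quote character, doubled quotes (`""`) as escaped quotes as in RFC 4180, and no line breaks inside fields; the class name is hypothetical, and a real library handles more edge cases:

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Splits one CSV line into fields, removing the enclosing quotes.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" is an escaped literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote of the field
                    }
                } else {
                    current.append(c); // commas here are part of the value
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote of the field
            } else if (c == ',') {
                fields.add(current.toString()); // delimiter outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"John\",\"Smith\",\"Smith,John\"";
        System.out.println(parseLine(line));
    }
}
```

Because the enclosing quotes are dropped while parsing, printing the fields afterwards no longer doubles them up.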

I recommend you make use of an existing CSV parsing library rather than write your own. You have a choice of several good libraries in the Java ecosystem. For example, I have used Apache Commons CSV in a few projects. More libraries are mentioned in k314159's comment on the Question.

Basil Bourque

Modify your regex so that it splits on every comma except those between quotes:

String[] b = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
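A sketch that applies this regex to the sample row from the question and also strips one pair of surrounding quotes from each field, which addresses the doubled quotes. The class and helper names are hypothetical, and doubled quotes (`""`) inside a value are left as-is:

```java
import java.util.Arrays;

public class RegexQuoteSplit {
    // Matches a comma only when an even number of '"' characters follow it,
    // i.e. when the comma is outside any quoted field.
    static final String SPLIT_OUTSIDE_QUOTES = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // Removes one pair of surrounding quotes, if present.
    static String unquote(String field) {
        if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
            return field.substring(1, field.length() - 1);
        }
        return field;
    }

    static String[] splitRow(String line) {
        String[] fields = line.split(SPLIT_OUTSIDE_QUOTES, -1); // keep trailing empties
        for (int i = 0; i < fields.length; i++) {
            fields[i] = unquote(fields[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"John\",\"Smith\",\"Smith,John\"";
        System.out.println(Arrays.toString(splitRow(line)));
    }
}
```

The lookahead rescans the rest of the line at every comma, so the cost grows quadratically with line length; that is usually fine for short rows.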

See the accepted answers in these posts:

Java: splitting a comma-separated string but ignoring commas in quotes

Splitting on comma outside quotes

Eritrean