I want to split a line of text that follows CSV syntax, but one of the fields itself contains a comma.

Example (fields: account, id, title, text):

String line = "account123,2222,Thnaks for reaching out,\"Hey [[customerFirstName]], Thanks for reaching out to us.\"";

String[] splitted = line.split(",");

Result:

splitted = {String[5]@539} 
 0 = "account123"
 1 = "2222"
 2 = "Thnaks for reaching out"
 3 = ""Hey [[customerFirstName]]"
 4 = " Thanks for reaching out to us.""

But I expect:

splitted = {String[4]@539} 
 0 = "account123"
 1 = "2222"
 2 = "Thnaks for reaching out"
 3 = "Hey [[customerFirstName]], Thanks for reaching out to us."
VitalyT
  • Could you not just concat that with a comma? –  Aug 01 '18 at 14:12
  • That means you need to omit the last comma – Blasanka Aug 01 '18 at 14:12
  • Maybe splitting first on `"` and then on `,` would help you. – Hadi J Aug 01 '18 at 14:13
  • This question comes up often here, not only in Java, but across basically every language tag. The usual way to deal with this is to escape each field with a special character, e.g. escape each CSV field with double quotes. – Tim Biegeleisen Aug 01 '18 at 14:13
  • Java's `.split()` takes a second argument, the "limit", which caps the number of resulting pieces. In your case above, where the text is the fourth and last field, you could use `.split(",", 4)` – 7 Reeds Aug 01 '18 at 14:14
  • take a look here: https://www.mkyong.com/java/how-to-read-and-parse-csv-file-in-java/ – Luca Di Liello Aug 01 '18 at 14:14
  • Consider using a CSV parser library, take a look at [commons-csv](https://commons.apache.org/proper/commons-csv/) or [opencsv](http://opencsv.sourceforge.net/) – Vladimir Vagaytsev Aug 01 '18 at 14:19
  • read this https://stackoverflow.com/questions/18893390/splitting-on-comma-outside-quotes – Youcef LAIDANI Aug 01 '18 at 14:20
  • I don't want to use libraries (it's production code). Do you have a concrete solution, maybe a regex? – VitalyT Aug 01 '18 at 14:20
  • [`String line = "account123,2222,Thnaks for reaching out,\"Hey [[customerFirstName]], Thanks for reaching out to us.\""; String[] split = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"); Arrays.stream(split) .forEach(System.out::println);`](https://ideone.com/kaCIWn) – Youcef LAIDANI Aug 01 '18 at 14:24
  • Thanks, it works :) I split by `,(?=(?:[^"]*"[^"]*")*[^"]*$)` – VitalyT Aug 01 '18 at 14:30
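
The regex from the comments above matches a comma only when the rest of the line contains an even number of double quotes, which means the comma is not inside a quoted field. The sketch below shows how that could be wrapped into a helper. The class name `QuoteAwareSplit` and the method `split` are made up for illustration, and the quote stripping is an added step, since the regex itself leaves the enclosing quotes in the field. It assumes one record per line and does not unescape doubled quotes (`""`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original thread.
class QuoteAwareSplit {
    // A comma followed by an even number of double quotes up to the end of
    // the line, i.e. a comma that is outside any quoted field.
    private static final String COMMA_OUTSIDE_QUOTES =
            ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        // A negative limit keeps trailing empty fields.
        for (String field : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            // The regex only locates separators, so remove the enclosing
            // quotes of a quoted field here.
            if (field.length() >= 2 && field.startsWith("\"") && field.endsWith("\"")) {
                field = field.substring(1, field.length() - 1);
            }
            fields.add(field);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "account123,2222,Thnaks for reaching out,"
                + "\"Hey [[customerFirstName]], Thanks for reaching out to us.\"";
        for (String field : split(line)) {
            System.out.println(field);
        }
    }
}
```

Note that the lookahead rescans the remainder of the line for every comma, so this is best suited to short lines.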

2 Answers

Your solution is, as you have found, very brittle. The good news is that there are a number of more robust CSV solutions available. For the purposes of this answer I'll use OpenCSV, with which your reading code becomes:

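// `reader` is a java.io.Reader over the CSV input (for example a FileReader).
// Closing the CSVReader also closes the underlying reader.
try (CSVReader csvReader = new CSVReader(reader)) {
    List<String[]> list = csvReader.readAll();
}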

Hope that helps...

hd1

Here is a simple solution:

import java.util.ArrayList;
import java.util.List;

public class SplitByCommaDemo {
    public static void main(String... args) {
        String line = "account123,2222,Thnaks for reaching out,\"Hey [[customerFirstName]], Thanks for reaching out to us.\",\"Hey [[customerFirstName]], Thanks for reaching out to us.\"";
        for (String s : splitByComma(line)) {
            System.out.println(s);
        }
    }

    private static List<String> splitByComma(String line) {
        String[] words = line.split(",");
        List<String> list = new ArrayList<>();
        for (int i = 0; i < words.length; ++i) {
            String s = words[i];
            if (s.startsWith("\"")) { // start of a quoted cell
                // Re-join the pieces until one ends with the closing double quote.
                while (!(s.length() > 1 && s.endsWith("\"")) && i < words.length - 1) {
                    s += "," + words[++i];
                }
                // Drop the opening quote, and the closing one if it is present.
                boolean closed = s.length() > 1 && s.endsWith("\"");
                s = s.substring(1, closed ? s.length() - 1 : s.length());
            }
            list.add(s);
        }
        return list;
    }
}

And your output will be:

account123
2222
Thnaks for reaching out
Hey [[customerFirstName]], Thanks for reaching out to us.
Hey [[customerFirstName]], Thanks for reaching out to us.
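
Splitting on every comma and re-joining the pieces works for the sample line, but it cannot handle escaped quotes inside a field. A variant that avoids `split` altogether is to walk the line character by character and track whether the current position is inside quotes. The following is a minimal sketch of that idea, not a full CSV parser: the class name `CsvLineParser` and the method `parse` are hypothetical, and it assumes one record per line with quotes escaped by doubling (`""`), as described in RFC 4180.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, shown only to illustrate the technique.
class CsvLineParser {
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas are plain text inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of a field
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "account123,2222,Thnaks for reaching out,"
                + "\"Hey [[customerFirstName]], Thanks for reaching out to us.\"";
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}
```

This makes a single pass over the line and keeps empty fields, including a trailing one. Quoted fields that span several lines are outside the scope of the sketch.
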
Hearen