
In one of my projects I have to process comma-separated (CSV) files. I need to split each line on commas while ignoring commas inside double quotes, so I used an expression from another Stack Overflow question (Java: splitting a comma-separated string but ignoring commas in quotes). Everything worked fine until I recently noticed that it fails in the specific scenario below.

I have a data string that needs to be split on commas:

20Y-62-27412,20Y6227412NK,BRACKET,101H,00D505060,H664374,06/25/2013,1,,

As I understand it, with the expression

String[] rowData = str.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

the split should return an array of size 10, with the last two elements being empty strings. Instead I get an array of size 8, because the last two commas are not treated as separators. I use this expression in several places in my application, so I don't want to replace it. Any help would be appreciated. Thanks.

Abdul Rehman
  • Use a [CSV parser](http://stackoverflow.com/questions/101100/csv-api-for-java)? – assylias Jul 30 '13 at 06:02
  • The problem is similar to parsing mathematical terms that include brackets. Most people will tell you that RegEx is **NOT** the way to do it. I'm not even sure that CSV is a regular language (my university days are long gone, so I'm not sure about that... XD) – AKDADEVIL Jul 30 '13 at 06:03
  • I have Japanese characters in my file, with a very complex character scheme, which I don't think any parser can handle easily. Also, the regex splits the data correctly apart from this one problem I just discovered. Switching to a parser would mean changing my application in dozens of places, which is not an option given the short time frame. – Abdul Rehman Jul 30 '13 at 06:24

1 Answer


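You need to use the split(java.lang.String, int) method with a negative limit. The one-argument split(String) behaves like a limit of 0, which discards trailing empty strings; a negative limit keeps them.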

Your code would then look like:

String str = "20Y-62-27412,20Y6227412NK,BRACKET,101H,00D505060,H664374,06/25/2013,1,,";
String[] rowData = str.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);
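
To see the difference side by side, here is a minimal sketch that runs both calls on the row from the question. The class name SplitLimitDemo, the constant CSV_COMMA and the two helper methods are hypothetical names introduced only for this illustration; the regex and the data are taken unchanged from the question.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Regex from the question: a comma followed by an even number of double quotes.
    // CSV_COMMA is a hypothetical name used only in this sketch.
    static final String CSV_COMMA = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    // One-argument split: same as limit 0, so trailing empty strings are dropped.
    static String[] splitDroppingTrailing(String line) {
        return line.split(CSV_COMMA);
    }

    // Negative limit: the pattern is applied as often as possible and
    // trailing empty strings are kept.
    static String[] splitKeepingTrailing(String line) {
        return line.split(CSV_COMMA, -1);
    }

    public static void main(String[] args) {
        String str = "20Y-62-27412,20Y6227412NK,BRACKET,101H,00D505060,H664374,06/25/2013,1,,";

        String[] dropped = splitDroppingTrailing(str);
        String[] kept = splitKeepingTrailing(str);

        System.out.println(dropped.length + " " + Arrays.toString(dropped));
        System.out.println(kept.length + " " + Arrays.toString(kept));
    }
}
```

For this row the first call should give 8 elements and the second 10, with the last two being empty strings. Since only the limit argument changes, the existing regex can stay as it is at every call site.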
Michael Markidis