2

After the split command, the string "MO""RET" is stored in items[1]. I then call replaceAll on this string, which removes all the double quotes, but I want it to be stored as MO"RET. How do I do that? In the CSV file that I process with split, double quotes within the contents of a text field are repeated (example: "This account is a ""large"" one"). So I want to retain one of the two quotes when a quote is repeated in the middle of the string, and drop the enclosing quotes at the ends if present. How can I do it?

String items[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
// items[1] now holds "MO""RET"
String recordType = items[1].replaceAll("\"","");

After this, recordType holds MORET, but I want it to hold MO"RET.

BalusC
Arav
  • 4
    Less than one hour ago you posted a very similar question http://stackoverflow.com/questions/2241758/regarding-java-split-command-parsing-csv-file which you haven't responded to, down or upvoted, or accepted. If you don't give back to the site, people will stop giving to you. – Mark Byers Feb 11 '10 at 02:56
  • 1
    @Mark Byers: oh, how I wish that were true. – danben Feb 11 '10 at 03:17

4 Answers

6

Don't use regex to split a CSV line. This is asking for trouble ;) Just parse it character-by-character. Here's an example:

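import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public static List<List<String>> parseCsv(InputStream input, char separator) throws IOException {
    BufferedReader reader = null;
    List<List<String>> csv = new ArrayList<List<String>>();
    // Quote the separator so that a regex metacharacter such as '|' is matched literally.
    String trailingSeparator = Pattern.quote(String.valueOf(separator)) + "$";
    try {
        reader = new BufferedReader(new InputStreamReader(input, "UTF-8"));
        for (String record; (record = reader.readLine()) != null;) {
            boolean quoted = false; // true while the parser is inside a quoted field
            StringBuilder fieldBuilder = new StringBuilder();
            List<String> fields = new ArrayList<String>();
            for (int i = 0; i < record.length(); i++) {
                char c = record.charAt(i);
                fieldBuilder.append(c);
                if (c == '"') {
                    quoted = !quoted; // a doubled quote toggles twice, leaving the state unchanged
                }
                // A field ends at an unquoted separator or at the end of the record.
                if ((!quoted && c == separator) || i + 1 == record.length()) {
                    // Drop the trailing separator and the surrounding quotes, then unescape "" to ".
                    fields.add(fieldBuilder.toString().replaceAll(trailingSeparator, "")
                        .replaceAll("^\"|\"$", "").replace("\"\"", "\"").trim());
                    fieldBuilder = new StringBuilder();
                }
                // A record that ends with a separator has one more, empty, field.
                if (c == separator && i + 1 == record.length()) {
                    fields.add("");
                }
            }
            csv.add(fields);
        }
    } finally {
        if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
    }
    return csv;
}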

Yes, there is a little regex involved, but it only trims off the trailing separator and the surrounding quotes of a single field.
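To see what that per-field cleanup does to the strings from the question, here is a minimal sketch that isolates it. The class and method names (UnquoteField, unquote) are made up for illustration and are not part of the parser above; it assumes the field has already been separated from the rest of the line.

```java
public class UnquoteField {
    // Hypothetical helper: removes a leading quote and a trailing quote if present,
    // then collapses each doubled quote ("") into a single one.
    static String unquote(String field) {
        return field.replaceAll("^\"|\"$", "").replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"MO\"\"RET\""));                            // prints MO"RET
        System.out.println(unquote("\"This account is a \"\"large\"\" one\"")); // prints This account is a "large" one
    }
}
```

The order matters: the outer quotes are stripped first, so a field that ends in an escaped quote still loses only its closing quote before the doubled quotes are collapsed.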

You can however also grab any 3rd party Java CSV API.

BalusC
  • Thanks a lot. What if my string has the value "TEST"REPLA"? If there is only a single double quote in the middle of the string, how can I delete the first and last quotes and retain all the middle quotes? I want the output to be TEST"REPLA. Example 2: for "EXAM"PLE"2IN" I want the output EXAM"PLE"2IN. The first and last quotes need to be deleted. – Arav Feb 11 '10 at 04:55
  • 1
    The posted code example already does that (assuming that your CSV file adheres to RFC 4180 as outlined here http://www.rfc-editor.org/rfc/rfc4180.txt ). – BalusC Feb 11 '10 at 13:11
  • I used your code. Great! Humm... There is a little problem. I expected `["A","B","",""]` from line `A,B,,` of exported file from spreadsheet, but I got `["A","B",""]`. – Paul Vargas Jan 17 '13 at 07:21
  • @Paul: Oh, I overlooked that edge case. I updated the answer. – BalusC Jan 17 '13 at 12:08
1

How about:

String recordType = items[1].replaceAll( "\"\"", "\"" );
PSpeed
  • Thanks a lot. What if my string has the value "TEST"REPLA"? If there is only a single double quote in the middle of the string, how can I delete the first and last quotes and retain all the middle quotes? I want the output to be TEST"REPLA. Example 2: for "EXAM"PLE"2IN" I want the output EXAM"PLE"2IN. The first and last quotes need to be deleted. – Arav Feb 11 '10 at 04:57
  • It's difficult to do this with regex and still cover cases such as a starting quote with no ending quote, and the regex gets really complicated. At that point you are better off parsing the whole line. If you really just want the specific start/end quote case, then check for it with charAt() and do a substring. It will be faster than regex anyway. – PSpeed Feb 11 '10 at 08:30
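The charAt()/substring check suggested in the last comment could look roughly like this. It is only a sketch: the names OuterQuotes and stripOuterQuotes are invented for illustration, and it deliberately leaves a field untouched unless it both starts and ends with a quote.

```java
public class OuterQuotes {
    // Hypothetical helper: drops the first and last character only when both are
    // double quotes; every quote in between is kept as-is.
    static String stripOuterQuotes(String field) {
        int last = field.length() - 1;
        if (last >= 1 && field.charAt(0) == '"' && field.charAt(last) == '"') {
            return field.substring(1, last);
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(stripOuterQuotes("\"TEST\"REPLA\""));    // prints TEST"REPLA
        System.out.println(stripOuterQuotes("\"EXAM\"PLE\"2IN\"")); // prints EXAM"PLE"2IN
    }
}
```

Doubled quotes are not collapsed here; if the file follows the usual CSV escaping, a separate replace("\"\"", "\"") on the result would still be needed.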
0

I suggest using replace instead of replaceAll, because replaceAll treats its first argument as a regular expression.

The requirement is to replace two consecutive quotes with one quote:

String recordType = items[1].replace( "\"\"", "\"" );

To see the difference between replace and replaceAll, run the code below:

recordType = items[1].replace( "$$", "$" );
recordType = items[1].replaceAll( "$$", "$" );
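Here is a self-contained sketch of that comparison, using a sample string of my own ("a$$b") in place of items[1]. In replaceAll, "$$" is a regex that matches the end of the input rather than two dollar signs, and "$" in the replacement is read as the start of a group reference, so the call fails at runtime (with an IllegalArgumentException on recent JDKs, as far as I know). Pattern.quote and Matcher.quoteReplacement make replaceAll treat both arguments literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // Literal replacement: no regex involved.
    static String literal(String s) {
        return s.replace("$$", "$");
    }

    // Regex replacement with both arguments quoted so they are taken literally.
    static String quotedRegex(String s) {
        return s.replaceAll(Pattern.quote("$$"), Matcher.quoteReplacement("$"));
    }

    // Returns true if the unquoted replaceAll call from the answer throws.
    static boolean unquotedRegexThrows(String s) {
        try {
            s.replaceAll("$$", "$");
            return false;
        } catch (RuntimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(literal("a$$b"));             // prints a$b
        System.out.println(quotedRegex("a$$b"));         // prints a$b
        System.out.println(unquotedRegexThrows("a$$b")); // prints true
    }
}
```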
  • Thanks a lot. What if my string has the value "TEST"REPLA"? If there is only a single double quote in the middle of the string, how can I delete the first and last quotes and retain all the middle quotes? I want the output to be TEST"REPLA. Example 2: for "EXAM"PLE"2IN" I want the output EXAM"PLE"2IN. The first and last quotes need to be deleted. – Arav Feb 11 '10 at 04:52
0

You can use a regular expression here.

recordType = items[1].replaceAll( "\\B\"", "" ); 
recordType = recordType.replaceAll( "\"\\B", "" ); 

The first statement removes every quote that is not preceded by a word boundary (\B), such as the opening quote and the second quote of a doubled pair. The second statement removes every quote that is not followed by a word boundary, such as the closing quote.
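A quick way to check this against the strings from the question is the sketch below. The wrapper class and method (WordBoundaryQuotes, strip) are invented for illustration. Note that the approach relies on letters or digits sitting directly next to the quotes that should survive; in the third example, where a space precedes the doubled quote, the inner quotes are lost.

```java
public class WordBoundaryQuotes {
    // The two replaceAll calls from above, wrapped in a hypothetical helper.
    static String strip(String field) {
        String result = field.replaceAll("\\B\"", ""); // quotes not preceded by a word boundary
        return result.replaceAll("\"\\B", "");         // quotes not followed by a word boundary
    }

    public static void main(String[] args) {
        System.out.println(strip("\"MO\"\"RET\""));            // prints MO"RET
        System.out.println(strip("\"TEST\"REPLA\""));          // prints TEST"REPLA
        System.out.println(strip("\"a \"\"large\"\" one\""));  // prints a large one
    }
}
```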