0

Hi, I have a CSV file with an error in it, and I want to correct it with a regular expression. Some of the fields contain a line break, as in the example below:

"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy

California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"

The two lines above should be on one line:

"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre PkwyCalifornia",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"

I tried the regex below, but it didn't help:

%s/\\([^\"]\\)\\n/\\1/
  • 2
    Line breaks inside double quotes are legal in CSV (at least, in the most common dialects, as there is no single standard). This is the most common way if line breaks need to be included in a field. You are mutilating Google's address by just pasting those lines together. – Thomas Jul 13 '20 at 09:04
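
To illustrate the point about quoted line breaks, here is a minimal sketch of how a quote-aware reader might count logical records: a line break only ends a record when the scanner is outside double quotes. The class and method names (`CsvRecordCount`, `countRecords`) are hypothetical, and the sketch assumes the common dialect where `"` delimits fields and `""` escapes a quote.

```java
public class CsvRecordCount {
    // Counts logical CSV records: a '\n' ends a record only outside quotes.
    // Assumption: '"' delimits fields and a literal quote is written as "".
    static int countRecords(String csv) {
        int records = 0;
        boolean inQuotes = false;
        boolean sawContent = false; // ignores empty physical lines between records
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state stays correct.
                inQuotes = !inQuotes;
            }
            if (c == '\n' && !inQuotes) {
                if (sawContent) {
                    records++;
                }
                sawContent = false;
            } else if (c != '\r') {
                sawContent = true;
            }
        }
        if (sawContent) {
            records++; // last record without a trailing line break
        }
        return records;
    }

    public static void main(String[] args) {
        // Shortened version of the sample: three physical lines, one record.
        String sample = "\"AHLR150\",\"1600 Amphitheatre Pkwy\n\nCalifornia\",\"9958\"\n";
        System.out.println(countRecords(sample));
    }
}
```

Under these assumptions the shortened sample counts as a single record, which is how a CSV parser that supports quoted line breaks would read it.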

3 Answers

0

Try this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
                + ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
                + "California\",,\"Mountain View\",,\"United\n"
                + "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";

        // Match a quoted field that contains at least one line break.
        Matcher matcher = Pattern.compile("\"([^\"]*[\n\r].*?)\"").matcher(input);
        Pattern patternRemoveLineBreak = Pattern.compile("[\n\r]");

        String result = input;
        while (matcher.find()) {
            String quoteWithLineBreak = matcher.group(1);
            String quoteNoLineBreaks = patternRemoveLineBreak.matcher(quoteWithLineBreak).replaceAll(" ");
            // String.replace treats both arguments literally; replaceFirst would
            // interpret the field text as a regex and fail on characters like ( or $.
            result = result.replace(quoteWithLineBreak, quoteNoLineBreaks);
        }

        // Output
        System.out.println(result);
    }
}

Output:

"AHLR150","CDS","-1","MDCPBusinessRelationshipID",,,"Investigating","1600 Amphitheatre Pkwy California",,"Mountain View",,"United States",,"California",,,"94043-1351","9958"
DigitShifter
0

Write a regex that wraps the text you want to keep in parentheses; each pair of parentheses creates a capture group. Then replace the match, using the group indexes to compose the result you want.

String test = "\"AHLR150\",\"CDS\",\"-1\",\"MDCPBusinessRelationshipID\","
        + ",,\"Investigating\",\"1600 Amphitheatre Pkwy\n"
        + "California\",,\"Mountain View\",,\"United\n"
        + "States\",,\"California\",,,\"94043-1351\",\"9958\"\n";
    
System.out.println(test.replaceAll("(\"[^\"]*)\n([^\"]*\")", "$1$2"));

So when we replace the matching string ("United\nStates") with $1$2, the line break is removed because it does not belong to either group:

  • $1 => the first group (\"[^\"]*), which matches "United
  • $2 => the second group ([^\"]*\"), which matches States"
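
One caveat with a pattern like this: when a row ends with a quoted field and the next row starts with one, the sequence of closing quote, line break, and opening quote between the two rows also matches, so the rows themselves get joined. Below is a minimal sketch of a quote-aware alternative that tracks whether the scanner is inside a quoted field and replaces line breaks only there. The class and method names (`QuotedLineBreaks`, `joinQuotedLineBreaks`) are hypothetical, and it assumes that `"` delimits fields and `""` escapes a quote.

```java
public class QuotedLineBreaks {
    // Replaces line breaks inside double-quoted fields with `replacement`.
    // Line breaks outside quotes (record separators) are kept unchanged.
    // Assumption: '"' delimits fields and a literal quote is written as "".
    static String joinQuotedLineBreaks(String csv, String replacement) {
        StringBuilder out = new StringBuilder(csv.length());
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state stays correct.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                // Treat \r\n as a single line break.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "\"a\",\"1600 Amphitheatre Pkwy\nCalifornia\",\"9958\"\n"
                + "\"b\",\"x\",\"y\"\n";
        // The break inside the address field becomes a space; the two rows stay separate.
        System.out.print(joinQuotedLineBreaks(csv, " "));
    }
}
```

Passing `""` as the replacement gives the exact output asked for in the question (`PkwyCalifornia`), while `" "` keeps the words apart.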
Milton Castro
-1

Based on this you can try with:

/\r?\n|\r/

I checked it here and it seems to work.

user12176589