
I have a string containing CSV lines. Some of its values contain CRLF characters, marked [CRLF] in the example below.

NOTE: The labels Line 1: and Line 2: are not part of the CSV; they are only there for the discussion.

Line 1: 
foo1,bar1,"john[CRLF]
dose[CRLF]
blah[CRLF]
blah",harry,potter[CRLF]
Line 2:
foo2,bar2,john,dose,blah,blah,harry,potter[CRLF]

Whenever a value contains a CRLF, the whole value is enclosed in quotes, as shown by line 1. I am looking for a way to get rid of those CRLFs when they appear between quotes.

I tried regular expressions such as:

data.replaceAll("(,\".*)([\r\n]+|[\n\r]+)(.*\",)", "$1 $3");

Or just ([\r\n]+), \n+, etc., without success: the line still appears as if no replacement had been made.

EDIT:

Solution

Found the solution here:

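import java.util.regex.Matcher;
import java.util.regex.Pattern;

String data = "\"Test Line wo line break\", \"Test Line \nwith line break\"\n\"Test Line2 wo line break\", \"Test Line2 \nwith line break\"\n";
StringBuffer result = new StringBuffer();
// Match each double-quoted value (assumes no quotes inside the value)
Matcher m = Pattern.compile("\"[^\"]*\"").matcher(data);
while (m.find()) {
    // Remove line breaks inside the match; quoteReplacement keeps any '$' or '\' in the value literal
    m.appendReplacement(result, Matcher.quoteReplacement(m.group().replaceAll("\\R+", "")));
}
m.appendTail(result);
System.out.println(result.toString());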
Hey StackExchange

1 Answer


With Java 9+, you can pass a replacement function to Matcher#replaceAll and solve your problem using this code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// pattern that matches quoted strings, allowing backslash-escaped quotes inside them
Pattern p = Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

String data1 = "\"Test Line wo line break\", \"Test Line \nwith line break\"\n\"Test Line2 wo line break\", \"Test Line2 \nwith line break\"\n";

// for each quoted string, remove all line breaks from the matched substring;
// quoteReplacement keeps any '$' or '\' in the match from being read as a group reference
String repl = p.matcher(data1).replaceAll(
   m -> Matcher.quoteReplacement(m.group().replaceAll("\\R+", ""))
);

System.out.println(repl);

Output:

"Test Line wo line break", "Test Line with line break"
"Test Line2 wo line break", "Test Line2 with line break"
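
To tie this back to the data in the question, here is a minimal sketch of the same approach applied to the CRLF sample. The class name CsvLineBreaks and the method name flattenQuotedFields are made up for illustration. Replacing each run of line breaks with a single space (rather than nothing) is an assumption based on the "$1 $3" replacement the question attempted. The result is wrapped in Matcher.quoteReplacement because the string returned by the function is otherwise interpreted as a replacement pattern, where '$' and '\' have special meaning.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class CsvLineBreaks {
    // Same pattern as above: a quoted string that may contain backslash-escaped characters.
    private static final Pattern QUOTED =
        Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Replaces every run of line breaks inside a quoted value with one space.
    // Text outside quotes, including the CRLF that ends each record, is left untouched.
    static String flattenQuotedFields(String csv) {
        return QUOTED.matcher(csv).replaceAll(
            m -> Matcher.quoteReplacement(m.group().replaceAll("\\R+", " ")));
    }

    public static void main(String[] args) {
        // The two records from the question, with [CRLF] written as \r\n.
        String data = "foo1,bar1,\"john\r\ndose\r\nblah\r\nblah\",harry,potter\r\n"
                    + "foo2,bar2,john,dose,blah,blah,harry,potter\r\n";
        System.out.print(flattenQuotedFields(data));
    }
}
```

With the sample above, the first record should come out on a single line as foo1,bar1,"john dose blah blah",harry,potter and the second record is unchanged. Note that \R treats \r\n as one line break, so a CRLF is replaced by one space, not two.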


anubhava