
I am trying to find a regular expression that will match any line breaks that occur inside double-quoted strings in a CSV file.

I am able to identify the strings between quotes using this regex: (["])(?:\\.|[^\\])*?\1

I am able to identify line breaks using this regex: (\r\n|\r|\n)

But I'm not sure how to 'combine' the two to get the expression that I need.

The expression should match only line breaks that exist between double quotation marks.

AbraCadaver
MrGoodfellow

1 Answer


As @fyroc suggested, apply the two regular expressions separately: first match each quoted string, then remove the line breaks inside each match, like this:

<?php
$testString = <<<EOF
a,b,c,d,"test
test2
test3",zzz
zz,yy,vv,"a
b
"
uuu,ttt,"xyz",zzz
aaa,bbb,ccc
ddd,"","a","zz"
xyz,abc,"a
b
c
"
"
a,c,d,"
dadasda"
EOF;

// Callback: strip every line break (\R) from one matched quoted string.
function remove_eol($matches) {
    return preg_replace('/\R/', '', $matches[0]);
}

// Match each double-quoted string (backslash escapes allowed) and clean it.
$testStringWithoutEnclosedEol = preg_replace_callback('/(["])(?:\\\\.|[^\\\\])*?\1/', 'remove_eol', $testString);

var_dump($testStringWithoutEnclosedEol);
?>
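As a quick check of the approach above, here is a minimal sketch on a single hypothetical two-row input (the `$row` value is made up for illustration). It assumes the same pattern as the answer and shows that only the line break inside the quotes is removed, while the row separator outside the quotes is kept:

```php
<?php
// Hypothetical input: one quoted field spans two lines, then a second row.
$row = "a,b,\"line1\nline2\",c\nd,e,f";

$clean = preg_replace_callback(
    '/(["])(?:\\\\.|[^\\\\])*?\1/',        // each double-quoted string
    function ($m) {
        return preg_replace('/\R/', '', $m[0]); // drop line breaks inside it
    },
    $row
);

var_dump($clean);
```

The quoted field becomes "line1line2", and the newline before d,e,f is left untouched.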

Note that I changed this regex

(\r\n|\r|\n)

to simply

\R
line break: matches \n, \r and \r\n

See https://www.php.net/manual/en/regexp.reference.escape.php
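To illustrate the substitution, the following sketch compares the explicit alternation with \R on a hypothetical sample string (`$sample` is an assumption, not data from the question) that contains all three line-break styles:

```php
<?php
// Hypothetical sample with \r\n, \r and \n line breaks.
$sample = "a\r\nb\rc\nd";

$withAlternation = preg_replace('/(\r\n|\r|\n)/', '', $sample);
$withR           = preg_replace('/\R/', '', $sample);

var_dump($withAlternation === $withR); // both patterns remove the same breaks
var_dump($withR);
```

For this input both calls should leave the same string with every line break removed.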

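I also had to double the backslashes in the pattern passed to preg_replace_callback (\\ becomes \\\\), because PHP consumes one level of backslash escaping in the string literal before the regex engine sees the pattern.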
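The escaping can be checked by printing the string that PHP actually hands to the regex engine. This is a small sketch; the variable names are hypothetical, and the nowdoc form is only an alternative way to write the pattern without doubling, not something the answer above uses:

```php
<?php
// Single-quoted literal with doubled backslashes, as in the answer.
$pattern = '/(["])(?:\\\\.|[^\\\\])*?\1/';

// Same literal without doubling: PHP collapses each \\ into a single \.
$undoubled = '/(["])(?:\\.|[^\\])*?\1/';

// A nowdoc performs no escape processing, so the regex can be written as-is.
$nowdoc = <<<'RE'
/(["])(?:\\.|[^\\])*?\1/
RE;

echo $pattern, "\n";   // the regex as originally intended
echo $undoubled, "\n"; // backslashes lost, so the pattern is no longer the same
var_dump($pattern === $nowdoc);
```

Printing the pattern this way is a cheap sanity check whenever a regex with backslashes is embedded in a PHP string.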