
Using preg_replace (PHP), I want to remove all horizontal whitespace except the whitespace inside quoted strings ("..." and '...'). The quoted strings may themselves contain escaped quotes.

An example (the regex should turn the left side into the right side):

2 + 2                    => 2+2
f( " ")                  => f(" ")
f("Test \"mystring\" .") => f("Test \"mystring\" .")
f("' ",   " ")           => f("' "," ")

Based on another post, I came up with: \h(?=[^']*(?:'[^']*'[^']*)*$)(?=[^"]*(?:"[^"]*"[^"]*)*$)

It looks ahead from each whitespace character and checks that an even number of quotes (both " and ') remains before the end of the string.

However, it breaks on escaped quotes and on one kind of quote nested inside the other:

" ' test "  => The ' causes a problem
" \" test " => The \" causes a problem

I have thought of using a negative lookbehind, (?<!\\)", but can't get it to work. The following regex fails: it doesn't match when a string contains escaped quotes.

\h(?=[^"]*(?:(?<!\\)"(?:[^"]*?(?<!\\)")[^"]*?)*$)
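For reference, the problem with the first (quote-counting) pattern can be reproduced with a short script. This is only a sketch: the function name and the two inputs are made up for illustration, and each input places whitespace both inside and outside a quoted string.

```php
<?php
// Hypothetical wrapper around the lookahead-based pattern from the question.
function stripWithLookahead(string $s): string
{
    // Same pattern as above, written as a single-quoted PHP string
    // (so each ' inside the pattern is escaped as \').
    $rx = '~\h(?=[^\']*(?:\'[^\']*\'[^\']*)*$)(?=[^"]*(?:"[^"]*"[^"]*)*$)~';
    return preg_replace($rx, '', $s);
}

// Illustrative inputs (assumed, not taken from the examples above).
$inputs = [
    'a " \' " b',   // a lone ' inside a double-quoted string
    'a " \\" " b',  // an escaped " inside a double-quoted string
];

foreach ($inputs as $in) {
    echo $in, '  =>  ', stripWithLookahead($in), PHP_EOL;
}
```

In both cases the count of remaining quotes is thrown off: the lone ' or the escaped " is counted like a real delimiter, so whitespace outside the string is kept and, in the second case, whitespace inside the string is removed.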
T. Doe

1 Answer


You may use

'~(?<!\\\\)(?:\\\\{2})*(?:"[^\\\\"]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(*F)|\h+~s'

See the regex demo

Details

  • (?<!\\)(?:\\{2})*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')(*SKIP)(*F) - a '...' or "..." substring whose opening quotation mark is not itself escaped; once matched, it is skipped, so nothing inside it gets removed
    • (?<!\\) - no \ char allowed immediately to the left of the current location
    • (?:\\{2})* - zero or more repetitions of double backslashes
    • (?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*') - either of the two alternatives:
      • "[^\\"]*(?:\\.[^"\\]*)*" - a string literal inside double quotation marks
        • " - a double quote
        • [^\\"]* - 0 or more chars other than \ and "
        • (?:\\.[^"\\]*)* - zero or more repetitions of a \ followed by any char (\\.; the s flag lets . match line breaks too) and then 0 or more chars other than " and \ ([^"\\]*)
        • " - the closing double quote
      • | - or
      • '[^\\']*(?:\\.[^'\\]*)*' - a string literal inside single quotation marks, built the same way
    • (*SKIP)(*F) - PCRE verbs that discard the text matched so far and make the regex engine resume the search for the next match at the position where (*SKIP) was reached
  • |\h+ - or 1 or more horizontal whitespace characters
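If the (*SKIP)(*F) verbs feel opaque, roughly the same behaviour can be sketched with preg_replace_callback: capture the quoted literal and put it back unchanged, and replace everything else that matches (the whitespace) with an empty string. This is a different technique from the one used in this answer, shown only to illustrate what "skipping" amounts to. The function name stripOutsideQuotes is hypothetical.

```php
<?php
// Hypothetical helper: capture-and-restore variant of the pattern above,
// using preg_replace_callback instead of (*SKIP)(*F).
function stripOutsideQuotes(string $s): string
{
    // Group 1: a quoted literal, together with any pairs of backslashes before it.
    // Second alternative: a run of horizontal whitespace.
    $rx = '~((?<!\\\\)(?:\\\\{2})*(?:"[^\\\\"]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'))|\h+~s';

    $result = preg_replace_callback(
        $rx,
        function (array $m): string {
            // Group 1 is set only when a quoted literal matched: restore it as is.
            // Otherwise the match was whitespace: drop it.
            return isset($m[1]) ? $m[1] : '';
        },
        $s
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $s;
}

echo stripOutsideQuotes('f("\' ",   " ")'), PHP_EOL;
```

The (*SKIP)(*F) version avoids the callback and works directly with preg_replace, which is why it is the one used here.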

PHP demo:

$strs = ['2 + 2', 'f( " ")', 'f("Test \\"mystring\\" .")', 'f("\' ",   " ")'];
$rx = '~(?<!\\\\)(?:\\\\{2})*(?:"[^\\\\"]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\')(*SKIP)(*F)|\h+~s';
print_r( preg_replace($rx, '', $strs) );

Output:

Array
(
    [0] => 2+2
    [1] => f(" ")
    [2] => f("Test \"mystring\" .")
    [3] => f("' "," ")
)
Wiktor Stribiżew