
Just out of curiosity, I'm trying to figure out the right way to escape a backslash for use in a PHP regular expression pattern, like so:

TEST 01: (3 backslashes)

$pattern = "/^[\\\]{1,}$/";
$string = '\\';

// ----- RETURNS A MATCH -----

TEST 02: (4 backslashes)

$pattern = "/^[\\\\]{1,}$/";
$string = '\\';

// ----- ALSO RETURNS A MATCH -----

According to the articles below, 4 is supposedly the right way but what confuses me is that both tests returned a match. If both are right, then is 4 the preferred way?

RESOURCES:

Mahmoud Tahan

6 Answers

// PHP 5.4.1

// Either three or four \ can be used to match a '\'.
echo preg_match( '/\\\/', '\\' );        // 1
echo preg_match( '/\\\\/', '\\' );       // 1

// Match two backslashes `\\`.
echo preg_match( '/\\\\\\/', '\\\\' );   // Warning: No ending delimiter '/' found
echo preg_match( '/\\\\\\\/', '\\\\' );  // 1
echo preg_match( '/\\\\\\\\/', '\\\\' ); // 1

// Match one backslash using a character class.
echo preg_match( '/[\\]/', '\\' );       // Warning: missing terminating ] for character class
echo preg_match( '/[\\\]/', '\\' );      // 1  
echo preg_match( '/[\\\\]/', '\\' );     // 1

When three backslashes are used to match a '\' and are followed by \s, the pattern below is interpreted as: match a '\' followed by an 's'.

echo preg_match( '/\\\\s/', '\\ ' );    // 0  
echo preg_match( '/\\\\s/', '\\s' );    // 1  

When four backslashes are used to match a '\' and are followed by \s, the pattern below is interpreted as: match a '\' followed by a whitespace character.

echo preg_match( '/\\\\\s/', '\\ ' );   // 1
echo preg_match( '/\\\\\s/', '\\s' );   // 0

The same applies if inside a character class.

echo preg_match( '/[\\\\s]/', ' ' );   // 0 
echo preg_match( '/[\\\\\s]/', ' ' );  // 1 

None of the above results are affected by enclosing the strings in double instead of single quotes.

Conclusions:
Whether inside or outside a bracketed character class, a literal backslash can be matched using just three backslashes '\\\' unless the next character in the pattern is also backslashed, in which case the literal backslash must be matched using four backslashes.

Recommendation:
Always use four backslashes '\\\\' in a regex pattern when seeking to match a backslash.
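
One way to see why three and four backslashes behave the same, and why they diverge in front of \s, is to print each pattern after PHP has parsed the string literal. The following is only a minimal sketch: the variable names are made up for illustration, and the comments show the string that PCRE would receive.

```php
<?php
// What PHP hands to PCRE once each single-quoted literal has been parsed.
$three      = '/\\\/';     // typed with 3 backslashes
$four       = '/\\\\/';    // typed with 4 backslashes
$threeThenS = '/\\\\s/';   // 3 backslashes followed by \s
$fourThenS  = '/\\\\\s/';  // 4 backslashes followed by \s

echo $three, "\n";       // /\\/
echo $four, "\n";        // /\\/
echo $threeThenS, "\n";  // /\\s/   (a backslash, then a literal "s")
echo $fourThenS, "\n";   // /\\\s/  (a backslash, then whitespace)

var_dump($three === $four);  // bool(true)
```

The first two literals end up as the same four-character pattern, which is why both match. In the third, the final backslash of the triple pairs up with the one that was meant to belong to \s.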

Escape sequences.

MikeM

To avoid this kind of unclear code, you can use \x5c, like this :)

echo preg_replace( '/\x5c\w+\.php$/i', '<b>${0}</b>', __FILE__ );
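
A small sketch of how \x5c behaves, assuming the pattern is written in single quotes as above. In a double-quoted string PHP itself converts \x5c into a backslash before PCRE sees it, so the two quoting styles are not interchangeable here. The variable names and sample subjects are made up for illustration.

```php
<?php
// Single quotes: PHP leaves \x5c alone, so PCRE sees the hex escape for "\".
$single = '/\x5c/';
// Double quotes: PHP turns \x5c into a backslash before PCRE runs,
// leaving the three-character string /\/ with no closing delimiter.
$double = "/\x5c/";

echo strlen($single), "\n";  // 6
echo strlen($double), "\n";  // 3

echo preg_match($single, 'C:\dir'), "\n";     // 1
echo preg_match('/[\x5c]/', 'C:\dir'), "\n";  // 1
echo preg_match($single, 'no slash'), "\n";   // 0
```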
  • I just want to say huge thank you for this. Escaping escape characters like `\n` is a pain already, but doing it in regex with lookbehind is a challenge. – Alex Skrypnyk Jun 04 '17 at 02:41
  • Avoiding a `backslash` only to replace it with another three characters and a backslash itself again. Phew! – Cholthi Paul Ttiopic Apr 02 '18 at 12:55
  • @CholthiPaulTtiopic two backslashes wouldn't work, you should write 4 backslashes. https://onlinegdb.com/3IclPtzxW – Alex78191 Dec 20 '21 at 08:35

The thing is, you're using a character class, [], so it doesn't matter how many literal backslashes are embedded in it: they'll be treated as a single backslash.

e.g. the following two regexes:

/[a]/
/[aa]/

are for all intents and purposes identical as far as the regex engine is concerned. Character classes take a list of characters and "collapse" them down to match a single character, along the lines of "for the current character being considered, is it any of the characters listed inside the []?". If you list two backslashes in the class, then it'll be "is the char a backslash or is it a backslash?".
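
The collapsing described above can be checked with a short sketch. The subjects 'banana' and 'a\b\c' are made-up sample data; preg_match_all() is used only because it returns the number of matches.

```php
<?php
$subject = 'banana';
echo preg_match_all('/[a]/', $subject), "\n";   // 3
echo preg_match_all('/[aa]/', $subject), "\n";  // 3

// Four typed backslashes give PCRE [\\]; eight give [\\\\], a class that
// lists the backslash twice and still matches one character at a time.
$path = 'a\b\c';
echo preg_match_all('/[\\\\]/', $path), "\n";      // 2
echo preg_match_all('/[\\\\\\\\]/', $path), "\n";  // 2
```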

Marc B

The answer https://stackoverflow.com/a/15369828/2311074 is very illustrative, but if you don't know the core problem of backslashes in PHP string you won't understand it at all.

The core problem of backslashes in PHP strings is explained at https://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.single You may want to pay attention to the last two sentences:

The simplest way to specify a string is to enclose it in single quotes (the character ').

To specify a literal single quote, escape it with a backslash (\). To specify a literal backslash, double it (\\). All other instances of backslash will be treated as a literal backslash

So in short, two backslashes in a single-quoted string represent one literal backslash. A single backslash that is not followed by a ' or another backslash also represents a literal backslash.

This is a bit odd, but it means a string '\\xxx' and '\xxx' both represent the same string \xxx.
Note that '\\'xxx' is an invalid string, whereas '\'xxx' represents the string 'xxx.

I guess it originates from this: if you want a literal single quote, you need to escape it with a backslash, so 'hi\'' represents the string hi'. But then you run into trouble when you want to create the string hi\, because 'hi\' no longer works (the backslash escapes the closing ', leaving the string unterminated). An extra escape was therefore needed to strip the special meaning from \. Thus it was decided that \ escapes \, and hi\ can be written as 'hi\\'.

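And this is why three backslashes and four backslashes in a single-quoted string both represent \\, so for those two it does not matter which you use. (Three only works when the next character is not ' or \; for example, '\\\x' and '\\\\x' are both \\x, whereas '\\\' on its own is an unterminated string.)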

However, it has the surprising effect that if you double the number of backslashes, the results are no longer the same. Three backslashes enclosed in single quotes represent 2 literal backslashes, but 6 represent only 3. By contrast, 4 backslashes represent 2 literal backslashes and 8 represent 4 (see the examples from MikeM). Thus, it's recommended to always use 4 instead of 3.
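
The counts above can be verified with strlen(). This is just a sketch: the trailing x is an assumption added so that the odd-length literals stay valid strings, and it is subtracted again from each result.

```php
<?php
// strlen() counts the backslashes that survive PHP's single-quote parsing.
// The trailing "x" keeps the final backslash from escaping the closing quote.
echo strlen('\\\x') - 1, "\n";       // 3 typed -> 2
echo strlen('\\\\x') - 1, "\n";      // 4 typed -> 2
echo strlen('\\\\\\x') - 1, "\n";    // 6 typed -> 3
echo strlen('\\\\\\\\x') - 1, "\n";  // 8 typed -> 4
```

Doubling three typed backslashes to six yields three literal backslashes rather than four, whereas doubling four to eight keeps the ratio.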

Adam

I studied this years ago. It works because the 1st backslash escapes the 2nd one, and together they form a 'true backslash' character in the pattern; this true one then escapes the 3rd one. So it magically makes 3 backslashes work.

However, the usual recommendation is to use 4 backslashes instead of the ambiguous 3.

If I'm wrong about anything, please feel free to correct me.

Scott Chu

You can also use the following

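$xml = '<xs:schema xsi:schemaLocation="urn:example schema.xsd"/>'; // sample input, made up for illustration
// Heredoc body: \s needs no doubling because it is not a PHP escape sequence.
$regexp = <<<EOR
schemaLocation\s*=\s*["'](.*?)["']
EOR;
preg_match_all("/".$regexp."/", $xml, $matches);
print_r($matches);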

keywords: heredoc, nowdoc
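
For the backslash case specifically, the two syntaxes differ: a heredoc is parsed like a double-quoted string, so \\ still collapses to one backslash, while a nowdoc does no escape processing at all. A minimal sketch, with made-up variable names and a made-up subject:

```php
<?php
// Nowdoc: no escape processing, so the body is the regex exactly as typed.
$nowdoc = <<<'EOR'
\\
EOR;
// Heredoc: parsed like a double-quoted string, so each \\ becomes one "\".
$heredoc = <<<EOR
\\\\
EOR;

var_dump($nowdoc === $heredoc);                     // bool(true)
echo preg_match('/' . $nowdoc . '/', 'a\b'), "\n";  // 1
```

With a nowdoc, matching one literal backslash takes only the two backslashes that the regex engine itself requires.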

test30