
To match a literal backslash, many people (and the PHP manual) say: always triple-escape it, i.e. write four backslashes, like this: \\\\

Note:

Single and double quoted PHP strings have special meaning of backslash. Thus if \ has to be matched with a regular expression \\, then "\\\\" or '\\\\' must be used in PHP code.

Here is an example string: \test

$test = "\\test"; // holds the string \test

// WON'T WORK: pattern in double-quotes with double-escaped backslash
#echo preg_replace("~\\\t~", '', $test); #output -> \test

// WORKS: pattern in double-quotes with triple-escaped backslash
#echo preg_replace("~\\\\t~", '', $test); #output -> est

// WORKS: pattern in single-quotes with double-escaped backslash
#echo preg_replace('~\\\t~', '', $test); #output -> est

// WORKS: pattern in double-quotes with double-escaped backslash inside a character class
#echo preg_replace("~[\\\]t~", '', $test); #output -> est

// WORKS: pattern in single-quotes with double-escaped backslash inside a character class
#echo preg_replace('~[\\\]t~', '', $test); #output -> est
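The five examples can be run side by side to see what PCRE actually receives once PHP has parsed each string literal. This is a minimal sketch assuming PHP 7 or later with the bundled PCRE functions; the short labels and the `$results` array are illustrative names, not part of the original examples. A real TAB character in a parsed pattern is printed as `<TAB>`.

```php
<?php
// Subject string from the examples: a backslash followed by "test".
$test = "\\test";

// The five patterns exactly as written in the examples above.
$patterns = [
    'dq, 3 backslashes'        => "~\\\t~",
    'dq, 4 backslashes'        => "~\\\\t~",
    'sq, 3 backslashes'        => '~\\\t~',
    'dq, 3 backslashes, class' => "~[\\\]t~",
    'sq, 3 backslashes, class' => '~[\\\]t~',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    $results[$label] = preg_replace($pattern, '', $test);
    // Show the parsed pattern string; a real TAB is made visible as <TAB>.
    $seen = strtr($pattern, ["\t" => '<TAB>']);
    printf("%-26s PCRE gets %-10s result: %s\n", $label, $seen, $results[$label]);
}
```

Only the first pattern leaves `\test` unchanged: inside double quotes PHP turns `\t` into a TAB, so PCRE receives an escaped TAB instead of `\\t`. The other four all reach PCRE as either `~\\t~` or `~[\\]t~` and return `est`.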

Conclusion:

  • If the pattern is single-quoted, a backslash has to be double-escaped \\\ (three backslashes) to match a literal \
  • If the pattern is double-quoted, it depends on whether the backslash is inside a character class: inside a character class it must be at least double-escaped \\\, outside a character class it has to be triple-escaped \\\\

Can anyone show me a case where a double-escaped backslash in a single-quoted pattern, e.g. '~\\\~', would match something different from a triple-escaped backslash in a double-quoted pattern, e.g. "~\\\\~", or would fail?

When/why/in what scenario would it be wrong to use a double-escaped \ in a single-quoted pattern e.g. '~\\\~' for matching a literal backslash?

If there's no answer to this question, I will continue to use a double-escaped backslash \\\ in single-quoted PHP regex patterns to match a literal \, because there seems to be nothing wrong with it.

Jonny 5
  • `\t` stands for a TAB when you place it inside double quotes, so it would be better to use an example where `t` does not come directly after the backslash. – Sabuj Hassan Dec 28 '13 at 19:17
  • According to Google, `tripple` means "a horse's gait in which both left and then both right legs move together.". I think the correct spelling is `triple`. Not sure why you reverted that edit. – Amal Murali Dec 28 '13 at 19:20
  • @Amal Murali Oops, sorry! There are many `tripples` in the text :-) I won't touch the text for the next 15 min. – Jonny 5 Dec 28 '13 at 19:24
  • It's a little bit unclear what you're asking. Could you please rephrase the question? – Amal Murali Dec 28 '13 at 19:27
  • @Amal Murali Thanks! I hope it is easier to understand now. – Jonny 5 Dec 28 '13 at 19:37
  • No, this still makes no sense as a question. \ is a backslash; inside a string you need \\ because you need to escape the backslash. In a regex string, to match a literal \, you need to escape the backslash, AND you need to escape the escaping backslash, so it becomes \\\\ (\\ for the escaper, and then \\ for the actual backslash), just like the manual says. What's your actual question? – Mike 'Pomax' Kamermans Dec 28 '13 at 20:01
  • The question is about the escape level for matching a backslash in single-quoted regex patterns. `'~\\\~'` and `'~\\\\~'` both match a backslash, whereas `"~\\\~"` (wrongly escaped in double quotes) and `"~\\\\~"` act differently. The escape level differs depending on the quote type used for the pattern, similar to [matching a $ sign](http://stackoverflow.com/a/20809075/3110638). So why use one more backslash than necessary when using a single-quoted pattern? – Jonny 5 Dec 28 '13 at 20:30
  • @Jonny5 Read this [answer](http://stackoverflow.com/a/18017821). The main "idea" is the following: PHP uses a regex engine, PCRE in this case. When you write a pattern string, it passes through PHP first, and PHP then passes it to the regex engine. What the regex engine needs is a double backslash to match a backslash. How you get there in PHP is up to you. I would always go for 4 backslashes; it's safe and prevents confusion. – HamZa Dec 28 '13 at 20:36

2 Answers


A backslash character (\) is considered an escape character by both PHP's parser and the regular expression engine (PCRE). If you write a single backslash, PHP's parser treats it as an escape character. If you write two backslashes, PHP's parser turns them into one literal backslash. But when that backslash reaches the regular expression, the regex engine picks it up as an escape character again. To get a literal backslash through both layers, you generally write four backslash characters; whether fewer also work depends on how you quote the pattern.
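The two layers can be observed separately. This is a minimal sketch assuming PHP 7 or later; the subject string `C:\temp` and the variable names are hypothetical, chosen only to contain one backslash.

```php
<?php
// Layer 1 (PHP parser): inside the literal, each "\\" becomes one backslash.
$pattern = '~\\\\~';          // written with 4 backslashes
$length  = strlen($pattern);  // the parsed string is ~\\~, i.e. 4 characters

// Layer 2 (PCRE): the two remaining backslashes form the escape \\,
// which matches exactly one literal backslash.
$subject = 'C:\\temp';        // the string C:\temp
$found   = preg_match($pattern, $subject);

var_dump($pattern);  // string(4) "~\\~"
var_dump($found);    // int(1)
```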

To understand the difference between the two types of quoting patterns, consider the following two var_dump() statements:

var_dump('~\\\~');
var_dump("~\\\\~");

Output:

string(4) "~\\~"
string(4) "~\\~"

The escape sequence \~ has no special meaning in PHP when it's used in a single-quoted string. Three backslashes also work because the PHP parser doesn't know about the escape sequence \~: \\ becomes \, but \~ remains \~.
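The single-quote rule behind this can be checked directly. A minimal sketch, assuming PHP 7 or later; the variable names and the subject `a\b` are illustrative only.

```php
<?php
// In a single-quoted literal, a backslash only escapes \ and '.
$a = '\\';   // one backslash
$b = '\'';   // one single quote
$c = '\~';   // not an escape sequence: stays as two characters, \~

var_dump(strlen($a), strlen($b), strlen($c));  // prints int(1), int(1), int(2)

// Hence three and four backslashes produce the same pattern string here.
$three = '~\\\~';
$four  = "~\\\\~";
var_dump($three === $four);            // bool(true)
var_dump(preg_match($three, 'a\\b'));  // int(1)
```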

Which one should you use:

For clarity, I'd always use ~\\\\~ when I want to match a literal backslash. The other one works too, but I think ~\\\\~ is clearer.

Amal Murali
  • @Jonny5: Glad to have been of help :) – Amal Murali Dec 28 '13 at 21:02
  • I understand it even better after reading the [single-quote section in the PHP manual](http://www.php.net/manual/de/language.types.string.php#language.types.string.syntax.single): within single quotes, a backslash is only an escape character before a single quote or another backslash. In any other case it is treated as a literal. '\\\' is impossible, as you would escape the closing `'`, but `'~\\\~'` would be one escaped backslash followed by a literal backslash, since a backslash is not an escape character before anything other than a backslash or a single quote. – Jonny 5 Dec 28 '13 at 22:51
  • I asked for examples where a double-escaped backslash would match something different than a triple-escaped one. This pattern cannot work, as it breaks the string and causes a parse error: `$pattern = '~[\\\\']~'`, whereas this one would not match a backslash: `$pattern = '~[\\\']~';` but `$pattern = '~[\'\\\]~';` or `$pattern = "~[\\\\']~";` would work in this special case. – Jonny 5 Dec 28 '13 at 23:26
  • In the examples above I want to match a single-quote or a backslash using a [character-class](http://www.regular-expressions.info/charclass.html). `$pattern = '~[\\\\']~';` illustrates that a scenario exists where a triple-escaped backslash in a single-quoted pattern can't work. Understanding [how to escape](http://www.php.net/manual/en/language.types.string.php) now, makes it easy. – Jonny 5 Dec 29 '13 at 10:43
  • Sure, for me it cleared up the question and contains the solution, which lies in understanding [how escaping in PHP strings](http://www.php.net/manual/de/language.types.string.php) works, so that the regex parser gets properly escaped input. – Jonny 5 Dec 29 '13 at 10:54
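The character-class case from these comments can be verified with a short script. This is a sketch assuming PHP 7 or later; the variable names are hypothetical. The `'~[\\\\']~'` variant is deliberately left out, because its quote is no longer escaped, so it closes the string early and the file would not parse.

```php
<?php
// Goal: one character class that matches a single quote OR a backslash.

// '~[\\\']~'  -> PHP yields ~[\']~  : the class holds only the quote
$quoteOnly = '~[\\\']~';

// '~[\'\\\]~' -> PHP yields ~['\\]~ : quote and backslash
$bothSingle = '~[\'\\\]~';

// "~[\\\\']~" -> PHP yields ~[\\']~ : backslash and quote
$bothDouble = "~[\\\\']~";

$backslash = '\\';  // a one-character string: \
$all = ['quoteOnly' => $quoteOnly, 'bothSingle' => $bothSingle, 'bothDouble' => $bothDouble];
foreach ($all as $name => $p) {
    printf("%-10s %-8s quote: %d, backslash: %d\n",
        $name, $p, preg_match($p, "'"), preg_match($p, $backslash));
}
```

`$quoteOnly` matches the quote but not the backslash, while both of the other spellings match either character.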

There is no difference between the actual escaping of the backslash in single- or double-quoted strings in PHP, as long as you do it correctly. The reason why you're getting a WON'T WORK on your first example is, as pointed out in the comments, that inside double quotes PHP expands \t to the tab character.

When you're using just three backslashes, the last one in your single-quoted string is read together with the following ~ as \~, which, as far as single-quoted strings go, is left as it is (since it is not a valid escape sequence). It is, however, just a coincidence that this is parsed as you expect in this case and does not have some side effect (e.g., \\\' would not behave the same way).
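The `\\\'` side effect mentioned above can be reproduced. This sketch assumes PHP 7 or later, and the goal (matching a backslash immediately followed by a quote in the hypothetical subject `it\'s`) is chosen only for illustration.

```php
<?php
// Goal: match a backslash immediately followed by a single quote, i.e. \'
$subject = 'it\\\'s';  // the string it\'s

// Three backslashes: \\ -> \ and \' -> ', so PCRE gets ~\'~,
// which is just an escaped quote and matches only the '.
$three = '~\\\'~';

// Five backslashes: \\ -> \, \\ -> \, \' -> ', so PCRE gets ~\\'~,
// an escaped backslash followed by a quote.
$five = '~\\\\\'~';

preg_match($three, $subject, $m3);
preg_match($five, $subject, $m5);
var_dump($m3[0]);  // string(1) "'"
var_dump($m5[0]);  // string(2) "\'"
```

Here the "three backslashes" habit silently matches the wrong thing, because the third backslash is consumed by PHP to escape the quote instead of reaching PCRE.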

The reason for all the escaping is that the regular expression also needs backslashes escaped in certain situations, as they have special meaning there as well. This leads to the large number of backslashes after each other, such as \\\\ (which takes eight backslashes for the markdown parser, as it yet again adds another level of escaping).

Hopefully that clears it up, as you seem to be confused regarding the handling of backslashes in single/double quoted strings more than the behaviour in the regular expression itself (which will be the same regardless of " or ', as long as you escape things correctly).

MatsLindh
  • Just a note: you can wrap those in backticks to avoid the additional escaping. :) – Amal Murali Dec 28 '13 at 20:48
  • @fiskfisk Thanks a lot for your answer. I composed my understanding from both of your answers and will do it right from now on. Just wanted to understand it. – Jonny 5 Dec 28 '13 at 21:02