In the top-voted answer to this fantastic question, the following regular expression is used in a preg_replace call (from the answer's auto_version function):

'{\\.([^./]+)$}'

The end goal of this regular expression is to extract the file's extension from the given filename. However, I'm confused about why the very beginning of this regular expression works. Namely:

Why does \\. match the same way as \. in a regex?

Shouldn't the former match (a) one literal backslash, followed by (b) any character, while the latter matches one literal period? The rules for single-quoted strings state that \\ yields a literal backslash.

Consider this simple example:

$regex1 = '{\.([^./]+)$}';  // Variant 1 (one backslash)
$regex2 = '{\\.([^./]+)$}'; // Variant 2 (two backslashes)

$subject1 = '/css/foobar.css';   // Regular path
$subject2 = '/css/foobar\\.css'; // Literal backslash before period

echo "<pre>\n";
echo "Subject 1: $subject1\n";
echo "Subject 2: $subject2\n\n";

echo "Regex 1: $regex1\n";
echo "Regex 2: $regex2\n\n";

// Test Variant 1
echo preg_replace($regex1, "-test.\$1", $subject1) . "\n";
echo preg_replace($regex1, "-test.\$1", $subject2) . "\n\n";

// Test Variant 2
echo preg_replace($regex2, "-test.\$1", $subject1) . "\n";
echo preg_replace($regex2, "-test.\$1", $subject2) . "\n\n";
echo "</pre>\n";

The output is:

Subject 1: /css/foobar.css
Subject 2: /css/foobar\.css

Regex 1: {\.([^./]+)$}  <-- Output matches regex 2
Regex 2: {\.([^./]+)$}  <-- Output matches regex 1

/css/foobar-test.css
/css/foobar\-test.css

/css/foobar-test.css
/css/foobar\-test.css

Long story short: why should \\. yield the same matched results in a preg_replace call as \.?

Jonah Bishop

2 Answers

Consider that there is double escaping going on: PHP sees \\. and says "OK, this is really \.". Then the regex engine sees \. and says "OK, this means a literal dot".

If you remove the first backslash, PHP sees \. and says "this is a backslash followed by a random character -- not a single quote or a backslash as per the spec -- so it remains \.". The regex engine again sees \. and gives the same result as above.
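
The two levels can be checked directly. The following is a minimal sketch that reuses the pattern and path from the question; the variable names are only illustrative. It shows that both literals produce the identical PHP string before the regex engine ever sees them:

```php
<?php
// Single-quoted literals: PHP only treats \\ and \' as escapes here.
$oneBackslash   = '{\.([^./]+)$}';  // \. is left untouched by PHP
$twoBackslashes = '{\\.([^./]+)$}'; // \\ collapses to a single backslash

var_dump($oneBackslash === $twoBackslashes); // bool(true)
var_dump(strlen($twoBackslashes));           // int(13)

// The regex engine therefore receives \. in both cases: a literal period.
$result = preg_replace($twoBackslashes, '-test.$1', '/css/foobar.css');
echo $result, "\n"; // /css/foobar-test.css
```

Since the strings are identical, no test on the regex side can tell the two variants apart.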

Jon
  • So if the end goal is to match a literal backslash, I guess you have to take into account the multiple levels of escaping that could occur? Something like `{\\\.}`, yielding `\\.`? – Jonah Bishop Jan 23 '13 at 16:13
  • @JonahBishop: Exactly. Again, either three or four backslashes in a PHP string will end up matching a literal backslash in the regex (unless there are three followed by a single quote, but you get the picture). – Jon Jan 23 '13 at 16:15
  • The levels of indirection are very interesting here. I can see why test cases for this kind of thing are a good idea. Thanks for the excellent answer. – Jonah Bishop Jan 23 '13 at 16:16
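
A small sketch of the backslash counting discussed in these comments, assuming single-quoted PHP literals throughout (the variable names are made up for illustration). One detail worth noticing: with three or four backslashes the period that follows is no longer escaped for the regex engine, so a fifth backslash is needed to match a literal backslash followed by a literal period:

```php
<?php
$three = '{\\\.}';   // PHP string {\\.}  : literal backslash, then ANY character
$four  = '{\\\\.}';  // PHP string {\\.}  : identical to $three
$five  = '{\\\\\.}'; // PHP string {\\\.} : literal backslash, then literal period

var_dump($three === $four); // bool(true)

$withPeriod = '/css/foobar\\.css'; // contains \.
$withLetter = 'a\\b';              // contains \b (backslash, then the letter b)

var_dump(preg_match($three, $withPeriod)); // int(1)
var_dump(preg_match($three, $withLetter)); // int(1), the dot matched "b"
var_dump(preg_match($five, $withPeriod));  // int(1)
var_dump(preg_match($five, $withLetter));  // int(0), a literal period is required
```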

An addition to the perfectly correct answer by Jon:

Also consider which kind of quotes you use (" vs '). In a single-quoted string, escape sequences for control characters (such as \n for a new line) are not interpreted; the only escapes recognized are \\ and \'. In a double-quoted string, sequences of the form \? are interpreted, where ? can be various things (\n, \t, etc.). So if you want a real \ in a double-quoted string, escape it as \\. In a single-quoted string this is only strictly required when the backslash precedes a single quote or another backslash, or is the last character of the string.
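
To make the difference between the two quote styles concrete, here is a short sketch (variable names are illustrative only) comparing how each one handles \n and \\:

```php
<?php
$single = 'a\nb'; // \n is NOT interpreted: a, backslash, n, b
$double = "a\nb"; // \n IS interpreted:     a, newline, b

var_dump(strlen($single)); // int(4)
var_dump(strlen($double)); // int(3)

// \\ yields one backslash in both quote styles.
var_dump('\\' === "\\"); // bool(true)
var_dump(strlen('\\'));  // int(1)

// An unrecognized escape such as \. is kept as-is in both styles too.
var_dump('{\.}' === "{\.}"); // bool(true)
```

A lone backslash cannot end a single-quoted literal, because \' would be read as an escaped quote; that is the case where \\ is required even with single quotes.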

apfelbox
  • Uh, in Perl, and I would presume PHP, \\ and \' are recognized as \ and ' within '-delimited strings. Can someone give a definitive answer for PHP? – Phil Perry Jan 13 '14 at 23:48