
I'm using some regex (that works), but don't actually understand what it's doing. Being a scientist, I like to follow everything that I do!

The code is from this SO answer: https://stackoverflow.com/a/118886/889604

$mtime = filemtime($_SERVER['DOCUMENT_ROOT'] . $file);
return preg_replace('{\\.([^./]+)$}', ".$mtime.\$1", $file);

This code takes a file name (e.g. /files/styles.css), and adds in the file's mtime (e.g. /files/styles.1256788634.css).

So, I get that the ^ and $ symbols mark the beginning and end of the string to match, and that the ./ matches any character any number of times (because of the +), but how does the mtime end up in between the file name and the extension?

ChrisW
  • `[^./]` doesn't match any character, rather it matches any character that is not ( `[^...]` ) a dot or a forward slash. Dot inside of a character class ( `[.]` ) is just a dot. – Kenneth K. Mar 23 '13 at 15:37
  • didn't know that curly braces can be regex delimiters. Apparently bracket-style pairs (`()`, `[]`, `{}`, `<>`) are the only case where the starting and the closing delimiter differ (any other non-alphanumeric, non-backslash, non-whitespace character can be a delimiter, but it must be repeated to close the regex) – Walter Tross Mar 23 '13 at 22:44

3 Answers


The { and } are used as delimiters and are not part of the search pattern. \\. describes a literal dot. The dot has to be escaped (thus the backslashes) because an unescaped dot would match any single character. The round brackets ( ... ) define a group that can be accessed via $1 in the second preg_replace parameter. The content of this group consists of [^./]+, which means

one or more (the + after the set) characters, each of which is not (^ at the beginning of a set means not) a dot . or a slash /.

The round brackets are followed by a $, which anchors the match at the end of the string.

The expression will match the file extension of the path, like .css, while css will be the value of the group $1. Therefore, .css will be replaced with .$mtime.css, where $mtime will be the value of the PHP variable.
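
To see those pieces in isolation, here is a minimal sketch using preg_match alongside the original preg_replace call. The sample path and timestamp are made-up values for illustration, not taken from a real file.

```php
<?php
// Hypothetical sample values, for illustration only.
$file  = '/files/styles.css';
$mtime = 1256788634;

// Same pattern as in the question: a literal dot, then a captured run of
// characters that are neither "." nor "/", anchored at the end of the string.
$pattern = '{\\.([^./]+)$}';

preg_match($pattern, $file, $m);
// $m[0] is the whole match; $m[1] is the captured group.
echo $m[0], "\n"; // .css
echo $m[1], "\n"; // css

// In the replacement string, $1 refers to the captured group.
$result = preg_replace($pattern, ".$mtime.\$1", $file);
echo $result, "\n"; // /files/styles.1256788634.css
```

Only the matched part (.css) is replaced; everything before it in the path is left untouched.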

Appleshell
  • So how is *only* the file extension matched? If I had a string like `/files.test/styles.gb.css` would the regex I'm using still work or is it looking for the only `.` in the string? – ChrisW Mar 23 '13 at 15:47
  • @ChrisW Because you have *anchored* the pattern using the `$`, the `.XXX` or whatever can only come at the end of the string in order for the regex to declare success. Without the `$` you would be telling the regex engine that the "extension" can occur anywhere within the target string; with the `$`, it can only come at the end of the string. – Kenneth K. Mar 23 '13 at 15:48
  • Yes - the expression only matches the part where it **exactly** matches. This expression matches the following part: `.` followed by a number of characters that are **not** dots or slashes, followed by the end of the line. – Appleshell Mar 23 '13 at 15:51
  • @KennethK - thanks, starting to get there I think! Final question (I think) - why does there need to be a double backslash? I thought a single one would escape a dot? – ChrisW Mar 23 '13 at 15:51
  • @ChrisW Partly due to PHP; partly due to regex. In regex, certain characters are special (like `^`, `$` and `.`). To have the regex engine treat a special character as a literal, you escape it with a preceding backslash, so the pattern needs `\.`. In PHP, a literal backslash inside a quoted string is written as a double backslash. In a single-quoted string a single backslash before the dot would also work, because a backslash is only special there before `'` or another backslash, but doubling it is the safe habit. – Kenneth K. Mar 23 '13 at 15:53
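
A quick way to check the backslash point for yourself is the sketch below. The test string 'stylescss' is a made-up value, chosen only because it contains no dot.

```php
<?php
// In a single-quoted PHP string, \\ becomes one backslash, and \. is kept
// as-is, so both spellings hand the regex engine the same pattern.
$doubled = '{\\.([^./]+)$}';
$single  = '{\.([^./]+)$}';
var_dump($doubled === $single); // bool(true)
echo $doubled, "\n";            // {\.([^./]+)$}

// Without the backslash, "." matches any character, not just a dot.
var_dump(preg_match('{\\.css$}', 'stylescss')); // int(0)
var_dump(preg_match('{.css$}', 'stylescss'));   // int(1)
```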

The mtime ends up in the output due to PHP's string interpolation rules, which cause a variable referenced inside a double-quoted string to be replaced by its value rather than the literal text of the variable name. See http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.double (last sentence of section) for more information.
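
As a small sketch of that interpolation step (the timestamp is a made-up value), this is what the replacement string looks like by the time preg_replace receives it:

```php
<?php
$mtime = 1256788634; // hypothetical timestamp, for illustration only

// Inside double quotes, $mtime is interpolated by PHP, while \$ produces a
// literal dollar sign, so "$1" reaches preg_replace untouched.
$replacement = ".$mtime.\$1";
echo $replacement, "\n"; // .1256788634.$1

// The same string built with concatenation instead of interpolation.
$same = '.' . $mtime . '.$1';
var_dump($replacement === $same); // bool(true)
```

So PHP fills in the mtime, and the regex engine later fills in $1 with the captured extension.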

Kenneth K.
  • Oops - obviously bad phrasing on my part - I know how to reference vars in php, I just don't understand regex! – ChrisW Mar 23 '13 at 15:41

All the regex pattern is doing is matching the entire file extension .css and storing the extension without the period, css, in a capture group via the parentheses ([^./]+). preg_replace then replaces the matched .css in $file with a period, the value of $mtime, another period, and the captured extension, referenced as $1.

And a note: inside a character class like [^./], the leading ^ does NOT mean the beginning of the string. There it is saying "match any character EXCEPT these ones".

I hope all that makes sense.

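Edit: It only matches the .css part of the $file because of the combination of \\., [^./]+ and $. The engine tries each period from left to right, but after the period only characters that are not a dot or a slash may follow, all the way to the end of the string, so only the last period in the final path segment can succeed. The period must be escaped as \\. because it would otherwise act like a regex period, which matches any character.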
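
Here is a minimal sketch of that behaviour on a path with several periods. The helper name add_stamp, the sample paths and the value 42 are all hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper wrapping the preg_replace call from the question.
function add_stamp(string $file, int $mtime): string
{
    return preg_replace('{\\.([^./]+)$}', ".$mtime.\$1", $file);
}

// Only the last ".css" qualifies: ".test/..." contains a slash and
// ".gb.css" contains another dot, so neither can reach the end of the string.
echo add_stamp('/files.test/styles.gb.css', 42), "\n"; // /files.test/styles.gb.42.css

// No extension in the final path segment: no match, string returned unchanged.
echo add_stamp('/files.test/README', 42), "\n"; // /files.test/README
```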

edhurtig
  • You might adjust your answer to indicate what causes the pattern to match only the extension. The `\.([^./]+)` in and of itself is not sufficient. – Kenneth K. Mar 23 '13 at 15:43
  • Oh right! That starts to explain a lot - but how does it know to match only the .css part of the filename? – ChrisW Mar 23 '13 at 15:44