
I'm using Mean.io and saw a regex in a modRewrite function:

app.use(modRewrite([

   '!^/api/.*|\\_getModules|\\.html|\\.js|\\.css|\\.mp4|\\.swf|\\.jp(e?)g|\\.png|\\.gif|\\.svg|\\.ico|\\.eot|\\.ttf|\\.woff|\\.pdf$ / [L]'

]));

I understand that they're trying to make the URLs prettier by rewriting any URL containing:

/api/, _getModules, .html, .js, ..., .pdf

However, I have been searching for an explanation of the regex and still can't figure out what the !^ at the beginning and the $ at the end mean. Could someone please break the regex down step by step?

lvarayut

1 Answer


According to Apache mod_rewrite Introduction:

In mod_rewrite the ! character can be used before a regular expression to negate it. This is, a string will be considered to have matched only if it does not match the rest of the expression.

The ^ and $ are regex anchors that assert the position at the start and end of string respectively.

To understand the rest, you can go through the What does the regex mean post.

The regex itself is:

  • ^ - Assert the start of string position and...
  • /api/.* - Match /api/ literally, followed by 0 or more characters other than a newline
  • | - Or...
  • \\_getModules - Match _getModules
  • | - Or
  • \\.html - Match .html
  • |\\.js|\\.css|\\.mp4|\\.swf|\\.jp(e?)g|\\.png|\\.gif|\\.svg|\\.ico|\\.eot|\\.ttf|\\.woff| - Or these extensions (note it will match jpg and jpeg as ? means match 0 or 1 occurrence of preceding pattern)
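  • \\.pdf$ - Match .pdf right at the end of the string ($). Note that $ applies only to this last alternative, just as ^ applies only to the first one (/api/.*); the alternatives in between can match anywhere in the string.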
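
To see the breakdown above in action, here is a minimal sketch in plain JavaScript. It only exercises the pattern part of the rule; it does not use the modRewrite middleware itself. The helper name `wouldRewrite` and the sample paths are made up for illustration, and the leading `!` is emulated by inverting the result of the test.

```javascript
// The pattern exactly as it appears in the rule, without the leading "!"
// and without the trailing " / [L]" (replacement and flag).
const source = '^/api/.*|\\_getModules|\\.html|\\.js|\\.css|\\.mp4|\\.swf|\\.jp(e?)g|\\.png|\\.gif|\\.svg|\\.ico|\\.eot|\\.ttf|\\.woff|\\.pdf$';
const pattern = new RegExp(source);

// Hypothetical helper: emulates the "!" negation. It returns true when the
// string does NOT match the pattern, i.e. when the rule would apply.
function wouldRewrite(path) {
  return !pattern.test(path);
}

console.log(wouldRewrite('/articles/42'));     // true:  no alternative matches
console.log(wouldRewrite('/api/users'));       // false: matches ^/api/.*
console.log(wouldRewrite('/v2/api/users'));    // true:  /api/ is not at the start
console.log(wouldRewrite('/styles/main.css')); // false: matches \.css anywhere
console.log(wouldRewrite('/data.json'));       // false: \.js is unanchored, so it matches inside .json
console.log(wouldRewrite('/docs/report.pdf')); // false: matches \.pdf$ at the end
```

The `/v2/api/users` and `/data.json` cases show that `^` and `$` each bind to a single alternative rather than to the whole expression.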
Wiktor Stribiżew
  • Well, the regex is really very simple: just a couple of alternatives (with `|` alternation operator) and *any-character-but-newline* matching with `.*` . The dot is escaped to match a literal dot (as `.` matches any character but a newline). – Wiktor Stribiżew Aug 07 '15 at 11:40
  • Thanks for your response. If I want to escape the dot, it should be `\.html`, shouldn't it? In this case, `\\.html`, this escapes the backslash instead of dot. – lvarayut Aug 07 '15 at 11:51
  • Mean.io is JavaScript-based. In a JavaScript string, you have to use ``\\`` to specify a literal ``\`` that is necessary when defining regex escape sequences (like `\s` matching whitespace, or `\w` matching alphanumeric and underscore symbols). If the regex is defined with a regex literal (e.g. `/\s\.jpg/`), there is no need to double the ``\`` symbol. – Wiktor Stribiżew Aug 07 '15 at 12:07
  • I understand what you described in the comment; however, I still don't get it in my example. Could you please break down the regex shown in my example piece by piece if you have time? BTW, I accepted your answer as well. – lvarayut Aug 07 '15 at 13:04
  • Yes, I will. Just note that if the escape sequence is not valid, ``\`` is treated as a literal. I will do the regex break down after bringing my son home. – Wiktor Stribiżew Aug 07 '15 at 13:07
  • Really great explanation! Thanks so much again. It might be a stupid question, but why do we use double backslashes in `\\_getModules` instead of just `\_getModules`? IMHO, one backslash will also give us `_getModules` since it would escape the underscore character as well. – lvarayut Aug 08 '15 at 15:46
  • I think you are right, and this is in line with what I wrote before: if the escape sequence is unknown, the ``\`` is treated as a literal. – Wiktor Stribiżew Aug 08 '15 at 15:51
  • I got it. Thanks @stribizhev. – lvarayut Aug 08 '15 at 16:02
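
The escaping points discussed in the comments can be checked with a short sketch in plain JavaScript. The variable names and the sample path `/indexxhtml` are made up for illustration.

```javascript
// In a string literal, '\\.' is a backslash plus a dot, so the regex engine
// receives \. (a literal dot). This is equivalent to the regex literal form.
const fromString = new RegExp('\\.html$');
const fromLiteral = /\.html$/;
console.log(fromString.source === fromLiteral.source); // true

// With a single backslash, '\.' is not a recognised string escape, so the
// backslash is dropped and the regex engine receives a bare dot (any character).
const singleSlash = new RegExp('\.html$');
console.log(singleSlash.source);              // .html$
console.log(singleSlash.test('/indexxhtml')); // true: "." matched the "x"
console.log(fromString.test('/indexxhtml'));  // false: a literal dot is required

// For the underscore, the backslash is redundant either way:
// '\_getModules' is the same string as '_getModules', and the regex \_ matches "_".
console.log('\_getModules' === '_getModules');                // true
console.log(new RegExp('\\_getModules').test('_getModules')); // true
```

So the doubled backslash matters for `\\.html`, where it keeps the dot literal, but not for `\\_getModules`, where the underscore needs no escaping in the first place.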