2

I'm trying to write a very simple regular expression that matches any file name that doesn't end in .php. I came up with the following...

(.*?)(?!\.php)$

...however this matches all filenames. If someone could point me in the right direction I'd be very grateful.

Hannele
Simon Stevens
  • 4
    What language? Each differs in its regex implementation, so it's important to know - php? perl? python? java? javascript? .net? – Oded Aug 04 '10 at 09:06
  • Is the regular expression written in PHP? The implementation language matters a great deal. – Yuval F Aug 04 '10 at 09:09
  • 4
    You have never voted for any of the answers people posted to your questions nor did you accept one of them. Consider doing so: that pretty much is the spirit of SO. – Bart Kiers Aug 04 '10 at 09:16
  • Bart, I'm actually quite conscious of that; however, to be perfectly frank, I don't think any of my previous questions have had a helpful reply. In reply to the other gents: the pattern is actually for Apache (ProxyPassMatch). – Simon Stevens Aug 04 '10 at 09:38
  • 1
    okay, fair enough. Although your first question (and your comment to a certain answer in there) seems to contradict your remark. – Bart Kiers Aug 04 '10 at 09:52

3 Answers

6

Almost:

.*(?!\.php)....$

The last four dots make sure that there is something to look ahead at, when the look-ahead is checked.

The outer parentheses are unnecessary since you are interested in the entire match.

The reluctant .*? is unnecessary, since backtracking four steps is more efficient than checking the following condition with every step.
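
As a rough check, the pattern above can be tried in Python's `re` module. This is only a sketch: the thread targets Apache, whose regex engine may differ in details, and the helper name `accepts` and the sample file names are made up for illustration. The second pattern is the variant suggested in the comments below for names shorter than four characters.

```python
import re

# Pattern from the answer: the last four characters must not be ".php".
LAST_FOUR = re.compile(r".*(?!\.php)....$")

# Variant from the comments: also accept names of one to three characters.
WITH_SHORT = re.compile(r"^(.{1,3}|.*(?!\.php)....)$")


def accepts(pattern, name):
    """Return True if the pattern matches somewhere in the file name."""
    return pattern.search(name) is not None


for name in ["index.html", "index.php", "a.c", "i.php"]:
    print(name, accepts(LAST_FOUR, name), accepts(WITH_SHORT, name))
```

The short name `a.c` is where the two patterns disagree: the first needs four characters to look at, so it rejects the name, while the second accepts it.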

Tomalak
  • Many thanks Tomalak, that works perfectly. For anybody looking at this at a later date, the final code we used was ProxyPassMatch ^(.*)\.((?!php)...)*$ http://127.0.0.1:12345/$1.$2 – Simon Stevens Aug 04 '10 at 09:37
  • 5
    Minor pedant: This won't match files with filenames less than 4 characters long. – Mark Byers Aug 04 '10 at 09:40
  • ProxyPassMatch ^(.*\.(?!php).*)$ http://127.0.0.1:12345/$1 - is the final version of what we used for future reference, it also fixes the filename length issue (tested with filename i.php) – Simon Stevens Aug 04 '10 at 09:56
  • 1
    @Mark Byers: You are right, I did not think of this case. Correct would be `^(.{1,3}|.*(?!\.php)....)$`. – Tomalak Aug 04 '10 at 10:31
4

Instead of using negative lookahead, sometimes it's easier to use the negation outside the regex at the hosting language level. In many languages, the boolean complement operator is the unary !.

So you can write something like this:

! str.hasMatch(/\.php$/)

Depending on language, you can also skip regex altogether and use something like (e.g. Java):

! str.endsWith(".php")
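
In Python, for instance, both ideas might look like the sketch below. The function names are hypothetical, and this assumes the check runs in application code rather than inside a server directive.

```python
import re


def is_not_php(name: str) -> bool:
    # Match the unwanted suffix, then negate the result in the host language.
    return re.search(r"\.php$", name) is None


def is_not_php_plain(name: str) -> bool:
    # Same check without any regex.
    return not name.endswith(".php")


print(is_not_php("style.css"), is_not_php("index.php"))
print(is_not_php_plain("style.css"), is_not_php_plain("index.php"))
```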

As for the problem with the original pattern itself:

(.*?)(?!\.php)$   // original pattern, doesn't work!

This matches, say, file.php: the (.*?) can capture all of file.php, and at that position (the end of the string) the lookahead finds no \.php ahead, while the $ does match, so altogether it's a match! You may want to use a lookbehind, or if that's not supported, you can use a negative lookahead at the start of the string.

^(?!.*\.php$).*$  // negative lookahead, works

This will match all strings that do not end with ".php" using negative lookahead.
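
The difference between the two patterns can be observed with a small Python sketch. The sample file names are invented, and Python's engine is used here only as a stand-in for whichever engine is actually in play.

```python
import re

# The pattern from the question: the lookahead is tested at the very end of
# the string, where nothing is left to look at, so it never rejects anything.
ORIGINAL = re.compile(r"(.*?)(?!\.php)$")

# The fixed pattern: the lookahead runs at the start and scans the whole name.
FIXED = re.compile(r"^(?!.*\.php$).*$")

names = ["file.php", "file.txt", "archive.php.bak"]

original_hits = [n for n in names if ORIGINAL.search(n)]
fixed_hits = [n for n in names if FIXED.search(n)]

print(original_hits)
print(fixed_hits)
```

The original pattern accepts every name in the list, while the fixed one drops only `file.php`.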

Community
polygenelubricants
  • An example pattern that's a bit more specific, capturing prefix part and extension part separately. http://www.rubular.com/r/xHizeFtbXb - has lots of room for improvement, but that would require more precise specification. – polygenelubricants Aug 04 '10 at 09:39
  • thanks poly, I should have mentioned in my OP that this is for Apache proxy stuff. Normally I would do the negation in code, but obviously we don't have the option in this case. Thanks for your reply in any case. – Simon Stevens Aug 04 '10 at 09:46
3

You are at the end of the string and looking ahead. What you want is a look behind instead:

(.*)$(?<!\.php)

Note that not all regular expression engines support lookbehind assertions.
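
Python's `re` module is one engine that does support fixed-width lookbehind, so the pattern can be tried there. This is a sketch only: the helper name `keep` and the sample names are hypothetical, and whether a given Apache build accepts the same syntax is not confirmed here.

```python
import re

# Lookbehind placed after the end anchor, as in the pattern above.
LOOKBEHIND = re.compile(r"(.*)$(?<!\.php)")


def keep(name: str) -> bool:
    """Return True for names that do not end in ".php"."""
    return LOOKBEHIND.match(name) is not None


for name in ["index.php", "index.phps", "a", "notes.txt"]:
    print(name, keep(name))
```

Unlike the four-dot lookahead approach, the lookbehind needs no special case for very short names: when fewer than four characters precede the end, the lookbehind simply cannot match, so the negative assertion succeeds.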

Mark Byers
  • 1
    I know you just copied it from the OP's regex, but that reluctant quantifier makes no sense. You're forcing the regex to evaluate the lookbehind once for each character in the string, when you know it only needs to be applied at the end. In fact, I would put the lookbehind *after* the anchor: `.*$(?<!\.php)` – Alan Moore Aug 04 '10 at 10:03
  • @Alan Moore: Yes, you're probably right about the performance of placing the lookbehind after the anchor. Though I'm not sure whether in some regular expression engines the `$` anchor could match before a trailing newline character, which would give a different result. This is probably not going to be an issue when parsing URLs in Apache though. – Mark Byers Aug 04 '10 at 10:51