17

I am trying to match a string that does not contain a particular substring.

My string always starts with "http://www.domain.com/".

The substring I want to exclude from matches is ".a/", which comes directly after that prefix (it is a folder name in the URL path).

There will be more characters in the string after the substring I want to exclude.

For example:

"http://www.domain.com/.a/test.jpg" should not be matched

But "http://www.domain.com/test.jpg" should be

Tunaki
Joe Smalley

4 Answers

29

Use a negative lookahead assertion:

^http://www\.domain\.com/(?!\.a/).*$


The part (?!\.a/) fails the match if the domain prefix is immediately followed by the string .a/.
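The lookahead pattern above can be tried out with a short script. This is only a minimal sketch, assuming Python's `re` module (the question does not name a language); the helper name `is_allowed` is made up for illustration.

```python
import re

# After the fixed prefix, the negative lookahead (?!\.a/) refuses to
# continue when the next three characters are ".a/".
PATTERN = re.compile(r"^http://www\.domain\.com/(?!\.a/).*$")


def is_allowed(url: str) -> bool:
    """Return True if url starts with the prefix and is not under the .a/ folder."""
    return PATTERN.match(url) is not None


print(is_allowed("http://www.domain.com/.a/test.jpg"))  # False
print(is_allowed("http://www.domain.com/test.jpg"))     # True
```

Because the lookahead checks for the full `.a/` sequence, a folder such as `.ab/` is still accepted.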

Wiktor Stribiżew
codaddict
9

My advice in such cases is not to construct overly complicated regexes with negative lookahead assertions and the like.
Keep it simple and stupid!
Do two matches: one for the positives, then sort out the negatives afterwards (or the other way around). Most of the time the regexes become easier, if not trivial, and your program gets clearer.
For example, to extract all lines with foo, but not foobar, I use:

grep foo | grep -v foobar
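
Applied to the URLs in the question, the same two-step idea might look like the following. It is a minimal sketch, assuming Python's `re` module; the function name `filter_urls` and the sample list are hypothetical.

```python
import re

# Step 1: the positive pattern (the required prefix).
PREFIX = re.compile(r"^http://www\.domain\.com/")
# Step 2: the negative pattern (the prefix followed by the unwanted folder).
EXCLUDED = re.compile(r"^http://www\.domain\.com/\.a/")


def filter_urls(urls):
    """Keep URLs that match the prefix, then drop those under .a/."""
    positives = [u for u in urls if PREFIX.match(u)]
    return [u for u in positives if not EXCLUDED.match(u)]


urls = [
    "http://www.domain.com/.a/test.jpg",
    "http://www.domain.com/test.jpg",
    "http://www.otherdomain.com/test.jpg",
]
print(filter_urls(urls))  # ['http://www.domain.com/test.jpg']
```

Each pattern stays trivial, and excluding a further folder only means adding another negative pattern.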
Ingo
0

I would try with

^http:\/\/www\.domain\.com\/([^.]|\.[^a]).*$

This matches your domain, followed either by any character that is not a ., or by a . followed by any character that is not an a. (You can add the / afterwards if needed.)
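
To see how this lookahead-free pattern behaves on a few inputs, here is a minimal sketch, assuming Python's `re` module; the helper name `matches` and the sample URLs other than the two from the question are made up for illustration.

```python
import re

# Either one character that is not ".", or "." followed by a character
# that is not "a", then anything up to the end of the string.
ALTERNATION = re.compile(r"^http://www\.domain\.com/([^.]|\.[^a]).*$")


def matches(url: str) -> bool:
    """Return True if the character-class pattern accepts url."""
    return ALTERNATION.match(url) is not None


print(matches("http://www.domain.com/test.jpg"))      # True
print(matches("http://www.domain.com/.a/test.jpg"))   # False
print(matches("http://www.domain.com/.b/test.jpg"))   # True
print(matches("http://www.domain.com/.ab/test.jpg"))  # False
print(matches("http://www.domain.com/"))              # False
```

The last two results show the trade-offs of this form: without the trailing /, any path beginning with .a is rejected, and the bare domain is rejected because the group needs at least one character.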

M'vy
  • This is fine - until *another* programmer is asked to extend it to also exclude .b, .c and .whatElsethemanagementdoesnotwant – Ingo Mar 25 '11 at 12:55
  • Yep... I get that @Ingo. BTW I forgot the \ before / – M'vy Mar 25 '11 at 12:59
0

If you don't want to use lookahead, but just simple regexes, you can check that the string matches your domain prefix but does not match the prefix followed by .a/:

<?php

function foo($s) {

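    // Dots are escaped so that they match a literal "." rather than any character.
    $regexDomain = '{^http://www\.domain\.com/}';
    $regexDomainBadPath = '{^http://www\.domain\.com/\.a/}';

    // Accept only strings that start with the domain and are not under the .a/ folder.
    return preg_match($regexDomain, $s) && !preg_match($regexDomainBadPath, $s);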
}

var_dump(foo('http://www.domain.com/'));            // bool(true)
var_dump(foo('http://www.otherdomain.com/'));       // bool(false)

var_dump(foo('http://www.domain.com/hello'));       // bool(true)
var_dump(foo('http://www.domain.com/hello.html'));  // bool(true)
var_dump(foo('http://www.domain.com/.a'));          // bool(true)
var_dump(foo('http://www.domain.com/.a/hello'));    // bool(false)
var_dump(foo('http://www.domain.com/.b/hello'));    // bool(true)
var_dump(foo('http://www.domain.com/da/hello'));    // bool(true)

?>

Note that http://www.domain.com/.a will pass the test, because it does not end with a /, so it does not match the excluded .a/ pattern.

nonopolarity