How can I make sure that a domain name matches these 3 simple criteria:

  • Ends with .com / .net

Must not start with

  • http:// or https://
  • http://www. or https://www.

I've worked out the part of the regex that corresponds to the first criterion:

/.*(\.com|\.net)$/

But I have no idea how to add the other 2 conditions to make a single regex.

Thanks for your help.

gpojd
sf_tristanb
  • If you need to be sure that a string will not contain the first two prefixes, why don't you simply use str_replace and then test for the first criterion? I think it will be easier and surely more efficient. – Aurelio De Rosa Oct 12 '11 at 14:53
  • Make the regex match the http:// etc. and then negate (!) the boolean returned. – Laurence Burke Oct 12 '11 at 14:54
  • Well, yeah, you're right, it'll be much easier. I could work with that. :-) – sf_tristanb Oct 12 '11 at 14:55
  • Your formulation of the problem is much too simplistic. This cannot be solved by regex in the general case. See http://stackoverflow.com/questions/1201194/php-getting-domain-name-from-subdomain/1201210#comment26755649_1201210 – tripleee Aug 15 '13 at 08:36

5 Answers

"Not starting" with a pattern is a bit tricky.

The clearest way of doing it is with two separate regexes: one that must match what you want, and one that must not match what you don't want.

But you can do this in one with a negative look-ahead:

/^(?!https?:\/\/(www\.)?).*(\.com|\.net)$/

Edit: corrected the assertion, as pointed out by ridgerunner.
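
The two-regex approach mentioned above could be sketched roughly as follows. This is only an illustration: the function name `is_bare_domain` is made up for this example, and the patterns simply restate the criteria from the question.

```php
<?php
// Hypothetical helper name; the two patterns mirror the criteria in the question.
function is_bare_domain(string $s): bool
{
    // Pattern 1: what we do NOT want, a leading http:// or https://
    // (this also covers http://www. and https://www.).
    $hasScheme = preg_match('/^https?:\/\//', $s) === 1;

    // Pattern 2: what we DO want, a .com or .net ending.
    $hasTld = preg_match('/(\.com|\.net)$/', $s) === 1;

    return !$hasScheme && $hasTld;
}

var_dump(is_bare_domain('example.com'));         // bool(true)
var_dump(is_bare_domain('https://example.com')); // bool(false)
```

Keeping the two checks apart makes each pattern easy to read on its own, at the cost of a second preg_match call.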

Colin Fine
  • Won't work. From the beginning of the string (`^`) you want to use a negative _lookahead_, NOT a negative lookbehind. The expression needed here is: `/^(?!https?:\/\/(www\.)?).*(\.com|\.net)$/` – ridgerunner Oct 13 '11 at 15:25
  • @Ridgerunner: You're right. I've corrected it. Tricky things, these lookarounds. – Colin Fine Oct 14 '11 at 15:22

A regex solution is easy. Simply assert a negative lookahead at the start of the string like so: (With comments...)

if (preg_match('%
    # Match non-http .com or .net domain.
    ^             # Anchor to start of string.
    (?!           # Assert that this URL is NOT...
      https?://   # HTTP or HTTPS scheme with
      (?:www\.)?  # optional www. subdomain.
    )             # End negative lookahead.
    .*            # Match up to TLD.
    \.            # Last literal dot before TLD.
    (?:           # Group for TLD alternatives.
      net         # Either .net
    | com         # or .com.
    )             # End group of TLD alts.
    $             # Anchor to end of string.
    %xi', $text)) {
    // It matches.
} else {
    // It doesn't match.
}

Note that any string starting with http://www. also starts with http://, so the expression for the optional www. is not necessary. Here is a shorter version:

if (preg_match('%^(?!https?://).*\.(?:net|com)$%i', $text)) {
    // It matches.
} else {
    // It doesn't match.
}
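
As a quick sanity check, the shorter pattern above can be run against a handful of inputs. The wrapper name `passes_short_pattern` and the sample strings are assumptions made up for this sketch, not part of the question.

```php
<?php
// Hypothetical wrapper around the shorter pattern shown above.
function passes_short_pattern(string $text): bool
{
    // Negative lookahead rejects a leading http:// or https://;
    // the tail requires a literal dot followed by net or com.
    return preg_match('%^(?!https?://).*\.(?:net|com)$%i', $text) === 1;
}

// Illustrative inputs (assumptions, not from the question).
$samples = [
    'example.com',
    'sub.example.NET',
    'http://example.com',
    'https://www.example.net',
    'example.org',
];

foreach ($samples as $sample) {
    printf("%-24s %s\n", $sample, passes_short_pattern($sample) ? 'match' : 'no match');
}
```

Because of the `i` modifier, both the scheme check and the TLD check are case-insensitive.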

Simple regex to the rescue!

ridgerunner

If you need to be sure that a string will not contain the first two prefixes, why don't you simply use str_replace and then test for the first criterion? I think it will be easier and surely more efficient.
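
A minimal sketch of that idea, assuming it is acceptable to normalize the input rather than reject it. The function name `normalize_and_check` is hypothetical, and note that str_replace removes the prefixes wherever they occur, not only at the start of the string.

```php
<?php
// Hypothetical helper: strips the unwanted prefixes, then applies
// the ending check from the question. Returns null when the check fails.
function normalize_and_check(string $input): ?string
{
    // str_replace removes every occurrence, not only a leading one;
    // it is also case-sensitive (str_ireplace is the case-insensitive variant).
    $stripped = str_replace(['https://', 'http://', 'www.'], '', $input);

    // First criterion from the question: must end with .com or .net.
    return preg_match('/.*(\.com|\.net)$/', $stripped) === 1 ? $stripped : null;
}

var_dump(normalize_and_check('http://www.example.com')); // string(11) "example.com"
var_dump(normalize_and_check('example.org'));            // NULL
```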

Aurelio De Rosa

^[a-zA-Z\.]+\.(com|net)$

Does this work?

If I understood you correctly, you want to check a list of strings and find out which ones are domain names, e.g.:

http://www.a.b (F)
a.com (T)
b.net  (T)
https://google.com (F)
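
The four examples above can be checked with a short script. The function name `matches_letters_only` is made up for this sketch, and the last sample is an extra assumption added to show that the character class [a-zA-Z\.] also rejects names containing digits or hyphens.

```php
<?php
// Hypothetical wrapper around the pattern from this answer.
function matches_letters_only(string $s): bool
{
    // Only letters and dots are allowed before the final .com or .net.
    return preg_match('/^[a-zA-Z\.]+\.(com|net)$/', $s) === 1;
}

// The first four inputs come from the list above; the last one is an
// added assumption illustrating the digit/hyphen limitation.
$samples = ['http://www.a.b', 'a.com', 'b.net', 'https://google.com', 'my-site1.com'];

foreach ($samples as $sample) {
    printf("%-20s (%s)\n", $sample, matches_letters_only($sample) ? 'T' : 'F');
}
```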
Kent

Try this:

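// Reject anything that starts with http:// or https:// (optionally followed by www.).
// A non-slash delimiter (%) avoids having to escape the slashes in the pattern.
if (preg_match('%^https?://(?:www\.)?%i', $subject))
{
  echo 'Fail';
}
else
{
  // Otherwise accept only strings that end in .com or .net.
  if (preg_match('/(?:.*(\.com|\.net))$/i', $subject))
  {
    echo 'Pass';
  }
  else
  {
    echo 'Fail';
  }
}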
Biotox