preg_match that only accepts a website address with or without www. and http://

Question

I am listing all the website addresses on my overview page. Before that, I have to validate each address against all possible cases.

After some research I found the regex below, but it does not give an exact result.

/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z0-9\.\/\?\:@\-_=#])*/

My possible test cases are:

    'test.com', 
    'http://www.google.com', 
    'www.google.com', 
    'https://google.com', 
    'https://www.google.com', 
    'testetst', 
    '<img src="/test/test" >',
    '<img src="/test/test.png" alt="page" title="page">'

I want only the domain name. The first five cases should match (true) and the rest should not (false).

Possible duplicate of [how to get domain name from URL](http://stackoverflow.com/questions/569137/how-to-get-domain-name-from-url) — Ravi Hirani, Feb 22 '16 at 13:39
Check this link also. http://stackoverflow.com/questions/25703360/regular-expression-extract-subdomain-domain — Ravi Hirani, Feb 22 '16 at 13:41
You must use a DOMDocument to get the values from HTML tags (the last 2 examples). `testetst` - what is the actual requirement for matching this? the rest is OK, use [your regex](https://regex101.com/r/zM4bQ8/1). — Wiktor Stribiżew, Feb 22 '16 at 13:58

hherger · Answer 1 · 2016-02-23T06:18:20.967

Try this:

Code:

<?php
$input = 'test.com
http://www.google.com
www.google.com
https://google.com
https://www.google.com
testetst
img src="/test/test" >
<img src="/test/test.png" alt="page" title="page">';

echo '<h3>Input</h3><pre>'.htmlentities($input).'</pre><h3>Output</h3>';
preg_match_all('%(http[s]{0,1}://)*([A-Za-z0-9-]*?\.){0,1}([A-Za-z0-9-]*?\.[A-Za-z0-9-]*?)[\s]*(\r\n|\n\r|\r|\n|$)%', $input, $regs, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($regs[0]); $i++) {
    // $regs[3][$i] contains domain name
    echo $regs[3][$i] . '<br />';
}

Result:

Input:

    test.com
    http://www.google.com
    www.google.com
    https://google.com
    https://www.google.com
    testetst
    img src="/test/test" >
    <img src="/test/test.png" alt="page" title="page">

Output:

test.com
google.com
google.com
google.com
google.com

The Regex in detail:

(                      Match the regular expression below and capture its match into backreference number 1
   http                   Match the characters “http” literally
   [s]                    Match the character “s”
      {0,1}                  Between zero and one times, as many times as possible, giving back as needed (greedy)
   ://                    Match the characters “://” literally
)*                     Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(                      Match the regular expression below and capture its match into backreference number 2
   [A-Za-z0-9-]           Match a single character present in the list below
                             A character in the range between “A” and “Z”
                             A character in the range between “a” and “z”
                             A character in the range between “0” and “9”
                             The character “-”
      *?                     Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
   \.                     Match the character “.” literally
){0,1}                 Between zero and one times, as many times as possible, giving back as needed (greedy)
(                      Match the regular expression below and capture its match into backreference number 3
   [A-Za-z0-9-]           Match a single character present in the list below
                             A character in the range between “A” and “Z”
                             A character in the range between “a” and “z”
                             A character in the range between “0” and “9”
                             The character “-”
      *?                     Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
   \.                     Match the character “.” literally
   [A-Za-z0-9-]           Match a single character present in the list below
                             A character in the range between “A” and “Z”
                             A character in the range between “a” and “z”
                             A character in the range between “0” and “9”
                             The character “-”
      *?                     Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
[\s]                   Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *                      Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
(                      Match the regular expression below and capture its match into backreference number 4
                          Match either the regular expression below (attempting the next alternative only if this one fails)
   \r                     Match a carriage return character
   \n                     Match a line feed character
 |                        Or match regular expression number 2 below (attempting the next alternative only if this one fails)
   \n                     Match a line feed character
   \r                     Match a carriage return character
 |                        Or match regular expression number 3 below (attempting the next alternative only if this one fails)
   \r                     Match a carriage return character
 |                        Or match regular expression number 4 below (attempting the next alternative only if this one fails)
   \n                     Match a line feed character
 |                        Or match regular expression number 5 below (the entire group fails if this one fails to match)
   $                      Assert position at the end of the string (or before the line break at the end of the string, if any)
)
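
Since the question asks for a true/false result per address, the same idea can also be applied to one string at a time. The following is only a minimal sketch, not the pattern from the code above: it anchors the match with `^` and `$`, uses `+` instead of `*?`, and assumes PHP 7.1 or later for the nullable return type. The helper name `extractDomain` is hypothetical. Like the pattern above, it assumes a two-label domain with at most one leading label such as `www.`, so an address like `test.co.uk` would not come out correctly.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the
// two-label domain of a single candidate string, or null when the
// candidate is not a bare website address.
function extractDomain(string $candidate): ?string
{
    // Optional scheme, optional single leading label (e.g. "www."),
    // then exactly two labels captured as the domain.
    $pattern = '%^(?:https?://)?(?:[A-Za-z0-9-]+\.)?([A-Za-z0-9-]+\.[A-Za-z0-9-]+)$%';

    if (preg_match($pattern, trim($candidate), $m) === 1) {
        return $m[1];
    }
    return null;
}

// The test cases from the question.
$tests = [
    'test.com',
    'http://www.google.com',
    'www.google.com',
    'https://google.com',
    'https://www.google.com',
    'testetst',
    '<img src="/test/test" >',
    '<img src="/test/test.png" alt="page" title="page">',
];

foreach ($tests as $test) {
    $domain = extractDomain($test);
    // Print true/false, the extracted domain (if any), and the input.
    echo var_export($domain !== null, true), "\t", $domain ?? '-', "\t", $test, "\n";
}
```

Because the whole string has to match, the two `<img>` lines and `testetst` are rejected without any special handling for HTML.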

This does not work for my input. Your code produces the output you listed, but if my input is only `test.com`, the result is empty. — Jagadeesh, Feb 23 '16 at 05:14
You were right. If only one line was input, there was no match. I modified the regex, so try it now. — hherger, Feb 23 '16 at 06:19
