I need a regex that will pull a URL from a text document

Question

The URLs I'm trying to pull are all in the format www.domain.com. I want to pull them from text documents with a simple regex. It only needs to match www.domain.com, not other URL variations.

What is the simplest regex to use with preg_match_all()?

Check out this post http://stackoverflow.com/questions/399250/going-where-php-parse-url-doesnt-parsing-only-the-domain/399316#399316 — Sean Barlow, Nov 29 '11 at 05:33

Teneff · Accepted Answer · 2011-11-29T05:40:37.167

2

/w{3}\.\w{2,}\.\w{3}/

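This matches www. followed by a word of two or more characters, a dot, and exactly three word characters. Note that \w also covers digits and underscores.

To also match domains containing a hyphen, and to let the literal www match in uppercase via the i flag (\w already covers uppercase letters):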

/w{3}\.[\w\-]{2,}\.\w{3}/i
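As a rough sketch, the hyphen-aware pattern above could be passed to preg_match_all() like this. The sample text and the variable names $text and $matches are made up for illustration and are not from the original question.

```php
<?php
// Hypothetical sample text, chosen only to exercise the pattern.
$text = 'Visit www.example.com or WWW.do-main.org, but not http://example.net.';

// Pattern from the answer: "www.", then 2+ word characters or hyphens,
// a dot, and exactly 3 word characters; the i flag ignores case.
preg_match_all('/w{3}\.[\w\-]{2,}\.\w{3}/i', $text, $matches);

// $matches[0] holds the full matches, in the order they appear in $text.
print_r($matches[0]);
```

Because the pattern ends in \w{3} with nothing after it, a longer TLD is cut off: www.example.info would only be matched as www.example.inf.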

edited Nov 29 '11 at 05:40

answered Nov 29 '11 at 05:33

Teneff


2

This regex would not find something like www.do-main.com – Godwin Nov 29 '11 at 05:35
1

It would probably help if it found this format as well... although this does satisfy my original request. – T. Brian Jones Nov 29 '11 at 05:38

score 1 · Answer 2 · edited Feb 03 '12 at 03:46

1

I don't do a whole lot with PHP, but the regex would be something like:

w{3}\.([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?

This will return everything that starts with "www.", including any path or query string that follows the domain name. It ignores the protocol part of the URL (e.g. http://).

edited Feb 03 '12 at 03:46

James Khoury


answered Nov 29 '11 at 05:40

Greg


score 0 · Answer 3 · answered Nov 29 '11 at 05:32

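preg_match_all('%((mailto\\:|(news|(ht|f)tp(s?))\\://){1}\\S+)%m', $subject, $result, PREG_PATTERN_ORDER);
// With PREG_PATTERN_ORDER, $result[0] is the list of full matches.
foreach ($result[0] as $url) {
    // $url is one matched URL (mailto:, news://, http(s):// or ftp(s)://)
}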

You can also use a class that I wrote, https://github.com/homer6/altumo/blob/master/source/php/String/Url.php, if you want to easily pull parts of the URL. See the unit test in the same directory for usage.

If you're looking for a good program to tweak your regex patterns, I highly recommend RegexBuddy.

Hope that helps...
