5

I know the question title looks very repetitive, but I did not find a solution for my case here.

I need to find URLs in a text string:

$pattern = '`.*?((http|https)://[\w#$&+,\/:;=?@.-]+)[^\w#$&+,\/:;=?@.-]*?`i';

    if (preg_match_all($pattern,$url_string,$matches)) {
        print_r($matches[1]);
    }

Using this pattern I was able to find URLs with http:// and https://, which is okay. But I have user input where people add URLs like www.domain.com or even domain.com.

So either I need to preprocess the string first, putting a common protocol (http://) in front of www.domain.com and domain.com, or I need to come up with a better pattern.

I am not good with regex and don't know what to do.

My idea is to first find the URLs with http:// and https:// and put them in an array, then replace those URLs with a space (" ") in the text string, then use other patterns on what is left. But I am not sure what pattern to use.

I am using $url_string = preg_replace($pattern, ' ', $url_string); but that also removes any www.domain.com or domain.com URL that sits between two valid URLs with http:// or https://.

If you can help that will be great.

To make things more clear:

I need a pattern or some other method to find all URLs in a text string. Examples of the URLs are:

  1. domain.com
  2. www.domain.com
  3. http://www.domain.com
  4. http://domain.com
  5. https://www.domain.com
  6. https://domain.com

Thanks!

drudge
Sisir
  • Are you validating user input from a form with a URL field? Or are you scraping a page/block of text to generate a list of URLs found inside of it? A complete example of the "text string" you are trying to parse might be helpful. – baraboom May 25 '11 at 16:54
  • @baraboom: yes, from user input textbox. where people may input like this twitter : twitter.com/user facebook: http://facebook.com etc.. – Sisir May 25 '11 at 17:26

2 Answers

3
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
HamZa
Adrian B
  • Thanks! almost worked!! Still need to find the pattern `domain.com` – Sisir May 25 '11 at 17:37
  • 1
    @Sisir replace the `{1}` with a `?` to make the http:// or www optional. – Jonathan Kuhn May 25 '11 at 21:00
  • This does not work for me. I receive an empty results. `$pattern = '#(www\.|https?:\/\/){?}[a-zA-Z0-9]{2,254}\.[a-zA-Z0-9]{2,4}(\S*)#i'; $count = preg_match_all($pattern, 'http://www.Imaurl.com', $matches, PREG_PATTERN_ORDER);` And there is no error from `preg_last_error()` – Shane Jul 31 '13 at 20:41
  • Copying and pasting this into an interactive PHP shell I also get blank results. Also, the `{2,254}` limit doesn't support domains like `t.co` which are gaining popularity these days. Tried to edit the answer, but an edit must be >6 characters apparently :-( Oh, and I don't think this will match domains like `me-too.com`. – chmac Sep 26 '13 at 11:26
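The comments above point out two gaps: short domains such as `t.co` and hyphenated ones such as `me-too.com`. One possible variant of the pattern that tries to cover both is sketched below. The function name `extract_urls` and the sample text are made up for illustration, and the pattern is only an assumption about what counts as a URL here: it will still produce false positives (for example `word.Next` after a missing space) and it keeps trailing punctuation that is glued to a path.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extract_urls(string $text): array
{
    // Optional http:// or https://, then one or more dot-terminated labels
    // (letters and digits, hyphens allowed only inside a label), then an
    // alphabetic TLD of at least two letters, then an optional /path.
    $pattern = '#(?:https?://)?(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}(?:/\S*)?#i';

    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full text of every match.
    return $matches[0];
}

// Sample input modeled on the asker's comment (made up for this sketch).
$text = 'twitter: twitter.com/user facebook: http://facebook.com '
      . 'see t.co/abc and www.me-too.com';

print_r(extract_urls($text));
// Matches: twitter.com/user, http://facebook.com, t.co/abc, www.me-too.com
```

Because the scheme is optional, the bare `domain.com` form from the question is matched by the same pattern as the `http://` and `https://` forms, so no second pass over the string is needed.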
0

I'm not sure if I've understood what you need correctly, but can you use something like this:

preg_match('#^.+?://#', $url);

to check whether a protocol is specified in the string, and if not, just prepend http://
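A minimal sketch of that idea follows, assuming the URLs have already been pulled out of the text into an array. The helper name `ensure_scheme` and the sample array are hypothetical; the check itself is the same regex as above.

```php
<?php
// Hypothetical helper: return the URL unchanged if it already starts with
// "something://", otherwise put a default scheme in front of it.
function ensure_scheme(string $url, string $default = 'http://'): string
{
    if (preg_match('#^.+?://#', $url)) {
        return $url;
    }
    return $default . $url;
}

// Sample values taken from the list in the question.
$found = ['domain.com', 'www.domain.com', 'https://domain.com'];

print_r(array_map('ensure_scheme', $found));
// Gives http://domain.com, http://www.domain.com, https://domain.com
```

Note that `^.+?://` accepts any characters before `://`, so it treats strings with other schemes (such as `ftp://`) as already complete; whether that is desirable depends on the input.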

HamZa
tjm