Extract URLs from text in PHP

Question

I have this text:

$string = "this is my friend's website http://example.com I think it is coll";

How can I extract the link into another variable?

I know it should be done with a regular expression, probably preg_match(), but I don't know how.

possible duplicate of [Extract URL from string](http://stackoverflow.com/questions/4390556/extract-url-from-string) — Michael Berkowski, Jul 18 '13 at 01:18
@Michael Berkowski How can it be a duplicate? This question was asked on May 26 '09 at 14:13, but the one you linked was asked on Dec 8 '10 at 17:44. If anything, it's the other way around. — gvgvgvijayan, Mar 18 '15 at 10:44

score 51 · Answer 1 · edited May 23 '17 at 12:10


Probably the safest way is to use code snippets from WordPress. Download the latest version (currently 3.1.1) and look at wp-includes/formatting.php. There's a function named make_clickable() which takes plain text as its parameter and returns the formatted string. You can borrow its code for extracting URLs, though it's pretty complex.

This one-line regex might be helpful:

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

However, this regex still can't reject some malformed URLs (e.g. http://google:ha.ckers.org).
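A quick check of the one-line regex, run against the string from the question, a sentence ending in a full stop, and the malformed URL from the caveat. The extra sample strings are made up for illustration; this is a sketch, not a test suite.

```php
<?php
// Nobu's one-line pattern, copied unchanged
$pattern = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';

$string = "this is my friend's website http://example.com I think it is coll";
preg_match_all($pattern, $string, $match);
$urls = $match[0]; // $match[0] holds the full URLs; $match[1] is the pattern's inner group

// A trailing full stop is not captured, because the last character
// must not be punctuation ("/" is the one exception)
preg_match_all($pattern, 'See http://example.com.', $dot);

// The malformed URL from the caveat is still matched in full
preg_match_all($pattern, 'visit http://google:ha.ckers.org now', $bad);

print_r($urls);
print_r($dot[0]);
print_r($bad[0]);
```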

See also: How to mimic StackOverflow Auto-Link Behavior


answered Apr 17 '11 at 00:27

Nobu


I had a play around with the WordPress formatting.php, and using make_clickable is a nice idea, but it ends up pulling in half of WordPress as dependencies. – Duncan Lock May 06 '11 at 06:59
Good one: it makes sure the last character is not a weird one. – Miguel Aug 12 '15 at 12:06

This doesn't identify URLs without http, like google.com. – Coder anonymous Nov 04 '16 at 20:18
This regex will match http://google:ha.ckers.org: "@https?:\/\/(www\.)?[-a-zA-Z0-9\@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()\@:%_\+.~#?&//=]*)@"; I don't remember where I found it, so I can't give credit. – Kyle Coots Mar 07 '20 at 04:19
https://stackoverflow.com/questions/23366790/php-find-all-links-in-the-text worked better for me than this (context WordPress). – aubreypwd Aug 04 '20 at 16:56
I tried using WP's wp_extract_urls() function, but sometimes the query parameters of URLs in the string contained HTML entities, so I had to use wp_extract_urls(html_entity_decode($text)) when processing the comment text. Otherwise, I'd lose some of the query parameters. – Rick Hellewell Jan 24 '22 at 03:06

score 18 · Answer 2 · edited Jun 16 '14 at 20:27

I tried to do as Nobu said and use WordPress, but it has too many dependencies on other WordPress functions. Instead, I took Nobu's regular expression for preg_match_all() and turned it into a function using preg_replace_callback(), which replaces all links in a text with clickable links. It uses an anonymous function, so you'll need PHP 5.3 or later; otherwise, you can rewrite the code to use an ordinary function instead.

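<?php

/**
 * Make clickable links from URLs in text.
 */
function make_clickable($text) {
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
    return preg_replace_callback($regex, function ($matches) {
        // Escape the URL before putting it into HTML. Note that \' inside a
        // double-quoted string is NOT an escape and would print a backslash.
        $url = htmlspecialchars($matches[0], ENT_QUOTES);
        return "<a href=\"{$url}\">{$url}</a>";
    }, $text);
}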
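For PHP older than 5.3, the answer suggests an ordinary function instead of the closure. A minimal sketch of that rewrite follows; the callback name make_clickable_callback is hypothetical, and the htmlspecialchars() escaping is an addition, not part of the original answer.

```php
<?php
// Hypothetical named callback, usable where closures are unavailable (PHP < 5.3)
function make_clickable_callback($matches)
{
    // Escape so that characters such as & or quotes are safe inside HTML
    $url = htmlspecialchars($matches[0], ENT_QUOTES);
    return '<a href="' . $url . '">' . $url . '</a>';
}

function make_clickable($text)
{
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
    // A callable can be passed as a string holding the function's name
    return preg_replace_callback($regex, 'make_clickable_callback', $text);
}

echo make_clickable('Go to http://example.com/a?b=1&c=2 now'), "\n";
```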

Just a note: I've updated your answer to use an anonymous function as a callback instead of `create_function()`. — Amal Murali, Jun 16 '14 at 20:28

score 14 · Answer 3 · edited Jun 16 '14 at 20:12

URLs have a fairly complex definition, so you must first decide what you want to capture. A simple example capturing anything starting with http:// or https:// could be:

preg_match_all('!https?://\S+!', $string, $matches);
$all_urls = $matches[0];

Note that this is very basic and could capture invalid URLs. For anything more complex, I would recommend reading up on regular expressions in PHP (the preg_* functions use PCRE syntax rather than POSIX).
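To make the caveat about invalid URLs concrete: \S+ stops only at whitespace, so punctuation that ends a sentence becomes part of the match. The rtrim() clean-up below is only a rough, assumed workaround, not something the answer proposes.

```php
<?php
$string = 'Docs: https://example.com/guide, mirror (http://mirror.example.org).';
preg_match_all('!https?://\S+!', $string, $matches);
$all_urls = $matches[0];
// \S+ runs to the next whitespace, so trailing punctuation is captured too:
// "https://example.com/guide," and "http://mirror.example.org)."

// Rough clean-up (an assumption): strip common sentence punctuation from
// the end of each match. This would also break a URL that really ends in ")".
$cleaned = array_map(function ($url) {
    return rtrim($url, '.,;:!?)');
}, $all_urls);

print_r($cleaned);
```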

score 11 · Answer 4 · edited Aug 13 '21 at 18:22

The code that worked for me (especially if you have several links in your $string):

$string = "this is my friend's website https://www.example.com I think it is cool, but this one is cooler https://www.stackoverflow.com :)";
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $string, $matches);
$urls = $matches[0];
// go over all links
foreach($urls as $url) 
{
    echo $url.'<br />';
}
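The regex ends with a second, narrower character class, and that is what keeps sentence punctuation out of the match. A small sketch with a made-up sample string:

```php
<?php
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
$string = 'Mirror at ftp://files.example.org/pub, docs at https://www.example.com/guide?page=2.';
preg_match_all($regex, $string, $matches);
$urls = $matches[0];
// The last character must come from the narrower class, which has no
// '.', ',', ';', ':', '!' or '?', so the engine backtracks past them
print_r($urls);
```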

Hope that helps others as well.


answered Apr 12 '14 at 06:42

Avatar


I've tested all the answers; this is the only one that strips the HTML tag. – hkguile Oct 05 '16 at 04:18

score 8 · Answer 5 · edited Oct 21 '18 at 05:40

If the text you extract URLs from is user-submitted and you're going to display the results as links anywhere, you have to be very, VERY careful to avoid XSS vulnerabilities: most prominently "javascript:" protocol URLs, but also malformed URLs that might trick your regexp and/or the displaying browser into executing them as JavaScript URLs. At the very least, you should accept only URLs that start with "http", "https" or "ftp".
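One way to apply the "accept only http, https or ftp" rule is to check the scheme with parse_url() after extraction, and to escape each URL when writing it into HTML. A sketch under those assumptions; filter_safe_urls() is a hypothetical helper name, and this is not a complete XSS defence.

```php
<?php
// Hypothetical helper: keeps only URLs whose scheme is on an allow-list.
function filter_safe_urls(array $urls, array $allowed = ['http', 'https', 'ftp']): array
{
    $kept = array_filter($urls, function ($url) use ($allowed) {
        // parse_url() returns null when there is no scheme and false for
        // seriously malformed input; both fail the is_string() check
        $scheme = parse_url($url, PHP_URL_SCHEME);
        return is_string($scheme) && in_array(strtolower($scheme), $allowed, true);
    });
    return array_values($kept); // re-index after filtering
}

$candidates = ['https://example.com/', 'JavaScript:alert(1)', 'ftp://files.example.org', '//no-scheme.example'];
$safe = filter_safe_urls($candidates);

// Escape on output too, so a quote inside a URL cannot break out of the attribute
foreach ($safe as $url) {
    $attr = htmlspecialchars($url, ENT_QUOTES);
    echo '<a href="' . $attr . '">' . $attr . "</a>\n";
}
```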

There's also a blog entry by Jeff where he describes some other problems with extracting URLs.

score 5 · Answer 6 · answered May 26 '09 at 14:19


preg_match_all('/[a-z]+:\/\/\S+/', $string, $matches);

This is an easy way that works for a lot of cases, but not all. All the matches are put in $matches. Note that this does not cover links inside anchor elements (<a href="...">), but those weren't in your example either.
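A short illustration of what ends up in $matches, with a made-up input. Because the pattern has no i modifier, an upper-case scheme such as HTTP:// would not match, and [a-z]+ accepts any lower-case scheme, not just http.

```php
<?php
$string = 'Get ftp://files.example.org and https://example.com/path.';
$count = preg_match_all('/[a-z]+:\/\/\S+/', $string, $matches);

echo $count, "\n";    // number of matches
print_r($matches[0]); // the matched strings; the second keeps its trailing "."
```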


runfalk


-1: you've just created an XSS vulnerability, since it would also extract javascript: URLs. – Michael Borgwardt May 26 '09 at 14:26
It's not stated what he'd use it for, hence I don't account for that. He just wanted to get URLs into variables. – runfalk May 26 '09 at 14:29

@Michael: Finding javascript URLs is not yet a vulnerability; using them without any check is. Sometimes the presence and number of such URLs is useful information. I'd have chosen a different delimiter. :) – fuxia Mar 26 '10 at 16:29

score 4 · Answer 7 · answered Dec 24 '13 at 06:02


You could do it like this:

<?php
$string = "this is my friend's website http://example.com I think it is coll";
echo explode(' ',strstr($string,'http://'))[0]; //"prints" http://example.com
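The strstr()/explode() one-liner only looks for http://, and when no URL is present strstr() returns false, so the result is an empty string rather than a clear "not found". A slightly more defensive sketch of the same string-function idea; first_url() is a hypothetical name, and like the original it returns only the first URL and treats whitespace as its end.

```php
<?php
// Hypothetical helper: returns the first http:// or https:// URL in $text,
// or null when there is none. The next whitespace character ends the URL.
function first_url(string $text): ?string
{
    $positions = [];
    foreach (['http://', 'https://'] as $scheme) {
        $pos = strpos($text, $scheme);
        if ($pos !== false) {
            $positions[] = $pos;
        }
    }
    if ($positions === []) {
        return null; // no scheme found at all
    }
    $rest = substr($text, min($positions));
    // strcspn() gives the length of the prefix containing no whitespace
    return substr($rest, 0, strcspn($rest, " \t\r\n"));
}

echo first_url("this is my friend's website http://example.com I think it is coll"), "\n";
```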


Shankar Narayana Damodaran


score 2 · Answer 8 · edited Oct 24 '18 at 13:44

You could try this to find the links and turn them into hyperlinks (wrap each one in an <a href> tag).

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

// The text you want to filter for URLs
$text = "The text you want to filter goes here. http://example.com";

if (preg_match($reg_exUrl, $text)) {
    // $0 is the whole match, so each URL links to itself. (The unescaped
    // quotes in "<a href="{$url[0]}">" were a parse error, and $url[0]
    // would have linked every URL to the first one found.)
    echo preg_replace($reg_exUrl, '<a href="$0">$0</a>', $text);
} else {
    echo "No url in the text";
}

refer here: http://php.net/manual/en/function.preg-match.php
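One limitation worth knowing: [a-zA-Z]{2,3} caps the top-level domain at three letters, and because nothing follows it except an optional path, longer TLDs are silently truncated rather than rejected. The {2,} variant below is an assumed relaxation, not part of the answer.

```php
<?php
$original = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// Assumption: allow any top-level domain of two or more letters
$relaxed  = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/";

$text = 'Read https://example.info/about today';
preg_match($original, $text, $a);
preg_match($relaxed, $text, $b);

var_dump($a[0]); // stops after three TLD letters and loses the path
var_dump($b[0]); // full URL including the path
```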

score 2 · Answer 9 · answered Sep 01 '11 at 12:54

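// $var holds the HTML to scan. Note: no & before $matches; call-time
// pass-by-reference has been rejected since PHP 5.4.
preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+" .
               "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/a>/",
               $var, $matches);

// Group 1 holds the href values
$links = $matches[1];

foreach ($links as $link) {
    print($link . "<br>");
}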
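This pattern targets anchor tags rather than plain-text URLs: capture group 1 holds the href value and group 2 the link text. A sketch on a made-up HTML snippet with both double- and single-quoted attributes:

```php
<?php
// Made-up HTML with one double-quoted and one single-quoted href
$html = <<<'HTML'
<p><a href="http://example.com">Example</a> and <a class="x" href='https://example.org/page'>Docs</a></p>
HTML;

// Same pattern as the answer, without the call-time "&"
preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+" .
               "(.*?)[\"\']+.*?>" . "([^<]+|.*?)?<\/a>/",
               $html, $matches);

print_r($matches[1]); // href values
print_r($matches[2]); // link texts
```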

vstelmakh · Answer 10 · 2020-01-25T19:05:00.733

There are a lot of edge cases with URLs: a URL can contain brackets, or have no protocol at all, and so on. That's why a regex alone is not enough.
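To illustrate the two edge cases named here, a sketch comparing a plain \S+ pattern with Nobu's regex from the first answer on a made-up sentence:

```php
<?php
$text = 'PHP (see https://en.wikipedia.org/wiki/PHP_(disambiguation)) or php.net';

// Simple "everything up to whitespace" pattern
preg_match_all('!https?://\S+!', $text, $simple);
// Nobu's pattern, which allows one (...) group made of word characters
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $text, $nobu);

print_r($simple[0]); // also swallows the closing bracket of the sentence
print_r($nobu[0]);   // ends at "(disambiguation)"
// Neither pattern finds "php.net", because both require a scheme
```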

I created a PHP library that handles many of these edge cases: Url highlight.

Example:

<?php

use VStelmakh\UrlHighlight\UrlHighlight;

// Assumes the library was installed with Composer
require __DIR__ . '/vendor/autoload.php';

$urlHighlight = new UrlHighlight();
$urlHighlight->getUrls("this is my friend's website http://example.com I think it is coll");
// return: ['http://example.com']

For more details, see the README. For the URL cases that are covered, see the tests.

score 1 · Answer 11 · answered Mar 07 '20 at 04:35

Here is a function I use. I can't remember where it came from, but it seems to do a pretty good job of finding links in text and turning them into HTML links.

You can change the function to suit your needs. I just wanted to share this as I was looking around and remembered I had this in one of my helper libraries.

function make_links($str){

  $pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';

  return preg_replace_callback("#$pattern#i", function($matches) {
    $input = $matches[0];
    $url = preg_match('!^https?://!i', $input) ? $input : "http://$input";
    return '<a href="' . $url . '" rel="nofollow" target="_blank">' . "$input</a>";
  }, $str);
}

Use:

$subject = "this is a link http://google:ha.ckers.org maybe don't want to visit it?"; // double quotes: the ' in "don't" would end a single-quoted string
echo make_links($subject);

Output

this is a link <a href="http://google:ha.ckers.org" rel="nofollow" target="_blank">http://google:ha.ckers.org</a> maybe don't want to visit it?

score 1 · Answer 12 · answered Apr 22 '20 at 18:21


<?php
// $webpage_content holds the HTML; $link_extracted[2] receives the href/src values
preg_match_all('/(href|src)[\s]?=[\s\"\']?+(.*?)[\s\"\']+.*?/', $webpage_content, $link_extracted);
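Applied to a small made-up snippet (a stand-in for $webpage_content), the pattern captures the attribute name in group 1 and its value in group 2. Note that it also picks up src attributes, so images and scripts are included, not just links.

```php
<?php
// Stand-in for $webpage_content, which the answer leaves undefined
$webpage_content = '<img src="/logo.png"> <a href=\'https://example.com/\'>Home</a>';
preg_match_all('/(href|src)[\s]?=[\s\"\']?+(.*?)[\s\"\']+.*?/', $webpage_content, $link_extracted);

print_r($link_extracted[1]); // attribute names
print_r($link_extracted[2]); // attribute values
```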


Tesla


score 0 · Answer 13 · answered Sep 19 '16 at 13:05

This regex works well for me, and I have checked it with many kinds of URLs:

<?php
$string = "Thisregexfindurlhttp://www.rubular.com/r/bFHobduQ3n mixedwithstring";
preg_match_all('/(https?|ssh|ftp):\/\/[^\s"]+/', $string, $url);
$all_url = $url[0]; // Returns Array Of all Found URL's
$one_url = $url[0][0]; // Gives the First URL in Array of URL's
?>

I checked it with lots of URLs, which you can find here: http://www.rubular.com/r/bFHobduQ3n
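Two small additions to the snippet above, as a sketch: the same pattern also matches ssh:// URLs, and $url[0][0] triggers a warning when nothing matches, which the null-coalescing operator (PHP 7+) avoids. The sample strings are made up.

```php
<?php
// Same pattern as the answer; the ?? guard is the only addition
$pattern = '/(https?|ssh|ftp):\/\/[^\s"]+/';

preg_match_all($pattern, 'clone ssh://git@example.com/repo.git or browse https://example.com/repo', $url);
$all_url = $url[0];            // every match, in order
$one_url = $url[0][0] ?? null; // first match

preg_match_all($pattern, 'no links here', $none);
$no_url = $none[0][0] ?? null; // $none[0] is an empty array, so this is null, not a warning

print_r($all_url);
var_dump($one_url, $no_url);
```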

score 0 · Answer 14 · answered Aug 23 '17 at 08:08

public function find_links($post_content){
    $reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
    // Make the URLs hyperlinks in one pass ($0 is the whole match); looping
    // str_replace() over the matches re-links URLs that occur more than once.
    // If there are no URLs, preg_replace() returns the text unchanged.
    return preg_replace($reg_exUrl, '<a href="$0" rel="nofollow"> LINK </a>', $post_content);
}
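Why a single preg_replace() pass is safer than calling str_replace() once per match: when a URL appears twice, the first str_replace() already links both copies, and the second call then rewrites the URL inside the new href attributes, nesting <a> tags. A small comparison on a made-up string:

```php
<?php
$regex = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text  = 'see http://example.com and again http://example.com';

// Loop approach: one str_replace() per match
$looped = $text;
preg_match_all($regex, $text, $urls);
foreach ($urls[0] as $url) {
    $looped = str_replace($url, '<a href="' . $url . '" rel="nofollow"> LINK </a>', $looped);
}

// Single pass: every match is replaced exactly once ($0 = whole match)
$single = preg_replace($regex, '<a href="$0" rel="nofollow"> LINK </a>', $text);

echo substr_count($looped, '<a href='), "\n"; // 4: the second call re-linked the hrefs
echo substr_count($single, '<a href='), "\n"; // 2
```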
