Replace URLs in text with HTML links

Question

Here is a design thought: for example, if I put a link such as

http://example.com

in a textarea, how do I get PHP to detect that it's an http:// link and then print it as

print "<a href='http://www.example.com'>http://www.example.com</a>";

I remember doing something like this before; however, it was not foolproof and kept breaking on complex links.

Another good idea would be if you have a link such as

http://example.com/test.php?val1=bla&val2blablabla%20bla%20bla.bl

fix it so it does

print "<a href='http://example.com/test.php?val1=bla&val2=bla%20bla%20bla.bla'>";
print "http://example.com/test.php";
print "</a>";

This one is just an afterthought. Stack Overflow could probably use this as well :D

Any ideas?

ooo i see stackoverflow already do the first part.. post the code, u know you want to :D — Angel.King.47, Jul 27 '09 at 13:30

Søren Løvborg · Accepted Answer · 2019-09-06T18:11:57.930

123

Let's look at the requirements. You have some user-supplied plain text, which you want to display with hyperlinked URLs.

The "http://" protocol prefix should be optional.
Both domains and IP addresses should be accepted.
Any valid top-level domain should be accepted, e.g. .aero and .xn--jxalpdlp.
Port numbers should be allowed.
URLs must be allowed in normal sentence contexts. For instance, in "Visit stackoverflow.com.", the final period is not part of the URL.
You probably want to allow "https://" URLs as well, and perhaps others as well.
As always when displaying user supplied text in HTML, you want to prevent cross-site scripting (XSS). Also, you'll want ampersands in URLs to be correctly escaped as &amp;.
You probably don't need support for IPv6 addresses.
Edit: As noted in the comments, support for email addresses is definitely a plus.
Edit: Only plain text input is to be supported – HTML tags in the input should not be honoured. (The Bitbucket version supports HTML input.)

Edit: Check out GitHub for the latest version, with support for email addresses, authenticated URLs, URLs in quotes and parentheses, HTML input, as well as an updated TLD list.

Here's my take:

<?php
$text = <<<EOD
Here are some URLs:
stackoverflow.com/questions/1188129/pregreplace-to-detect-html-php
Here's the answer: http://www.google.com/search?rls=en&q=42&ie=utf-8&oe=utf-8&hl=en. What was the question?
A quick look at http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax is helpful.
There is no place like 127.0.0.1! Except maybe http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm?
Ports: 192.168.0.1:8080, https://example.net:1234/.
Beware of Greeks bringing internationalized top-level domains: xn--hxajbheg2az3al.xn--jxalpdlp.
And remember.Nobody is perfect.

<script>alert('Remember kids: Say no to XSS-attacks! Always HTML escape untrusted input!');</script>
EOD;

$rexProtocol = '(https?://)?';
$rexDomain   = '((?:[-a-zA-Z0-9]{1,63}\.)+[-a-zA-Z0-9]{2,63}|(?:[0-9]{1,3}\.){3}[0-9]{1,3})';
$rexPort     = '(:[0-9]{1,5})?';
$rexPath     = '(/[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]*?)?';
$rexQuery    = '(\?[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';
$rexFragment = '(#[!$-/0-9:;=@_\':;!a-zA-Z\x7f-\xff]+?)?';

// Solution 1:

function callback($match)
{
    // Prepend http:// if no protocol specified
    $completeUrl = $match[1] ? $match[0] : "http://{$match[0]}";

    return '<a href="' . $completeUrl . '">'
        . $match[2] . $match[3] . $match[4] . '</a>';
}

print "<pre>";
print preg_replace_callback("&\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))&",
    'callback', htmlspecialchars($text));
print "</pre>";

To properly escape < and & characters, I throw the whole text through htmlspecialchars before processing. This is not ideal, as the HTML escaping can cause misdetection of URL boundaries.
As demonstrated by the "And remember.Nobody is perfect." line (in which remember.Nobody is treated as a URL, because of the missing space), further checking on valid top-level domains might be in order.
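To make the first problem concrete, here is a small sketch. The sample text and the simplified pattern are assumptions for illustration only, not the full regex from this answer. Escaping first turns the closing quote into `&quot;`, which the URL pattern then swallows; matching on the raw text and escaping each piece afterwards avoids that.

```php
<?php
// Simplified stand-in for the URL pattern (assumption: http/https URLs with a path only).
$rex  = '{https?://[-a-zA-Z0-9.]+/[!$-/0-9:;=@_a-zA-Z]*}';
$text = 'See "http://example.com/a" now';

// Escape first, then match: the quote has become &quot; and is absorbed into the URL.
preg_match($rex, htmlspecialchars($text), $escapedFirst);

// Match on the raw text: the URL stops at the quote, and can be escaped afterwards.
preg_match($rex, $text, $rawFirst);

echo $escapedFirst[0], "\n"; // http://example.com/a&quot;
echo $rawFirst[0], "\n";     // http://example.com/a
```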

Edit: The following code fixes the above two problems, but is quite a bit more verbose since I'm more or less re-implementing preg_replace_callback using preg_match.

// Solution 2:

$validTlds = array_fill_keys(explode(" ", ".aero .asia .biz .cat .com .coop .edu .gov .info .int .jobs .mil .mobi .museum .name .net .org .pro .tel .travel .ac .ad .ae .af .ag .ai .al .am .an .ao .aq .ar .as .at .au .aw .ax .az .ba .bb .bd .be .bf .bg .bh .bi .bj .bm .bn .bo .br .bs .bt .bv .bw .by .bz .ca .cc .cd .cf .cg .ch .ci .ck .cl .cm .cn .co .cr .cu .cv .cx .cy .cz .de .dj .dk .dm .do .dz .ec .ee .eg .er .es .et .eu .fi .fj .fk .fm .fo .fr .ga .gb .gd .ge .gf .gg .gh .gi .gl .gm .gn .gp .gq .gr .gs .gt .gu .gw .gy .hk .hm .hn .hr .ht .hu .id .ie .il .im .in .io .iq .ir .is .it .je .jm .jo .jp .ke .kg .kh .ki .km .kn .kp .kr .kw .ky .kz .la .lb .lc .li .lk .lr .ls .lt .lu .lv .ly .ma .mc .md .me .mg .mh .mk .ml .mm .mn .mo .mp .mq .mr .ms .mt .mu .mv .mw .mx .my .mz .na .nc .ne .nf .ng .ni .nl .no .np .nr .nu .nz .om .pa .pe .pf .pg .ph .pk .pl .pm .pn .pr .ps .pt .pw .py .qa .re .ro .rs .ru .rw .sa .sb .sc .sd .se .sg .sh .si .sj .sk .sl .sm .sn .so .sr .st .su .sv .sy .sz .tc .td .tf .tg .th .tj .tk .tl .tm .tn .to .tp .tr .tt .tv .tw .tz .ua .ug .uk .us .uy .uz .va .vc .ve .vg .vi .vn .vu .wf .ws .ye .yt .yu .za .zm .zw .xn--0zwm56d .xn--11b5bs3a9aj6g .xn--80akhbyknj4f .xn--9t4b11yi5a .xn--deba0ad .xn--g6w251d .xn--hgbk6aj7f53bba .xn--hlcj6aya9esc7a .xn--jxalpdlp .xn--kgbechtv .xn--zckzah .arpa"), true);

$position = 0;
while (preg_match("{\\b$rexProtocol$rexDomain$rexPort$rexPath$rexQuery$rexFragment(?=[?.!,;:\"]?(\s|$))}", $text, $match, PREG_OFFSET_CAPTURE, $position))
{
    list($url, $urlPosition) = $match[0];

    // Print the text leading up to the URL.
    print(htmlspecialchars(substr($text, $position, $urlPosition - $position)));

    $domain = $match[2][0];
    $port   = $match[3][0];
    $path   = $match[4][0];

    // Check if the TLD is valid - or that $domain is an IP address.
    $tld = strtolower(strrchr($domain, '.'));
    if (preg_match('{\.[0-9]{1,3}}', $tld) || isset($validTlds[$tld]))
    {
        // Prepend http:// if no protocol specified
        $completeUrl = $match[1][0] ? $url : "http://$url";

        // Print the hyperlink.
        printf('<a href="%s">%s</a>', htmlspecialchars($completeUrl), htmlspecialchars("$domain$port$path"));
    }
    else
    {
        // Not a valid URL.
        print(htmlspecialchars($url));
    }

    // Continue text parsing from after the URL.
    $position = $urlPosition + strlen($url);
}

// Print the remainder of the text.
print(htmlspecialchars(substr($text, $position)));
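For anyone who would rather get a string back than print directly, the same loop can be wrapped in a function. This is only a sketch under stated assumptions: the name `linkify()`, the shortened pattern (no query or fragment handling, no IP addresses) and the tiny TLD list are all hypothetical, and it adds the `i` flag so that upper-case input is matched too.

```php
<?php
// Hypothetical helper; a cut-down variant of the Solution 2 loop, not the full answer.
function linkify(string $text, array $validTlds): string
{
    // Shortened pattern: optional protocol, domain, optional port, optional path.
    $rex = '{\b(https?://)?((?:[-a-z0-9]{1,63}\.)+[-a-z0-9]{2,63})(:[0-9]{1,5})?(/[^\s<>"]*?)?(?=[?.!,;:"]?(?:\s|$))}i';

    $out = '';
    $position = 0;
    while (preg_match($rex, $text, $match, PREG_OFFSET_CAPTURE, $position)) {
        [$url, $urlPosition] = $match[0];

        // Escape the text leading up to the URL.
        $out .= htmlspecialchars(substr($text, $position, $urlPosition - $position));

        $tld = strtolower(strrchr($match[2][0], '.'));
        if (isset($validTlds[$tld])) {
            // Prepend http:// if no protocol was specified.
            $href = $match[1][0] !== '' ? $url : "http://$url";
            $out .= sprintf('<a href="%s">%s</a>', htmlspecialchars($href), htmlspecialchars($url));
        } else {
            // Not a known TLD: emit as escaped plain text.
            $out .= htmlspecialchars($url);
        }

        // Continue after the URL.
        $position = $urlPosition + strlen($url);
    }

    // Escape the remainder of the text.
    return $out . htmlspecialchars(substr($text, $position));
}

$validTlds = array_fill_keys(['.com', '.net', '.org'], true); // shortened list, illustration only
echo linkify('Visit stackoverflow.com. And remember.Nobody <b>', $validTlds), "\n";
```

Returning the string makes it easier to reuse the result in templates; the escaping still happens piece by piece, as in Solution 2.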

edited Sep 06 '19 at 18:11

answered Jul 27 '09 at 14:55

Søren Løvborg

I will try and test your implementation my friend.. And then post your answer to be correct if it works. Will take some time though, cause im not at home. Ps Thanks for releasing it into the public domain :D – Angel.King.47 Jul 27 '09 at 18:19
also P.S +1 for the effort :D – Angel.King.47 Jul 27 '09 at 18:20
Tried and tested.. Gj, I used Solution 2 – Angel.King.47 Jul 27 '09 at 21:05
Very helpful. But $rexProtocol is case sensitive, e.g. HTTps will not be detected. Can you please tell me how I can make it case insensitive? I want https? and ftp (case insensitive). Thanks in advance. – Rahul Sep 12 '11 at 13:09
@Rahul: Simply make the regular expression [case insensitive](http://php.net/manual/en/regexp.reference.delimiters.php): In the call to `preg_match`, add an `i` after the final `}` in the regular expression. – Søren Løvborg Sep 12 '11 at 14:20
@Søren : Already got fixed. Thanks for the reply. By mistake i was using /i :). – Rahul Sep 12 '11 at 14:51
Few comments: First, I suggest hosting your code snippets to github.com so we can follow updates on this snippet, suggest changes and report bugs in a structured way. – bart Oct 09 '11 at 18:20
Second, htmlEscapeAndLinkUrls() escapes already formatted urls and makes them unclickable. I understand this is a security measurement, but mind that our applications may already have this security measurement in place. – bart Oct 09 '11 at 18:23
Third: There is a problem with e-mail addresses. The domain part of the e-mail address is converted, but not the e-mail address as a whole. – bart Oct 09 '11 at 18:24
I suggest doing a detection whether the url is enclosed by `<a></a>`. If so, do nothing. – bart Oct 09 '11 at 18:33
@bart: 1) Thanks for the suggestion. I've opted for [BitBucket](https://bitbucket.org/kwi/urllinker) for various reasons. 2) True, although the question concerned plain text input. The issue of allowing markup in input (whether it be HTML, BBcode or whatever) is much more complex. 3) Good point. The code should handle that; I'll see if I can write something up. 4) Could work. You'd have to prevent users from specifying any other attributes (such as `style`) on the `<a>` tag, as well as prevent non-standard protocols such as `javascript:`. – Søren Løvborg Oct 10 '11 at 13:16
Thanks for the quick replies Søren! – bart Oct 10 '11 at 22:07
An updated version with support for email adresses can be found on [Bitbucket](https://bitbucket.org/kwi/urllinker/src). – Søren Løvborg Nov 02 '11 at 21:06
This does not work if the plain-text url is wrapped inside some form of parentheses, e.g. [hxxp://www.google.com] won't work, so I changed line 26 to this `$rexUrlLinker = "{\\b$rexUrl(?=[?.!,;:\"\'\|\[\]\{\}]?(\s|$))}";` – Klemen Tusar Sep 05 '12 at 09:53
Would it be easy to extend it so that a link that is already wrapped in an `<a>` tag (e.g. example.com) is ignored? – Laoneo Oct 19 '12 at 14:17
@Laoneo: Unfortunately, safely allowing markup is a rather complex issue (and, I think, outside the scope of this question). In your specific example, you could just use [`strip_tags`](http://php.net/strip-tags) to remove all tags, then use my function to reinstate it, but that obviously only works when the link text matches the link target. – Søren Løvborg Oct 19 '12 at 14:45
I have tried the second solution with an url like this "https://mail.google.com/mail/u/0/#starred?compose=141d598cd6e13025" and it does not work properly, any idea? – M4rk Nov 16 '13 at 18:36
@rodi: It's now fixed [on Bitbucket](https://bitbucket.org/kwi/urllinker/commits/bf412ba965f25768387eb5112e59bd33ead5fb5b). – Søren Løvborg Nov 17 '13 at 12:39
Thanks for this. It doesn't seem to work with a url like this: http://saloona.co.il/blog/%D7%95%D7%99%D7%93%D7%90%D7%95-%D7%9B%D7%9C-%D7%94%D7%93%D7%A8%D7%9B%D7%99%D7%9D-%D7%9C%D7%A7%D7%A9%D7%99%D7%A8%D7%AA-%D7%9E%D7%98%D7%A4%D7%97%D7%AA-%D7%A6%D7%95%D7%95%D7%90%D7%A8-%D7%90%D7%A8%D7%95/?ref=popular – Guy Dec 10 '13 at 16:15
@Guy: _That_ is not a URL. :) Rather, it is an [IRI](http://en.wikipedia.org/wiki/Internationalized_Resource_Identifier). But feel free to create [an enhancement request on Bitbucket](https://bitbucket.org/kwi/urllinker/issues), and I may look into whether it's feasible to support. – Søren Løvborg Dec 11 '13 at 18:16
Hi, @SørenLøvborg! I made some upgrades to your code and submitted a Pull Request - hope it works well for you! :) https://bitbucket.org/rinogo/urllinker/overview – rinogo Sep 06 '14 at 03:50
Call-time pass-by-reference was removed in PHP 5.4, so the third argument to `preg_match` in solution 2 above should be simply `$match` and not `&$match`. – Cedric Ipkiss Jan 15 '15 at 22:31
This helped me a great deal. I also added `rel="noFollow"` within the anchor tags. – Cedric Ipkiss Jan 15 '15 at 22:38
what are two problem in solution one ? should not I use it ? (solution one) I tested it and worked correctly. – Shafizadeh Aug 11 '15 at 10:06
@Sajad: The two problems are listed just above the last "Edit", most importantly that `htmlspecialchars` can turn a valid URL into an invalid one. And you should not use either version shown here; use [the up-to-date version on Bitbucket](https://bitbucket.org/kwi/urllinker/). The code here just demonstrates the general idea, while the Bitbucket version contains numerous bugfixes. – Søren Løvborg Aug 11 '15 at 13:05
@SørenLøvborg Your regex is wrong, at least for German date notation, as it turns e.g. 20.07.1963 into a clickable link due to the domain portion of the regex. Reluctant reviewers prevented me from editing your post, so please do so yourself by either adding a note about that or, even better, fixing your regex regarding that issue ASAP. TIA. – Yoda Mar 14 '19 at 08:39
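One possible way to address the date problem raised in the last comment is to treat an all-numeric match as an IP address only when PHP's own validator accepts it. This is a sketch, not the Bitbucket fix; `isLinkableHost()` and the short TLD list are hypothetical names used for illustration.

```php
<?php
// Hypothetical helper: decide whether a matched "domain" should become a link.
function isLinkableHost(string $domain, array $validTlds): bool
{
    // Purely numeric, dotted candidates must be real IPv4 addresses.
    if (preg_match('{^[0-9.]+$}', $domain)) {
        return filter_var($domain, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false;
    }

    // Otherwise fall back to the TLD lookup used in Solution 2.
    $tld = strtolower((string) strrchr($domain, '.'));
    return isset($validTlds[$tld]);
}

$validTlds = array_fill_keys(['.com', '.org'], true); // shortened list, illustration only

var_dump(isLinkableHost('20.07.1963', $validTlds));        // bool(false): a date, not an IP
var_dump(isLinkableHost('127.0.0.1', $validTlds));         // bool(true)
var_dump(isLinkableHost('stackoverflow.com', $validTlds)); // bool(true)
```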

Raheel Hasan · Answer 2 · 2015-04-20T10:13:09.850

17

You guys are discussing advanced and complex approaches, which are good for some situations, but mostly we need a simple, careless solution. How about simply this?

preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg);

Just try it and let me know what crazy URL it doesn't handle.

edited Apr 20 '15 at 10:13

answered Apr 20 '15 at 09:57

Raheel Hasan

Yes... but... why not add the code to make it cut/pasteable?!?! `$text_msg = preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1</a> ', $text_msg);` – pperrin Dec 10 '15 at 23:47
Good solution, but if you have HTML in the string, then you might want to replace `\S` with `[^<]` – May 14 '17 at 20:09
`[s]` is too verbose. `{0,1}` is too verbose. `\:` is too verbose. `{0,}` is too verbose. `ms` is nonsensical. I do not endorse this answer. – mickmackusa Dec 24 '20 at 04:56
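Taking the last comment at face value, the same pattern can be written more compactly. The sketch below is meant to behave identically to the answer's one-liner; the sample string is an assumption, and like the original it does no HTML escaping, so it is only suitable for trusted input.

```php
<?php
$text_msg = "Docs: https://example.com/a?b=1 and HTTP://example.org/x\nend";

// Original pattern from the answer.
$verbose = preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims',
    '<a href="$1" target="_blank">$1</a> ', $text_msg);

// Compact equivalent: `s?` for `[s]{0,1}`, `\s*` for `\s{0,}`, `~` delimiters so the
// slashes need no escaping, and only the `i` flag (the pattern has no dots or
// anchors for `m` and `s` to affect).
$compact = preg_replace('~(https?://\S{4,})\s*~i',
    '<a href="$1" target="_blank">$1</a> ', $text_msg);

var_dump($verbose === $compact); // bool(true)
```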

score 15 · Answer 3 · answered Jul 27 '09 at 14:24

Here is something i found that is tried and tested

function make_links_blank($text)
{
  return  preg_replace(
     array(
       '/(?(?=<a[^>]*>.+<\/a>)
             (?:<a[^>]*>.+<\/a>)
             |
             ([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
         )/iex',
       '/<a([^>]*)target="?[^"\']+"?/i',
       '/<a([^>]+)>/i',
       '/(^|\s)(www.[^<> \n\r]+)/iex',
       '/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
       (\\.[A-Za-z0-9-]+)*)/iex'
       ),
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
       '<a\\1',
       '<a\\1 target="_blank">',
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
       ),
       $text
   );
}

It works for me, and it handles both emails and URLs. Sorry to answer my own question. :(

But this one is the only one that works.

Here is the link where I found it: http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/Q_21878567.html

Sorry in advance for it being Experts Exchange.

answered Jul 27 '09 at 14:24

Angel.King.47

I'll just note that this solution fails most of the requirements I suggested, namely #1, 2, 3, 5 and 7, but if this meets your requirements, great. Just don't use it on untrusted input, since it performs no HTML escaping. :-) – Søren Løvborg Jul 27 '09 at 15:05
You talk about this escaping.. if you could explain what this escaping is, it may make it better for me and who knows someone else, to better understand your answer :D – Angel.King.47 Jul 27 '09 at 18:25
To prevent cross site scripting, you must never allow a visitor to add arbitrary HTML code to a page. A simple example is a form handler which simply does a `print($_POST["text"]);`. The simplest (and safest) way to prevent this is to run all user supplied text through `htmlspecialchars()`, which *escapes* HTML tags and entities, effectively turning them into plain text. For this question, you want to allow *some* HTML in the output (namely, link tags), which complicates matters, since we can no longer simply use `htmlspecialchars()`. – Søren Løvborg Jul 28 '09 at 01:38
As stackoverflow does, you could add `rel="nofollow"` to user links – Benjamin Crouzier Jan 21 '13 at 14:38
If the string you're converting is coming from user input stored somewhere like a database, you could prevent XSS by escaping before saving, so you retrieve the escaped text to use with this function – Cedric Ipkiss Jan 15 '15 at 23:09
My newest VPS server has php 7 which outputs Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead ... – Svetoslav Marinov Oct 14 '16 at 08:33
This answer is outdated and can no longer be used. You can no longer incorporate function calls (such as `stripslashes()`) into the replacement argument of `preg_replace()`. – mickmackusa Dec 24 '20 at 04:53
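To illustrate the escaping discussed in the comments above, here is a minimal sketch. The input strings are hard-coded assumptions standing in for user input, and the `rel="nofollow"` attribute follows the suggestion made in the thread.

```php
<?php
// Untrusted input, e.g. what a form handler might receive.
$input = '<script>alert("XSS")</script> & more';

// Escaping turns markup characters into entities, so the browser shows them as text.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
// &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt; &amp; more

// When building a link, escape the URL itself and add only the markup you control.
$url = 'http://example.com/?a=1&b=2';
$escapedUrl = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
$link = '<a href="' . $escapedUrl . '" rel="nofollow">' . $escapedUrl . '</a>';
echo $link, "\n";
// <a href="http://example.com/?a=1&amp;b=2" rel="nofollow">http://example.com/?a=1&amp;b=2</a>
```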

Dharmendra Jadon · Answer 4 · 2016-10-11T18:46:11.850

4

Here is the code using Regular Expressions in function

<?php
//Function definitions
function MakeUrls($str)
{
$find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si');

$replace=array('<a href="$1" target="_blank">$1</a>', '<a href="http://$1" target="_blank">$1</a>');

return preg_replace($find,$replace,$str);
}
//Function testing
$str="www.cloudlibz.com";
$str=MakeUrls($str);
echo $str;
?>

edited Oct 11 '16 at 18:46

answered Mar 21 '14 at 23:51

Dharmendra Jadon

Does this cater for multiple url's in a string? – Amien Dec 11 '14 at 08:12
Sweet, it caters for multiple url's in a string, you're just missing a "<" at $replace=array('a href – Amien Dec 11 '14 at 08:30

score 4 · Answer 5 · answered May 02 '15 at 15:26

I've been using this function, it works for me

function AutoLinkUrls($str,$popup = FALSE){
    if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches)){
        $pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
        for ($i = 0; $i < count($matches['0']); $i++){
            $period = '';
            if (preg_match("|\.$|", $matches['6'][$i])){
                $period = '.';
                $matches['6'][$i] = substr($matches['6'][$i], 0, -1);
            }
            $str = str_replace($matches['0'][$i],
                    $matches['1'][$i].'<a href="http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'"'.$pop.'>http'.
                    $matches['4'][$i].'://'.
                    $matches['5'][$i].
                    $matches['6'][$i].'</a>'.
                    $period, $str);
        }//end for
    }//end if
    return $str;
}//end AutoLinkUrls

All credit goes to http://snipplr.com/view/68586/

Enjoy!

This one has an issue if your string has comma separated URLs, for example "https://www.google.com, http://www.google.com". The first URL in this example would end up with a href="https://www.google.com," including the comma. A URL that ends with a comma is valid, so I guess this is up to the use case if you think it is more likely the string intended the comma as punctuation or as part of the URL. — dan-iel, Jul 29 '20 at 23:57
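If you decide that a trailing comma (or similar) is more likely punctuation than part of the URL, one option is to split it off before building the link. The helper name `splitTrailingPunctuation()` and the chosen character set are assumptions for this sketch, not part of the function above.

```php
<?php
// Hypothetical helper: split a matched URL into the link part and trailing punctuation.
function splitTrailingPunctuation(string $url): array
{
    $trimmed = rtrim($url, '.,;:!?');
    // Whatever rtrim() removed is returned separately so it can be printed after the link.
    return [$trimmed, substr($url, strlen($trimmed))];
}

[$link, $rest] = splitTrailingPunctuation('https://www.google.com,');
echo $link, "\n"; // https://www.google.com
echo $rest, "\n"; // ,
```

The trade-off is the one described in the comment: a URL that genuinely ends in a comma would lose it.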

score 1 · Answer 6 · answered Apr 05 '12 at 05:32

I know this answer has been accepted and that this question is quite old, but it can be useful for other people looking for other implementations.

This is a modified version of the code posted by: Angel.King.47 on July 27,09:

$text = preg_replace(
 array(
   '/(^|\s|>)(www.[^<> \n\r]+)/iex',
   '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex',
   '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
 ),  
 array(
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))",
   "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))",
 ),  
 $text
);

Changes:

I removed rules #2 and #3 (I'm not sure in which situations they are useful).
Removed email parsing as I really don't need it.
I added one more rule which allows the recognition of URLs in the form: [domain]/* (without www). For example: "example.com/faq/" (Multiple tld: domain.{2-3}.{2-4}/)
When parsing strings starting with "http://", it removes it from the link label.
Added "target='_blank'" to all links.
Urls can be specified just after any(?) tag. For example: <b>www.example.com</b>

As "Søren Løvborg" has stated, this function does not escape the URLs. I tried his/her class but it just didn't work as I expected (If you don't trust your users, then try his/her code first).

score 1 · Answer 7 · answered Jul 27 '09 at 13:29

This RegEx should match any link except for these new 3+ character toplevel domains...

{
  \\b
  # Match the leading part (proto://hostname, or just hostname)
  (
    # http://, or https:// leading part
    (https?)://[-\\w]+(\\.\\w[-\\w]*)+
  |
    # or, try to find a hostname with more specific sub-expression
    (?i: [a-z0-9] (?:[-a-z0-9]*[a-z0-9])? \\. )+ # sub domains
    # Now ending .com, etc. For these, require lowercase
    (?-i: com\\b
        | edu\\b
        | biz\\b
        | gov\\b
        | in(?:t|fo)\\b # .int or .info
        | mil\\b
        | net\\b
        | org\\b
        | [a-z][a-z]\\.[a-z][a-z]\\b # two-letter country code
    )
  )

  # Allow an optional port number
  ( : \\d+ )?

  # The rest of the URL is optional, and begins with /
  (
    /
    # The rest are heuristics for what seems to work well
    [^.!,?;"\\'()\[\]\{\}\s\x7F-\\xFF]*
    (
      [.!,?]+ [^.!,?;"\\'()\\[\\]\{\\}\s\\x7F-\\xFF]+
    )*
  )?
}ix

It's not written by me, I'm not quite sure where I got it from, sorry that I can give no credit...

I understand that the above are patterns but im so lost. sry — Angel.King.47, Jul 27 '09 at 13:42
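For readers who are similarly lost, here is how a pattern written in this commented style is used. This is a much smaller pattern than the one above, made up for illustration; the `x` flag tells PCRE to ignore whitespace and `#` comments inside the pattern. Note that the pattern above doubles its backslashes because it is meant to sit inside a double-quoted PHP string; in a single-quoted string, as here, single backslashes are enough.

```php
<?php
// Illustrative pattern only: scheme, host name and optional port.
$pattern = '{
    \b
    (https?)://                 # scheme
    ([-\w]+ (?: \. [-\w]+ )+)   # host name with at least one dot
    ( : \d+ )?                  # optional port
}ix';

if (preg_match($pattern, 'See https://example.net:1234/ now', $m)) {
    echo $m[0], "\n"; // https://example.net:1234
    echo $m[1], "\n"; // https
    echo $m[2], "\n"; // example.net
    echo $m[3], "\n"; // :1234
}
```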

score 1 · Answer 8 · answered Jul 27 '09 at 13:41

this should get you email addresses:

$string = "bah bah steve@gmail.com foo";
$match = preg_match('/[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)*\@[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+(?:\.[^\x00-\x20()<>@,;:\\".[\]\x7f-\xff]+)+/', $string, $array);
print_r($array);

// outputs:
Array
(
    [0] => steve@gmail.com
)

Svetoslav Marinov · Answer 9 · 2016-10-14T10:35:19.413

As I mentioned in one of the comments above, my VPS, which runs PHP 7, started emitting this warning: "Warning: preg_replace(): The /e modifier is no longer supported, use preg_replace_callback instead". The buffer after the replacement was empty/false.

I have rewritten the code and made some improvements. If you think that you should be in the author section feel free to edit the comment above the function make_links_blank name. I am intentionally not using the closing php ?> to avoid inserting whitespace in the output.

<?php

class App_Updater_String_Util {
    public static function get_default_link_attribs( $regex_matches = [] ) {
        $t = ' target="_blank" ';
        return $t;
    }

    /**
     * App_Updater_String_Util::set_protocol();
     * @param string $link
     * @return string
     */
    public static function set_protocol( $link ) {
        if ( ! preg_match( '#^https?#si', $link ) ) {
            $link = 'http://' . $link;
        }
        return $link;
    }

    /**
     * Goes through text and makes whatever text that look like a link an html link
     * which opens in a new tab/window (by adding target attribute).
     * 
     * Usage: App_Updater_String_Util::make_links_blank( $text );
     * 
     * @param str $text
     * @return str
     * @see http://stackoverflow.com/questions/1188129/replace-urls-in-text-with-html-links
     * @author Angel.King.47 | http://dashee.co.uk
     * @author Svetoslav Marinov (Slavi) | http://orbisius.com
     */
    public static function make_links_blank( $text ) {
        $patterns = [
            '#(?(?=<a[^>]*>.+?<\/a>)
                 (?:<a[^>]*>.+<\/a>)
                 |
                 ([^="\']?)((?:https?|ftp):\/\/[^<> \n\r]+)
             )#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = empty( $r2 ) ? '' : App_Updater_String_Util::set_protocol( $r2 );
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
             },

            '#(^|\s)((?:https?://|www\.|https?://www\.)[^<>\ \n\r]+)#six' => function ( $matches ) {
                $r1 = empty( $matches[1] ) ? '' : $matches[1];
                $r2 = empty( $matches[2] ) ? '' : $matches[2];
                $r3 = empty( $matches[3] ) ? '' : $matches[3];

                $r2 = ! empty( $r2 ) ? App_Updater_String_Util::set_protocol( $r2 ) : '';
                $res = ! empty( $r2 ) ? "$r1<a href=\"$r2\">$r2</a>$r3" : $matches[0];
                $res = stripslashes( $res );

                return $res;
            },

            // Remove any target attribs (if any)
            '#<a([^>]*)target="?[^"\']+"?#si' => '<a\\1',

            // Put the target attrib
            '#<a([^>]+)>#si' => '<a\\1 target="_blank">',

            // Make emails clickable Mailto links
            '/(([\w\-]+)(\\.[\w\-]+)*@([\w\-]+)
                (\\.[\w\-]+)*)/six' => function ( $matches ) {

                $r = $matches[0];
                $res = ! empty( $r ) ? "<a href=\"mailto:$r\">$r</a>" : $r;
                $res = stripslashes( $res );

                return $res;
            },
        ];

        foreach ( $patterns as $regex => $callback_or_replace ) {
            if ( is_callable( $callback_or_replace ) ) {
                $text = preg_replace_callback( $regex, $callback_or_replace, $text );
            } else {
                $text = preg_replace( $regex, $callback_or_replace, $text );
            }
        }

        return $text;
    }
}

score 0 · Answer 10 · answered Jul 27 '09 at 13:30

Something along the lines of :

<?php
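// PHP has no "g" modifier; the URL is captured up to the first whitespace or the end of the string.
if (preg_match('@^http://(.*?)(?:\s|$)@', $textarea_url, $matches)) {
    echo '<a href="http://', $matches[1], '">', $matches[1], '</a>';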
}
?>

answered Jul 27 '09 at 13:30

OneOfOne

amarjit singh · Answer 11 · 2013-07-21T12:06:00.537

This class converts links into plain text while keeping the home URL as a link. I hope this helps and saves you time. Enjoy.

class RegClass
{
    function preg_callback_url($matches)
    {
        // $matches[2] is the link text: <a ...>text</a>
        $text = $matches[2];
        // $matches[1] is the attribute string: <a href ="http://www.test.com">
        $url = $matches[1];

        if ($url == 'href ="http://www.test.com"') {
            // Keep the home URL as a link
            return '<a ' . $url . ' rel="nofollow"> ' . $text . ' </a>';
        } else {
            // Replace every other <a> tag with its text
            return " $text ";
        }
    }

    function ParseText($text)
    {
        // Prefix bare www. hosts with http://, then undo the doubled protocol
        // on URLs that already had one.
        $text = preg_replace("/www\./", "http://www.", $text);
        $regex = "/http:\/\/http:\/\/www\./";
        $text = preg_replace($regex, "http://www.", $text);
        $regex2 = "/https:\/\/http:\/\/www\./";
        $text = preg_replace($regex2, "https://www.", $text);

        return preg_replace_callback('/<a\s(.+?)>(.+?)<\/a>/is',
            array($this, 'preg_callback_url'), $text);
    }
}

$regexp = new RegClass();
echo $regexp->ParseText($text);

This class uses the preg_replace_callback function to search for and replace URLs with text. If you get an error in the ParseText function, just replace $regex and $regex2 with the actual patterns. — amarjit singh, May 12 '13 at 15:49

score 0 · Answer 12 · answered Mar 05 '14 at 04:53

If you want to trust the IANA, you can get the current list of officially supported TLDs from there like this:

  $validTLDs = 
explode("\n", file_get_contents('http://data.iana.org/TLD/tlds-alpha-by-domain.txt')); //get the official list of valid tlds
  array_shift($validTLDs); //throw away first line containing meta data
  array_pop($validTLDs); //throw away last element which is empty

Makes Søren Løvborg's solution #2 a bit less verbose and spares you the hassle of updating the list, nowadays new tlds are thrown out so carelessly ;)
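One detail to watch: Solution 2 looks TLDs up as lower-case, dot-prefixed array keys (`$validTlds['.com']`), while the IANA file lists them in upper case, one per line, without the dot. A possible conversion is sketched below; the `$raw` sample is a made-up stand-in in the same shape as the file, so the example runs without network access.

```php
<?php
// Stand-in for file_get_contents() on the IANA list (assumption: a "#" comment
// line first, then one upper-case TLD per line).
$raw = "# sample header line\nAERO\nCOM\nXN--JXALPDLP\n";

$validTlds = [];
foreach (explode("\n", $raw) as $line) {
    $line = trim($line);
    if ($line === '' || $line[0] === '#') {
        continue; // skip blank lines and the metadata comment
    }
    // Store as ".com"-style keys so isset($validTlds[$tld]) works as in Solution 2.
    $validTlds['.' . strtolower($line)] = true;
}

var_dump(isset($validTlds['.com']));    // bool(true)
var_dump(isset($validTlds['.nobody'])); // bool(false)
```

Skipping lines by content rather than by position avoids depending on exactly one header line and one trailing blank line.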

score 0 · Answer 13 · edited Sep 02 '17 at 18:18

This worked for me (turned one of the answers into a PHP function)

function make_urls_from_text ($text){
   return preg_replace('/(http[s]{0,1}\:\/\/\S{4,})\s{0,}/ims', '<a href="$1" target="_blank">$1 </a>', $text);
}

edited Sep 02 '17 at 18:18

OmniPotens

answered Jul 21 '15 at 19:02

Shawn Gervais

user13611442 · Answer 14 · 2020-05-27T21:33:50.460

0

This class I created works for my needs; admittedly, it does need some work though:

class addLink
{
    public function link($string)
    {
        $expression = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,63}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
        // Use the $1 backreference so that every URL links to itself
        // (the previous version reused the first match for every href).
        $string = preg_replace($expression, '<a href="$1" target="_blank">$1</a>', $string);

        return $string;       
    }
}

An example of using this code;

include 'PHP/addLink.php';

if(class_exists('addLink')) 
{                  
    $al = new addLink();                  
}
else{
    echo 'Class not found...';
} 

$paragraph = $al->link($paragraph);

edited May 27 '20 at 21:33

answered May 26 '20 at 10:36

user13611442

`[a-z]{2,4}` is really short for TLDs, have a look at: [TLD list](https://www.iana.org/domains/root/db) – Toto May 26 '20 at 10:42
moreover, your regex matches `http://qdj$$$-=`, [demo](https://regex101.com/r/FuzQ5G/1), not sure it's a valid URL ;) – Toto May 26 '20 at 10:46
I changed the TLD length to 63 as per [RFC 1034](https://tools.ietf.org/html/rfc1034) and updated above... – user13611442 May 26 '20 at 10:52
I'm currently reading [RFC 1035](https://tools.ietf.org/html/rfc1035) to fix my regex pattern matching... – user13611442 May 26 '20 at 11:10

score 0 · Answer 15 · answered Jul 30 '20 at 00:17

This is just a variation of the solution posted by Dharmendra Jadon, so if you like it up vote his instead!

I just added a parameter to make opening the link in a new window (target="_blank") optional, as I saw this in some of the other solutions and liked the flexibility:

function MakeUrls($str, $popup = FALSE)
{
    $find=array('`((?:https?|ftp)://\S+[[:alnum:]]/?)`si','`((?<!//)(www\.\S+[[:alnum:]]/?))`si');

    $replace=array('<a href="$1"' . ($popup ? ' target="_blank"' : '') . '>$1</a>', '<a href="http://$1"' . ($popup ? ' target="_blank"' : '') . '>$1</a>');

    return preg_replace($find,$replace,$str);
}

answered Jul 30 '20 at 00:17

dan-iel

The `s` pattern modifier is useless if there are no "any character" dots in the pattern. – mickmackusa Dec 24 '20 at 04:58
This will fail if your link is within quotes (e.g. `xxxxxxxx "http://www.bbc.com/list" Received yyyyy`); see https://regex101.com/r/puRu94/1 – user1432181 Feb 28 '22 at 18:53

score -1 · Answer 16 · answered Aug 14 '13 at 10:12

This should get your twitter handle without touching on your email /(?<=^|(?<=[^a-zA-Z0-9-.]))@([A-Za-z]+[A-Za-z0-9]+)/i

answered Aug 14 '13 at 10:12

Mfundo Mtselu

Did you copy my answer from http://stackoverflow.com/questions/2304632/regex-for-twitter-username/6351873#6351873 and paste it onto my question, where it is not even relevant? A little credit would have at least got you no downvote! – Angel.King.47 Aug 14 '13 at 11:09

André Eriksson · Answer 17 · 2009-07-27T14:11:18.107

-2

While matching the full url spec is difficult, here's a regular expression that generally does a good job:

([\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?)

To use this in preg_replace, however, you need to escape it. As so:

$pattern = "/([\\w-]+(\\.[\\w-]+)*@([a-z0-9-]+(\\.[a-z0-9-]+)*?\\.[a-z]{2,6}|(\\d{1,3}\\.){3}\\d{1,3})(:\\d{4})?)/";
$replaced_texttext = preg_replace($pattern, '<a href="$0" title="$0">$0</a>', $text);

edited Jul 27 '09 at 14:11

answered Jul 27 '09 at 13:30

André Eriksson

thats why i hate preg replace... Ima test it out and let you know :D – Angel.King.47 Jul 27 '09 at 13:32
http://news.bbc.co.uk/1/hi/england/surrey/8168892.stm does that work for you... Sry you will have to click the link to get full. Stackoverflow is cutting it short – Angel.King.47 Jul 27 '09 at 13:41
Funny enough it worked for emails and not url...lol, But failed for emails such as mail@stack.co.uk – Angel.King.47 Jul 27 '09 at 13:58
The regular expression had some missing backslashes, which is why it didn't match those URL's properly. Should be fixed now. – André Eriksson Jul 27 '09 at 14:12
sry to say this.. but nope.. its doing it to only emails and it still have the same problems for domains such as .co.uk, the .uk part gets left out. But its not working for url's at all – Angel.King.47 Jul 27 '09 at 14:17
