
I reviewed the answers to the question "How to replace plain URLs with links?" and decided to use the code Christian Koch suggested, but it only partially covers what I need to do. I am hoping someone here can help me out.

The code Christian Koch supplied works great with one exception: when the text contains both existing links and plain URLs, the existing links get a second 'a' tag. This produces broken HTML that does not display correctly in the browser.

For instance, the code works fine for this:

     www.yahoo.com is a website just like http://www.google.com

The Yahoo and Google URLs now both appear as links, each wrapped in an anchor tag, just as I would expect:

     <a href="http://www.yahoo.com">www.yahoo.com</a> is a website just like <a href="http://www.google.com">http://www.google.com</a>

Now take this text, which contains a plain URL and a link that is already defined:

     www.yahoo.com is a website just like <a href="http://www.google.com">http://www.google.com</a>

With the supplied code, the Yahoo link is correct, but the Google link now has a double tag:

     <a href="http://www.yahoo.com">www.yahoo.com</a> is a website just like <a href="<a href="http://www.google.com">http://www.google.com</a>" target="_blank"><a href="http://www.google.com">http://www.google.com</a></a>

Can someone please help me fix the pattern so that it ignores URLs that are already inside a link but still wraps the plain ones? I want the replacement to happen if and only if the URL is not already contained in an 'a' tag.

Here is the code I am using from the other post:

    doLinks: function(originalText)
    {
        // http://, https://, ftp://
        var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;

        // www. sans http:// or https://
        var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;

        // Email addresses *** here I've changed the expression ***
        var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;

        return originalText
            .replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
            .replace(pseudoUrlPattern, '$1<a target="_blank" href="http://$2">$2</a>')
            .replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
    }
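To see where the double tag comes from, the sketch below runs the question's three patterns, unchanged, on the second sample. Only the wrapper differs: it is a standalone function with the hypothetical name doLinksOriginal instead of an object method. urlPattern has no notion of markup, so it matches http://www.google.com both inside href="..." and as the link text, and wraps each occurrence. Because this code writes target before href, the exact string differs slightly from the output quoted above, but the nesting is the same.

```javascript
// Reproduction of the double-tagging problem with the question's patterns.
// doLinksOriginal is a hypothetical standalone name for the doLinks method.
function doLinksOriginal(originalText) {
    var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;
    var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
    var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;
    return originalText
        .replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
        .replace(pseudoUrlPattern, '$1<a target="_blank" href="http://$2">$2</a>')
        .replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
}

var input = 'www.yahoo.com is a website just like <a href="http://www.google.com">http://www.google.com</a>';
var output = doLinksOriginal(input);

// urlPattern matched the Google URL twice: inside href="..." and as link text.
var anchorCount = (output.match(/<a /g) || []).length;
console.log(anchorCount); // 4: yahoo, the original google anchor, two new ones
console.log(output.indexOf('href="<a') !== -1); // true: an anchor inside an attribute
```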

2 Answers


Well, after giving the problem another look, I've come to believe that a combination of several sub-patterns will do a better job than one mega-pattern. So I split pseudoUrlPattern in two: one for URLs at the beginning of a line, and one for every other URL in the text. Consider the following revised code, complete with my test text:

    var doLinks = function(originalText) {

        // http://, https://, ftp:// URLs: the preceding character must not be
        // < or >, and the URL must be followed by <, > or the end of the text
        var urlPattern = /[^<>]\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|](?![^<>])/gim;

        // www. URLs at the start of a line (^ with the m flag)
        var pseudoUrlPattern1 = /^([^\/])?(www\.[\S]+(\b|$|[^<>]))/gim;
        // www. URLs elsewhere, skipped when preceded by /, ", > or <
        var pseudoUrlPattern2 = /([^\/"><])(www\.[\S]+(\b|$))(?![^<>])?/gim;

        var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;

        return originalText
            .replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
            .replace(pseudoUrlPattern1, '$1<a target="_blank" href="http://$2">$2</a>')
            .replace(pseudoUrlPattern2, '$1<a target="_blank" href="http://$2">$2</a>')
            .replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
    };

    var string = 'www.yahoo.com is a website just like <a href="http://www.google.com">http://www.google.com</a> and not like <a href="www.facebook.com "> www.facebook.com </a> and not like www.example.com';
    console.log(doLinks(string));

Give it a try and tell me how it goes.

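One thing to pay attention to: URLs that are already inside anchor tags should preferably have no spaces between the URL and the tag. Otherwise, as with " www.facebook.com " in the test text, the URL in the link text is wrapped a second time.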
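A different way to get "replace only if the URL is not already inside a link", sketched here as an alternative rather than as either answerer's method: split the input on existing <a>...</a> elements, pass those through untouched, and run the question's original patterns only on the text between them. The names linkifyPlainText and doLinksSafe are hypothetical. The sketch assumes anchors are well formed and not nested, and it reuses the question's patterns, so their other limitations remain. Because nothing inside an existing anchor is ever rewritten, spaces around the URL in the link text stop mattering.

```javascript
// The question's three patterns, unchanged, in a hypothetical helper that is
// only ever given text lying outside existing <a> elements.
function linkifyPlainText(text) {
    var urlPattern = /\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;
    var pseudoUrlPattern = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
    var emailAddressPattern = /(([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;
    return text
        .replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
        .replace(pseudoUrlPattern, '$1<a target="_blank" href="http://$2">$2</a>')
        .replace(emailAddressPattern, '<a target="_blank" href="mailto:$1">$1</a>');
}

// Hypothetical wrapper: split on existing anchors and linkify only the gaps.
// Assumes well-formed, non-nested <a>...</a>; a regex tokenizer, not an HTML parser.
function doLinksSafe(html) {
    // The capturing group keeps each matched anchor in the result array,
    // where it always sits at an odd index.
    var parts = html.split(/(<a\b[^>]*>[\s\S]*?<\/a>)/i);
    return parts.map(function (part, i) {
        return i % 2 === 1 ? part : linkifyPlainText(part);
    }).join('');
}

var string = 'www.yahoo.com is a website just like <a href="http://www.google.com">http://www.google.com</a> and not like <a href="www.facebook.com "> www.facebook.com </a> and not like www.example.com';
console.log(doLinksSafe(string));
```

With this split, the Google and Facebook anchors come back exactly as they went in, while www.yahoo.com and www.example.com are wrapped.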

Vlad Lyga
  • The changes only partially work. Now the second part of the sentence appears correctly, but the first part is ignored and remains plain text: the Yahoo portion is still plain text, while the Google portion is untouched and remains a link (without the double tag). I need the Yahoo portion to be a link AND the Google portion untouched and without a double tag. – user123456789 Jan 07 '13 at 16:40
  • Ok, so I think I've got it right this time. Check out my edited answer with complete code example. – Vlad Lyga Jan 07 '13 at 20:07

Just forbid the replacement when the URL is in single quotes (') or double quotes (").

    // http://, https://, ftp://
    var urlPattern = /[^"']\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;

    // www. sans http:// or https://
    var pseudoUrlPattern = /(^|[^\/"'])(www\.[\S]+(\b|$))/gim;

    // Email addresses *** here I've changed the expression ***
    var emailAddressPattern = /[^"'](([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;

You may need to escape the single or double quotes. I did not test it.
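A small test of the quote idea, plus one possible refinement (not erik's code). With the quote-only urlPattern above used in the question's replace call, the [^"'] prefix is consumed, so it becomes part of $& and ends up inside the generated href. It also does not stop a URL that directly follows > (the link text) from being wrapped. The hypothetical variant below uses a negative lookbehind, which checks the preceding character without consuming it, and also rejects > and =. It assumes an engine with lookbehind support (ES2018). The names doLinksLookbehind and pseudoUrlPattern here are illustrative, and a link text with a leading space, like " www.facebook.com ", would still be wrapped.

```javascript
// The same sample as in the question: a plain www. URL plus an existing link.
var input = 'www.yahoo.com is a website just like <a href="http://www.google.com">http://www.google.com</a>';

// The quote-only pattern from this answer. The [^"'] prefix is consumed,
// so here the '>' before the link text lands inside href.
var quoteOnlyPattern = /[^"']\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gim;
var quoteOnlyResult = input.replace(quoteOnlyPattern, '<a target="_blank" href="$&">$&</a>');
console.log(quoteOnlyResult.indexOf('href=">http://www.google.com"') !== -1); // true

// Hypothetical variant: negative lookbehinds reject a URL whose preceding
// character is a quote, '=' or '>' (an attribute value or a link's text).
var urlPattern = /(?<!["'=>])\b(?:https?|ftp):\/\/[a-z0-9-+&@#\/%?=~_|!:,.;]*[a-z0-9-+&@#\/%=~_|]/gi;
// www. URLs not preceded by /, a quote, '=', '>', a word character or '.'
var pseudoUrlPattern = /(?<![\/"'=>\w.])www\.[^\s"'<>]+\b/gi;

function doLinksLookbehind(text) {
    return text
        .replace(urlPattern, '<a target="_blank" href="$&">$&</a>')
        .replace(pseudoUrlPattern, '<a target="_blank" href="http://$&">$&</a>');
}

console.log(doLinksLookbehind(input));
```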

By the way, your regex does not match all domain names: internationalized domain names are becoming more and more common. See the examples in the German Wikipedia.
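To make the internationalized-domain point concrete: the question's email pattern only allows [a-zA-Z_] in the domain label, so an address on a domain such as bücher.de is not matched at all (info@bücher.de is a made-up address used only for illustration). A minimal sketch of a more permissive, still heuristic pattern uses Unicode property escapes (\p{L} for letters in any script, \p{N} for digits), which need the u flag and an ES2018 engine. The WHATWG URL class, available in browsers and Node.js, shows the ASCII (Punycode) form such a hostname takes.

```javascript
// The question's email pattern: ASCII letters only in the domain.
var asciiEmailPattern = /(([a-zA-Z0-9_\-\.]+)@[a-zA-Z_]+?(?:\.[a-zA-Z]{2,6}))+/gim;

// Hypothetical Unicode-aware variant. A loose heuristic, not a validator.
var unicodeEmailPattern = /[\p{L}\p{N}_.\-]+@(?:[\p{L}\p{N}\-]+\.)+\p{L}{2,}/gu;

var sample = 'write to info@bücher.de';
console.log(sample.match(asciiEmailPattern));   // null
console.log(sample.match(unicodeEmailPattern)); // one match: info@bücher.de

// The URL parser converts an IDN hostname to its Punycode form.
console.log(new URL('http://bücher.de/').hostname); // xn--bcher-kva.de
```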

erik
  • Well, this is not working perfectly. You should try the solution from Vlad Lyga and vote him up. – erik Jan 23 '13 at 12:46