Hyperlinks are breaking after Regex

Question

After emitting a new message to the client and receiving it on the client side, I pass it through a function that detects hyperlinks and makes them clickable. The problem is that if the string contains a link like this: `this is the link http://linktosomwhere.com?ref=myname to somewhere`, it parses the link wrong and produces this: `this is the link <a...>http://linktosomwhere.com</a> ?ref=myname to somewhere`. So it breaks the link, adding a space where there shouldn't be one. As I mentioned, everything happens on the client side with this function:

function linkify(inputText) {
    var replacedText, replacePattern1, replacePattern2, replacePattern3;

    //URLs starting with http://, https://, or ftp://
    replacePattern1 = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
    replacedText = inputText.replace(replacePattern1, '<a href="$1" rel="nofollow" target="_blank">$1</a>');

    //URLs starting with "www." (without // before it, or it'd re-link the ones done above).
    replacePattern2 = /(^|[^\/])(www\.[\S]+(\b|$))/gim;
    replacedText = replacedText.replace(replacePattern2, '$1<a rel="nofollow" href="http://$2" target="_blank">$2</a>');

    //Change email addresses to mailto:: links.
    replacePattern3 = /(([a-zA-Z0-9\-\_\.])+@[a-zA-Z\_]+?(\.[a-zA-Z]{2,6})+)/gim;
    replacedText = replacedText.replace(replacePattern3, '<a href="mailto:$1">$1</a>');

    return replacedText;
}
console.log(linkify('this is the link http://linktosomwhere.com?ref=myname to somewhere'));

See [*How to replace plain URLs with links?*](http://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links) — Wiktor Stribiżew, Sep 12 '16 at 09:56
Hope it's not some old bug in an outdated javascript engine on that client. Perhaps backslashing the question in the character class solves it : `/(\b(?:https?|ftp):\/\/[a-z0-9+&@#%=~_|!,.:;\?\/\-]+[a-z0-9+&@#%=~_|\/\-])/gi` — LukStorms, Sep 12 '16 at 11:32
somehow this part in frontend `[a-z0-9+&@#%=~_|!,.:;\?\/\-]` is being converted to `[a-z0-9+â€Œâ€‹&@#%=~_|\/\-]` and this might be the bug — Sandra, Sep 12 '16 at 11:41
O_o Odd. Some problem with encoding. Could go the evil "I don't care if the url uses standard characters" way and use `/\b((?:https?|ftp):\/\/\S+)/gi` ? — LukStorms, Sep 12 '16 at 11:45
Is that javascript file of yours encoded as UTF, or something else? — LukStorms, Sep 12 '16 at 12:09
So the problem was not those regex symbols; it's because of something else :/ — Sandra, Sep 12 '16 at 12:24
Yeah, I noticed you opened another question for that. Sometimes copy & paste can screw things up in unexpected ways. Manual re-typing it can help then. — LukStorms, Sep 12 '16 at 12:29
yes but the problem is not because of that regex part, something is deeper there — Sandra, Sep 12 '16 at 12:37
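
A note on the garbled characters quoted in the comments above: `â€Œâ€‹` is what the UTF-8 bytes of U+200C (zero-width non-joiner) followed by U+200B (zero-width space) look like when decoded as Windows-1252. One possible reading, not confirmed anywhere in this thread, is that invisible characters like these ended up inside the message text right before the `?`. The sketch below assumes that is what happened and shows how it would produce exactly the reported symptom. `stripZeroWidth` is a hypothetical helper, not part of the original code.

```javascript
// Hypothetical helper (an assumption, not from the question):
// removes common zero-width characters before the text is linkified.
function stripZeroWidth(text) {
    return text.replace(/[\u200B\u200C\u200D\uFEFF]/g, '');
}

// The http/https/ftp pattern from the question, unchanged.
const urlPattern = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;

// Assumed input: the sample sentence with U+200C U+200B hidden before "?".
const dirty = 'this is the link http://linktosomwhere.com\u200C\u200B?ref=myname to somewhere';

// The invisible characters are not in the character class, so the match stops early.
const brokenMatch = dirty.match(urlPattern)[0];

// After stripping them, the whole URL including the query string is matched.
const cleanMatch = stripZeroWidth(dirty).match(urlPattern)[0];

console.log(brokenMatch); // http://linktosomwhere.com
console.log(cleanMatch);  // http://linktosomwhere.com?ref=myname
```

If this is the cause, calling such a helper on the message before `linkify` would be one way to work around it; finding where the characters are introduced would be the more thorough fix.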

score 0 · Answer 1 · 2020-08-03T16:41:53.923

I recreated the function

const linkify = string => {
        // Prefix that is prepended only when the URL starts with "www."
        let addHTTPS = '';
        /* 
        Regex checks if the URL starts with "www." 
        If so addHTTPS is set to https://.
        exec() returns null when nothing matches, so guard before indexing.
        */
        const prefix = /(?=.+)\s(https:\/\/|mailto:|www\.|ftp:\/\/|http:\/\/)(?=\S+\?ref=.+$)/.exec(string);
        if (prefix && prefix[0] === " www.") {
            addHTTPS = "https://";
        }
        // Template literal (backticks) so ${addHTTPS} is interpolated; $1..$4 are filled in by replace()
        return string.replace(/^(.+)\s(https:\/\/|mailto:|www\.|ftp:\/\/|http:\/\/)(\S+)\?ref=(.+)$/,
            `<a rel="nofollow" href="${addHTTPS}$2$3" ref="$4"> $1 </a>`);
    
    };
    
    document.write(linkify("Go to Google www.google.com?ref=google"))

Here's how the regex works:

/ => Start of regex

^ => Assert start of string

(.+) => capture group for any string of characters (the text before the URL)

\s => whitespace character

(https:\/\/|mailto:|www\.|ftp:\/\/|http:\/\/) => capture group for the start of the URL: "https://", "mailto:", "www.", "ftp://", or "http://"

(\S+) => capture group for any run of characters that are not whitespace

\?ref= => match the literal text "?ref=" (the question mark is escaped)

(.+) => capture group for the value of the ref parameter

$ => Assert end of string

/ => End of regex
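
To make the breakdown above concrete, here is a small sketch that runs the same pattern with `exec` and names each capture group. The variable names (`pattern`, `describe`, `first`, `second`) are only for illustration. The second call uses the sentence from the question and shows a limitation worth knowing: because the last group is `(.+)`, everything after `?ref=` up to the end of the string is captured, including trailing text that is not part of the URL.

```javascript
// The replace pattern from the answer, unchanged.
const pattern = /^(.+)\s(https:\/\/|mailto:|www\.|ftp:\/\/|http:\/\/)(\S+)\?ref=(.+)$/;

// Illustrative helper: returns the four capture groups by name, or null if there is no match.
function describe(input) {
    const match = pattern.exec(input);
    if (!match) return null;
    const [, text, prefix, rest, ref] = match;
    return { text, prefix, rest, ref };
}

const first = describe("Go to Google www.google.com?ref=google");
console.log(first);
// { text: 'Go to Google', prefix: 'www.', rest: 'google.com', ref: 'google' }

const second = describe("this is the link http://linktosomwhere.com?ref=myname to somewhere");
console.log(second);
// { text: 'this is the link', prefix: 'http://', rest: 'linktosomwhere.com', ref: 'myname to somewhere' }

// A string without "?ref=" does not match at all.
console.log(describe("see www.example.com for details")); // null
```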

Welcome to Stack Overflow. Code-only answers are discouraged on Stack Overflow because they don't explain how they solve the problem. Please edit your answer to explain what the code does and how it answers the question, so that it is also useful for other users with a similar problem, as well as the OP. — FluffyKitten, Aug 02 '20 at 05:35
Thanks for the advice, I added another section in the answer explaining exactly how the regex works and added some comments in the code — , Aug 03 '20 at 16:45
