I am using this regex to detect URLs in a text string:

var urlRegex = /(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[a-z]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)/gi;

Combined with this function to replace the detected strings with links:

function linkify(text) {
    return text.replace(urlRegex, function(url) {
        return "<a href=" + url + ">" + url + "</a>";
    });
}

Both found here: Detect URLs in text with JavaScript

This works for the majority of links. However, links such as www.fatsoma.com/flatline-cardiff and tickets.partyforthepeople.org/events/3633 are detected but lead nowhere: the browser prepends the local path to the detected link, e.g. http://127.0.0.1:8000/filelocation/tickets.partyforthepeople.org/events/3633

Is this to do with the absence of a protocol such as https at the beginning of the link?

Álvaro González
Joshua Dunn
  • Browsers have done endless damage by deciding to hide the protocol prefix in the location bar. A valid full absolute URL has to start with `http://`, `https://` or similar. – Álvaro González Dec 30 '17 at 12:18
  • The regex really looks funky, I’m pretty sure it will deliver both false positives and false negatives. Also, based on context, there may be some decoding to do to get an actual URL. And handling of relative URL composition. And your replacement is missing quotes as well as escaping. – jcaron Dec 30 '17 at 12:51
  • I noticed that your function doesn't quote the href value. Better is: `return <a href="${url}">${url}</a>` (put the text within backticks, I can't use them in a comment) – Rob Monhemius Dec 30 '17 at 13:01
  • @jcaron Yeah, it's giving me some false positives on words separated by ellipses. – Joshua Dunn Dec 30 '17 at 15:28

1 Answer

Is this to do with the absence of a protocol such as https at the beginning of the link?

Yes.
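
The behaviour can be checked with the standard `URL` constructor, which resolves a reference against a base in the same way the browser resolves an `href`. This is a minimal sketch; the base URL is an assumption inferred from the example address in the question.

```javascript
// Base URL of the page; assumed from the example in the question.
var base = "http://127.0.0.1:8000/filelocation/";

// No scheme and no leading "//": resolved as a path relative to the page.
var relative = new URL("tickets.partyforthepeople.org/events/3633", base).href;

// Leading "//": protocol-relative, inherits the scheme of the base.
var protocolRelative = new URL("//tickets.partyforthepeople.org/events/3633", base).href;

// Explicit scheme: absolute, the base is ignored.
var absolute = new URL("https://tickets.partyforthepeople.org/events/3633", base).href;

console.log(relative);         // http://127.0.0.1:8000/filelocation/tickets.partyforthepeople.org/events/3633
console.log(protocolRelative); // http://tickets.partyforthepeople.org/events/3633
console.log(absolute);         // https://tickets.partyforthepeople.org/events/3633
```

Without a scheme or a leading `//`, the detected text is indistinguishable from a relative file path, which is why the local path ends up in front of it.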

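Try adding a prefix when no protocol is present. The `//` prefix used here makes the link protocol-relative, so it inherits the scheme of the current page: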

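function linkify(text) {
    return text.replace(urlRegex, function(url) {
        // Already absolute: use the match as-is.
        if (url.indexOf('http://') === 0 || url.indexOf('https://') === 0) {
            return "<a href='" + url + "'>" + url + "</a>";
        }
        // No protocol: prefix with "//" so the browser treats the match as a host, not a relative path.
        return "<a href='//" + url + "'>" + url + "</a>";
    });
}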
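
A protocol-relative link inherits the scheme of the page it sits on, so on a plain `http://` development server it will point at the `http://` version of the target. A variant that always emits an explicit scheme, and that quotes and escapes the attribute as the comments on the question suggest, might look like the sketch below. The names `linkifyWithScheme` and `escapeHtml` are hypothetical, and defaulting to `https://` is an assumption, since not every site serves HTTPS.

```javascript
// Regex copied unchanged from the question.
var urlRegex = /(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[a-z]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)/gi;

// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(s) {
    return s.replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;")
            .replace(/"/g, "&quot;");
}

// Hypothetical variant of linkify that never emits a relative href.
function linkifyWithScheme(text) {
    return text.replace(urlRegex, function (url) {
        // Assumption: matches without a scheme are reachable over https.
        var href = /^https?:\/\//i.test(url) ? url : "https://" + url;
        return '<a href="' + escapeHtml(href) + '">' + escapeHtml(url) + "</a>";
    });
}

console.log(linkifyWithScheme("see www.fatsoma.com/flatline-cardiff"));
// see <a href="https://www.fatsoma.com/flatline-cardiff">www.fatsoma.com/flatline-cardiff</a>
```

The visible link text stays exactly as it was typed; only the `href` gains the scheme.
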
mehulmpt
  • This fixed the issue with those links, thanks! The regex is throwing false positives though, so I'll have to work on that. – Joshua Dunn Dec 30 '17 at 15:26
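
The false positives on ellipses mentioned in the comments arise because the character class in the regex includes `.`, so text such as `wait...what` is matched as if `what` were a top-level domain. One possible stopgap, short of rewriting the regex, is to discard matches that contain two consecutive dots. This is only a sketch: `looksLikeUrl` and `findUrls` are hypothetical names, and the filter addresses this one class of false positive, not the others the regex may produce.

```javascript
// Regex copied unchanged from the question.
var urlRegex = /(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}(\.[a-z]{2,6}|:[0-9]{3,4})\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)/gi;

// Hypothetical filter: a run of two or more dots is treated as an
// ellipsis rather than as part of a host name.
function looksLikeUrl(candidate) {
    return candidate.indexOf("..") === -1;
}

// Hypothetical helper: return the regex matches that survive the filter.
function findUrls(text) {
    return (text.match(urlRegex) || []).filter(looksLikeUrl);
}

console.log(findUrls("wait...what about www.fatsoma.com/flatline-cardiff"));
// [ 'www.fatsoma.com/flatline-cardiff' ]
```

The same check could be applied inside the `replace` callback by returning the match unchanged when `looksLikeUrl` is false.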