34

What's the fastest method to detect if foo='http://john.doe' is an external URL (in comparison to window.location.href)?

mate64

9 Answers

39

If you consider a URL to be external when the scheme, host, or port differs, you could do something like this:

function isExternal(url) {
    var match = url.match(/^([^:\/?#]+:)?(?:\/\/([^\/?#]*))?([^?#]+)?(\?[^#]*)?(#.*)?/);
    if (typeof match[1] === "string" && match[1].length > 0 && match[1].toLowerCase() !== location.protocol) return true;
    if (typeof match[2] === "string" && match[2].length > 0 && match[2].replace(new RegExp(":("+{"http:":80,"https:":443}[location.protocol]+")?$"), "") !== location.host) return true;
    return false;
}
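
For readers who find the one-line regular expression hard to follow, here is a minimal sketch of the same idea split into named steps. It takes the current protocol and host as parameters so that it can run outside a browser; the names `isExternalAnnotated`, `currentProtocol`, and `currentHost` are made up for this illustration, and in a page you would pass `location.protocol` and `location.host`.

```javascript
// RFC 3986, Appendix B: 1 = "scheme:", 2 = authority (host[:port]),
// 3 = path, 4 = "?query", 5 = "#fragment".
const URI_PARTS = /^([^:\/?#]+:)?(?:\/\/([^\/?#]*))?([^?#]+)?(\?[^#]*)?(#.*)?/;

// Default ports, so "example.com:80" over http equals "example.com".
const DEFAULT_PORTS = { "http:": 80, "https:": 443 };

// Hypothetical helper: currentProtocol/currentHost stand in for
// location.protocol and location.host.
function isExternalAnnotated(url, currentProtocol, currentHost) {
  const match = url.match(URI_PARTS);
  const scheme = match[1];    // undefined (or "" in old IE) when absent
  const authority = match[2]; // undefined (or "") for relative URLs

  // A scheme that is present and differs from ours means external.
  if (scheme && scheme.toLowerCase() !== currentProtocol) return true;

  if (authority) {
    // Strip the port only if it is the default for the current protocol.
    const defaultPort = new RegExp(":(" + DEFAULT_PORTS[currentProtocol] + ")?$");
    if (authority.replace(defaultPort, "") !== currentHost) return true;
  }
  return false; // relative URLs and same-origin absolute URLs
}
```

The ports 80 and 443 appear only so that an explicit default port does not make a same-site URL look external; a site served on another port is still handled, because `location.host` then includes that port.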
Gumbo
  • 2
    @roXon: The regular expression is actually [from the current RFC for URIs](http://tools.ietf.org/html/rfc3986#appendix-B). – Gumbo Jun 05 '11 at 07:18
  • thank you gumbo! this works like a rocket in moz, webkit(safari&chrome) but not in internet explorer (all false) - why so? – mate64 Jun 05 '11 at 07:52
  • @msec: Maybe IE has other values in `location.protocol` or `location.host` than I expected. What are their values in your case? – Gumbo Jun 05 '11 at 10:04
  • 1
    for external links (http://www.facebook.com/mypage/id/123456789) i receive the following results in msie: test 1 (typeof match[1]): false, test 2 (typeof match[2]): true, for internal links (sub.mydomain.com = host): test 1 (typeof match[1]): true, test 2 (typeof match[2]): true (should all be false) - why so ? – mate64 Jun 05 '11 at 10:20
  • @msec: It seems that IE doesn’t return `undefined` if a subpattern was skipped but it returns an empty string. – Gumbo Jun 05 '11 at 11:27
  • 1
    5mins faster ^^, i've found out the following (with !!): if (!!match[1] && match[1].toLowerCase() ... anyways, thank you gumbo! – mate64 Jun 05 '11 at 11:44
  • One note for anybody that wants to use this: Just tested it in a jsfiddle: It returns the wrong result for this url: ``foojsfiddle.net/bar.html`` – Philip Daubmeier Jan 12 '12 at 13:59
  • 1
    @PhilipDaubmeier: That’s not an [absolute URL](http://stackoverflow.com/a/904066/53114). – Gumbo Jan 12 '12 at 14:30
  • Ooops damnit, youre right of course... Shame on me. Thanks for that snippet! – Philip Daubmeier Jan 12 '12 at 15:38
  • This seems fancy, and may well be the most correct answer. But because I cannot easily understand it, I can't use it. Please add comments. Also... why are port 80 and 443 hard coded? Will this fail if you're running a web server on another port? – speedplane Dec 29 '15 at 12:08
39

Update: I did some more research and found that using new URL is easily fast enough and, IMO, the most straightforward way of doing this.

It is important to note that every method I've tried takes less than 1ms to run even on an old phone. So performance shouldn't be your primary consideration unless you are doing some large batch processing. Use the regex version if performance is your top priority.

These are the three methods I tried:

new URL:

const isExternalURL = (url) => new URL(url).origin !== location.origin;

String.replace:

function isExternalReplace(url) {
  const domain = (url) => url.replace('http://','').replace('https://','').split('/')[0];       
  return domain(location.href) !== domain(url);
}

Regex:

const isExternalRegex = (function(){
  const domainRe = /https?:\/\/((?:[\w\d-]+\.)+[\w\d]{2,})/i;

  return (url) => {
    const domain = (url) => domainRe.exec(url)[1];
    return domain(location.href) !== domain(url);
  }
})();

Here are some basic tests I used to test performance: https://is-external-url-test.glitch.me/
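
The `new URL` one-liner throws a `TypeError` for relative input such as `/about`, because `URL` needs a base to resolve against. A possible variant, sketched here under the assumption that relative links should count as internal, passes the current page as the base. The function name `isExternalWithBase` and the `base` parameter are illustrative; in a browser `base` would be `location.href`.

```javascript
// Hypothetical variant of the `new URL` check; `base` stands in for location.href.
function isExternalWithBase(url, base) {
  try {
    // Relative and protocol-relative URLs are resolved against `base`.
    return new URL(url, base).origin !== new URL(base).origin;
  } catch (e) {
    // Unparseable input: treated as external here, which is a design choice.
    return true;
  }
}
```

Because `origin` combines scheme, host, and port, a different port or a switch between http and https is reported as external, and non-hierarchical schemes such as `mailto:` have the opaque origin `"null"`, so they are reported as external too.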

pseudosavant
  • Cool. Glad it works better for you. Regex's definitely have their place, but often times it is used like a chainsaw when a carving knife might be more appropriate. – pseudosavant Mar 19 '12 at 17:25
  • 2
    What about `magnet:` or `mailto:` ? – venimus Oct 22 '13 at 14:02
  • @venimus I thought other protocols should return that it is an 'external' link but when I tried it it didn't work. Turns out I had a bug/typo in my example code. I had `location.href` in my `domain` function instead of `url`. With that changed it now works properly for other protocols as well. – pseudosavant Oct 28 '13 at 17:11
  • 2
    I would MUCH rather maintain this code than the regex in the accepted answer. Nicely done! – Matthew Johnson Jun 04 '14 at 18:29
  • This doesn't take into account ports not matching. So a URL from https://localhost === http://localhost in this script, which may or may not work with CORS depending on how the server is setup. – Michael Deal Sep 23 '14 at 01:08
  • @MichaelDeal Did you mean to say something like `localhost` === `localhost:123`? A different port on the same host is considered to be from a different origin by definition. CORS stands for *Cross-Origin*-Resource-Sharing. That's why you'd need CORS to access a cross-origin resource on another port. – pseudosavant Sep 23 '14 at 05:35
  • 5
    I'm sorry but this is a terrible answer. 1. your first method doesn't work on stuff that start with a slash. for example, isExternal ("/questions/123456") will return true. 2. your second (regex) method will throw an exception on anything that doesn't start with http or https. – Ronen Ness Sep 01 '15 at 13:10
  • Correct, it doesn't work on relative URLs. The example the op gives is an absolute URL. – pseudosavant Sep 01 '15 at 16:05
  • Regexp doesn't work for domains containing a hyphen symbol (downvoting). In first method it is possible to specify limit for the `split('/', 1)`. – happy_marmoset Sep 30 '16 at 09:39
  • Changed it to support `-` in the hostname. Thanks for the downvote over a minor edge case in the **second** solution I provided. – pseudosavant Oct 03 '16 at 19:48
  • maybe you also want to check `url.protocol !== location.protocol` (and the same for `.port`) – bb1950328 Jan 28 '21 at 18:47
  • 1
    I changed it to use `.origin` instead of `.host` as the origin is the combination of protocol + host + port. All 3 will have to match for it to return true. – pseudosavant Mar 02 '21 at 18:51
18

I've been using pseudosavant's method, but ran into a few cases where it triggered false positives, such as domain-less links ( /about, image.jpg ) and anchor links ( #about ). The old method would also give inaccurate results for different protocols ( http vs https ).

Here's my slightly modified version:

var checkDomain = function(url) {
  if ( url.indexOf('//') === 0 ) { url = location.protocol + url; }
  return url.toLowerCase().replace(/([a-z])?:\/\//,'$1').split('/')[0];
};

var isExternal = function(url) {
  return ( ( url.indexOf(':') > -1 || url.indexOf('//') > -1 ) && checkDomain(location.href) !== checkDomain(url) );
};

Here are some tests with the updated function:

isExternal('http://google.com'); // true
isExternal('https://google.com'); // true
isExternal('//google.com'); // true (no protocol)
isExternal('mailto:mail@example.com'); // true
isExternal('http://samedomain.com:8080/port'); // true (same domain, different port)
isExternal('https://samedomain.com/secure'); // true (same domain, https)

isExternal('http://samedomain.com/about'); // false (same domain, different page)
isExternal('HTTP://SAMEDOMAIN.COM/about'); // false (same domain, but different casing)
isExternal('//samedomain.com/about'); // false (same domain, no protocol)
isExternal('/about'); // false
isExternal('image.jpg'); // false
isExternal('#anchor'); // false

It's more accurate overall, and it even ends up being marginally faster, according to some basic jsperf tests. If you leave off the .toLowerCase() for case-insensitive testing, you can speed it up even more.
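
The expected results listed above assume that the page itself is served from http://samedomain.com. To check them outside a browser, one option is to pass the current URL in explicitly. In this sketch the names `checkDomainFor` and `isExternalFor` and the `current` parameter are illustrative; `current` stands in for `location.href`, and the protocol derived from it stands in for `location.protocol`.

```javascript
// Same logic as checkDomain above, with the page protocol passed in.
function checkDomainFor(url, protocol) {
  if (url.indexOf('//') === 0) { url = protocol + url; }
  return url.toLowerCase().replace(/([a-z])?:\/\//, '$1').split('/')[0];
}

// Hypothetical wrapper: `current` stands in for location.href.
function isExternalFor(url, current) {
  const protocol = current.slice(0, current.indexOf(':') + 1); // e.g. "http:"
  const looksAbsolute = url.indexOf(':') > -1 || url.indexOf('//') > -1;
  return looksAbsolute &&
    checkDomainFor(current, protocol) !== checkDomainFor(url, protocol);
}

const page = 'http://samedomain.com/index.html';
const cases = [
  'http://google.com', '//google.com', 'mailto:mail@example.com',
  'https://samedomain.com/secure', 'http://samedomain.com/about',
  '//samedomain.com/about', '/about', '#anchor'
];
for (const url of cases) {
  console.log(url, isExternalFor(url, page));
}
```

The same harness makes it easy to try edge cases: a relative path that happens to contain a colon, such as `/source:1/`, is reported as external by this logic.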

shshaw
  • These don't work: isExternal('/source//1/'); isExternal('/source:1/'); isExternal('#anchor:'); isExternal('#anchor//1'); – Vincente Jan 17 '18 at 16:39
  • @Vincente Those aren't traditionally valid links or anchors, but by removing the first two checks in `isExternal`: `( url.indexOf(':') > -1 || url.indexOf('//') > -1 )`, you should be able to get the results you want. – shshaw Jan 17 '18 at 20:17
2

pseudosavant's answer didn't exactly work for me, so I improved it.

var isExternal = function(url) {
    return !(location.href.replace("http://", "").replace("https://", "").split("/")[0] === url.replace("http://", "").replace("https://", "").split("/")[0]);   
}
Jon
  • +1 For adding another way to do this. I'm curious as to where it is that my solution didn't work for you? – pseudosavant Jul 15 '13 at 18:38
  • 1
    I think it is because the URL I was passing in sometimes also included the `http(s)`. – Jon Jul 16 '13 at 03:52
  • Ah, that makes sense. In my example I figured the OP would be in control of the variable being passed in and that it would already be stripped of the protocol. I updated mine now to do the same domain extraction on the `url` and `location`. – pseudosavant Jul 17 '13 at 16:47
  • 1
    Why use `!(... === ...)` instead of `!==` ? – venimus Oct 22 '13 at 13:59
  • This should have been an edit/comment to @pseudosavant answer, you just added http(s) check which is already updated on the previous answer. – T04435 Feb 03 '20 at 02:06
1

I had to build on pseudosavant's and Jon's answers because I also needed to catch URLs beginning with "//" and URLs that do not include a sub-domain. Here's what worked for me:

var getDomainName = function(domain) {
    var parts = domain.split('.').reverse();
    var cnt = parts.length;
    if (cnt >= 3) {
        // see if the second level domain is a common SLD.
        if (parts[1].match(/^(com|edu|gov|net|mil|org|nom|co|name|info|biz)$/i)) {
            return parts[2] + '.' + parts[1] + '.' + parts[0];
        }
    }
    return parts[1]+'.'+parts[0];
};
var isExternalUrl = function(url) {
 var curLocationUrl = getDomainName(location.href.replace("http://", "").replace("https://", "").replace("//", "").split("/")[0].toLowerCase());
 var destinationUrl = getDomainName(url.replace("http://", "").replace("https://", "").replace("//", "").split("/")[0].toLowerCase());
 return !(curLocationUrl === destinationUrl)
};

$(document).delegate('a', 'click', function() {
 var aHrefTarget = $(this).attr('target');
 if(typeof aHrefTarget === 'undefined')
  return;
 if(aHrefTarget !== '_blank')
  return;  // not an external link
 var aHrefUrl = $(this).attr('href');
 if(aHrefUrl.substr(0,2) !== '//' && (aHrefUrl.substr(0,1) == '/' || aHrefUrl.substr(0,1) == '#'))
  return;  // this is a relative link or anchor link
 if(isExternalUrl(aHrefUrl))
  alert('clicked external link');
});
<h3>Internal URLs:</h3>
<ul>
  <li><a href="stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
  <li><a href="www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
  <li><a href="//stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">//stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
  <li><a href="//www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls" target="_blank">//www.stackoverflow.com/questions/6238351/fastest-way-to-detect-external-urls</a></li>
</ul>
<h3>External URLs:</h3>
<ul>
  <li><a href="http://www.yahoo.com" target="_blank">http://www.yahoo.com</a></li>
  <li><a href="yahoo.com" target="_blank">yahoo.com</a></li>
  <li><a href="www.yahoo.com" target="_blank">www.yahoo.com</a></li>
  <li><a href="//www.yahoo.com" target="_blank">//www.yahoo.com</a></li>
</ul>
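
The `getDomainName` helper above compares registrable domains rather than full host names, using a short hard-coded list of second-level labels. The sketch below repeats that heuristic with one added guard for single-label hosts such as `localhost`, for which the original returns `undefined.localhost`. The name `getRegistrableDomain` is illustrative, and the label list is the one from the answer, not a complete public-suffix list.

```javascript
// Labels treated as second-level domains (same list as in the answer above).
const COMMON_SLD = /^(com|edu|gov|net|mil|org|nom|co|name|info|biz)$/i;

// Hypothetical variant of getDomainName with a guard for single-label hosts.
function getRegistrableDomain(host) {
  const parts = host.toLowerCase().split('.').reverse();
  if (parts.length < 2) return parts[0];               // e.g. "localhost"
  if (parts.length >= 3 && COMMON_SLD.test(parts[1])) {
    return parts[2] + '.' + parts[1] + '.' + parts[0]; // e.g. "example.co.uk"
  }
  return parts[1] + '.' + parts[0];                    // e.g. "example.com"
}
```

With this approach `www.stackoverflow.com` and `stackoverflow.com` reduce to the same value, which is why sub-domains of the current site are treated as internal.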
BumbleB2na
1

For my purposes I made a small modification to shshaw's answer to check that a link is not empty or just a single character (assumed to be '#'), a case in which the original method returns a false positive. I use it to show users that they are about to leave my page by adding a Font Awesome icon to external links.

// same thing here, no edit
function checkDomain(url) {
    if ( url.indexOf('//') === 0 ) { url = location.protocol + url; }
    return url.toLowerCase().replace(/([a-z])?:\/\//,'$1').split('/')[0];
};

function isExternal(url) {
    // verify if link is empty or just 1 char + original answer
    return url.length > 1 && (url.indexOf(':') > -1 || url.indexOf('//') > -1) && checkDomain(location.href) !== checkDomain(url);
};

// add some icon to external links (function is called in an init method)
function addExternalLinkIcon(){
    $("a[href]").each(function(i,ob){
        // we check it
        if(isExternal($(ob).attr("href"))){
            // then add some beauty if it's external
            // (we assume Font Awesome CSS and font is loaded for my example, of course :-P)
            $(ob).append(" <i class='fa fa-external-link'></i> ");
        }
    });
}
Joel Harkes
0

Shouldn't

function is_external( url ) {
    return url.match( /[a-zA-Z0-9]*:\/\/[^\s]*/g ) != null;
}

do the trick? Doesn't work for absolute (internal) urls.

user3116736
0

The main problem is how to parse a URL and get the host name out of it. It can be done in the following way:

var _getHostname = function(url) {
  var parser = document.createElement('a');
  parser.href = url;

  return parser.hostname;
}

var isExternal = (_getHostname(window.location.href) !== _getHostname('http://john.doe'));

Or you can use the is-url-external module.

var isExternal = require('is-url-external');
isExternal('http://john.doe'); // true | false 
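
Note that comparing `hostname`, as the anchor-element approach above does, ignores both the scheme and the port, so http versus https on the same host counts as internal. As a rough illustration using the standard `URL` class instead of an anchor element (the example URL is made up), these are the properties one can choose between:

```javascript
const u = new URL('https://example.com:8080/path?x=1#top');

console.log(u.protocol); // "https:"
console.log(u.hostname); // "example.com"
console.log(u.host);     // "example.com:8080"
console.log(u.origin);   // "https://example.com:8080"
```

Which one to compare depends on what "external" should mean: `hostname` for the loosest check, `origin` for the strict same-origin definition.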
mrded
-3

You can simply use the npm package is-internal-link.

Installation

npm install --save is-internal-link

Usage

import { isInternalLink } from "is-internal-link"
isInternalLink('https://www.google.com') // false
isInternalLink('/page1') // true

I also usually use this with React, like this:

import React from 'react'

import { Link as ReactRouterLink} from 'react-router-dom'
import { isInternalLink } from 'is-internal-link'

const Link = ({ children, to, activeClassName, ...other }) => {
  if (isInternalLink(to)) {
    return (
      <ReactRouterLink to={to} activeClassName={activeClassName} {...other}>
        {children}
      </ReactRouterLink>
    )
  }
  return (
    <a href={to} target="_blank" {...other}>
      {children}
    </a>
  )
}

export default Link

Disclaimer: I am the author of this lib

muhajirframe