Below is a scenario I would like to prevent. It is correct behavior according to the standards, but to users it is scary and unsafe. As a programmer I see the Unicode and run away, but users who don't think about this may not.

I don't know the best way to convert the URL to ASCII and check for xn-- for fast rejection, so I'm asking whether there is any terminology that could get me on my way and possibly help others.

So far I've tried the punycode function available online, though I don't understand it yet. The link is here:

Converting punycode with dash character to Unicode

I could not get this to work right away without xn-- or - appearing in the normal links passed through linkify.

The goal is to check for punycode after the linkify process and reject it. Any other link is OK; I want to avoid as many false positives as possible.

The link below is a demonstration; don't actually click it. Highlight it and see what I mean.

let linkify = (text) => {
  return text.replace(/(?:(?:(?:https?|ftps?):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+(?:[a-z\u00a1-\uffff]{2,}\.?))(?::\d{2,5})?(?:[/?#]\S*)?/igm, (url) => {

    return '<a target="_blank" href="' + url + '">' + url + '</a>'
  });
};
document.querySelector(".linkarea").innerHTML = linkify("HACKER LINK: http://www.yȯutube.com/") + " url is unicode but the actual link is http://www.xn--yutube-iqc.com/ which tricks people";
<div class="linkarea"></div>

BGPHiJACK
  • Btw, http://www.xn--yutube-iqc.com/ isn't actually unsafe, it just redirects to https://www.youtube.com :-) – Bergi Dec 28 '21 at 21:09
  • Yes, maybe a poor example. It was possibly purchased by YouTube after it was discovered being used for phishing, but it is a good example of the Unicode-to-ASCII conversion that happens. – BGPHiJACK Dec 28 '21 at 21:10
  • It's not quite clear what you are trying to prevent. Your `linkify` function creates a link to a Unicode domain, yes, but what do you want to happen instead? What is the expected result? – Bergi Dec 28 '21 at 21:18
  • Oh, I would like to detect whether the converted URL contains that xn-- value and change the output to something friendly like "I enjoy bad links". – BGPHiJACK Dec 28 '21 at 21:20
  • And you want to do that within `linkify`, using a regex approach? – Bergi Dec 28 '21 at 21:22
  • Yes, it's most efficient to do the check when a link is detected, so before the return of ''. Intercept it, cut it right off, and alert the user that it's bad news! – BGPHiJACK Dec 28 '21 at 21:23

1 Answer

The easiest solution would be to construct a URL object and check whether the hostname contains xn--:

let linkify = (text) => {
  return text.replace(/(?:(?:(?:https?|ftps?):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+(?:[a-z\u00a1-\uffff]{2,}\.?))(?::\d{2,5})?(?:[/?#]\S*)?/igm, (url) => {
    // The URL constructor converts the hostname to its ASCII (punycode) form.
    if (new URL(url).hostname.includes('xn--'))
      return 'I enjoy bad links';
    else
      return '<a target="_blank" href="' + url + '">' + url + '</a>'
  });
};
const linkarea = document.querySelector(".linkarea");
linkarea.innerHTML = linkify(linkarea.innerHTML);
<p class="linkarea">
  HACKER LINK: http://www.yȯutube.com/ url is unicode but the actual link is www.xn--yutube-iqc.com/ which tricks people
</p>
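
To cut down on false positives, the check can be narrowed from a substring match to a per-label prefix match, since punycode labels start with `xn--`. The following is a minimal sketch, not part of the original answer: the helper name `hasPunycodeLabel` is hypothetical, and it assumes an environment with the WHATWG `URL` constructor (modern browsers or Node.js).

```javascript
// Hypothetical helper (not from the original answer).
// Returns true when any label of the URL's hostname is punycode-encoded.
function hasPunycodeLabel(url) {
  let hostname;
  try {
    // The URL constructor converts an internationalized hostname to ASCII.
    hostname = new URL(url).hostname;
  } catch (err) {
    // new URL() throws a TypeError for strings it cannot parse,
    // e.g. protocol-relative matches such as "//example.com".
    return false;
  }
  // Punycode labels start with the "xn--" prefix (case-insensitive).
  return hostname
    .split('.')
    .some((label) => label.toLowerCase().startsWith('xn--'));
}

console.log(hasPunycodeLabel('http://www.yȯutube.com/'));
console.log(hasPunycodeLabel('https://www.youtube.com/'));
```

Treating an unparseable match as "not punycode" is a design choice made for this sketch; a stricter caller might prefer to reject such matches instead of linking them.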

Alternatively, you can construct an a element and inspect its .href property; it will also contain the resolved, punycoded URL containing xn--.
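
One caveat with checking the whole `.href` is that `xn--` can also appear in the path or query string, not only in the hostname. The sketch below illustrates the difference. It uses the `href` property of a `URL` object as a stand-in for the anchor element's `.href` (an assumption made so the example also runs outside a browser), and the example URL is made up.

```javascript
// Made-up example: an ordinary ASCII host with "xn--" only in the query string.
const url = new URL('https://example.com/search?q=xn--test');

// Checking the full serialized URL flags it, which is a false positive.
const hrefMatch = url.href.includes('xn--');

// Checking only the hostname does not flag it.
const hostMatch = url.hostname.includes('xn--');

console.log({ hrefMatch, hostMatch });
```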

Bergi
  • Hold the phone, it's that easy with URL!? Let me double-check the documentation. I've never really needed to call that, but sweet! I'll confirm. – BGPHiJACK Dec 28 '21 at 22:00
  • Seems to check out. The docs show IE and Safari may have trouble; thoughts? – BGPHiJACK Dec 28 '21 at 22:02
  • If you need to support outdated browsers, use `const link = document.createElement('a'); link.href = url; if (link.href.includes('xn--')) …`; however, that will also find `xn--` in the path or query parameters, not just the hostname. – Bergi Dec 28 '21 at 22:04
  • Awesome, I've accepted this as the answer, as it truly is. Sadly, I hadn't familiarized myself with URL and was not aware of it. The keywords I used in my title only turned up so much about punycode, so this could be of use to many. Enjoy, Bergi! :) – BGPHiJACK Dec 28 '21 at 22:10