Below is a scenario I would like to prevent. It is correct behavior by the standards, but to users it is scary and unsafe. As a programmer I see the Unicode and run away, but users who don't think about this may not.
I'm not sure of the best way to convert the host to ASCII and check for xn-- for fast rejection, so I'm asking whether there is any terminology that could get me on my way and possibly help others.
So far I've tried the punycode function available online, though I don't understand it yet. The link is here:
Converting punycode with dash character to Unicode
I couldn't get this to work right away: xn-- or - kept appearing in the normal links passed through linkify.
The goal is to check for punycode after the linkify step and reject it. Any other link is OK; I want to avoid as many false positives as possible.
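A minimal sketch of that check, assuming the environment provides the WHATWG `URL` class (browsers and Node.js do): parsing a matched link with `new URL(...)` converts an internationalized host to its ASCII form, so any host label starting with `xn--` marks the link as punycode. `hasPunycodeHost` and the `http://base.invalid/` base are names made up for illustration, and treating unparseable input as suspicious is an assumed policy, not a requirement.

```javascript
// Hypothetical helper: returns true when any label of the URL's host is
// punycode-encoded (starts with "xn--"), i.e. the host is an IDN.
const hasPunycodeHost = (url) => {
  let hostname;
  try {
    // The base is only used for protocol-relative matches such as "//host/path".
    // "base.invalid" is a placeholder and is never contacted.
    hostname = new URL(url, "http://base.invalid/").hostname;
  } catch (e) {
    // Assumed policy: anything the URL parser refuses is rejected as well.
    return true;
  }
  // For http(s) and ftp URLs the parser returns the ASCII form of the host,
  // so a Unicode label shows up here with the "xn--" prefix.
  return hostname
    .toLowerCase()
    .split(".")
    .some((label) => label.startsWith("xn--"));
};

// Example calls: the first two are the spoofed host, the third is a normal link.
console.log(hasPunycodeHost("http://www.yȯutube.com/"));
console.log(hasPunycodeHost("http://www.xn--yutube-iqc.com/"));
console.log(hasPunycodeHost("http://www.youtube.com/"));
```

Inside the `replace` callback of `linkify`, returning the matched text unchanged when `hasPunycodeHost(url)` is true would leave such links unlinked. Only the hostname is inspected, so an `xn--` that merely appears in a path or query string does not cause a rejection.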
The link below is a demonstration; don't actually click it. Highlight it and see what I mean.
let linkify = (text) => {
return text.replace(/(?:(?:(?:https?|ftps?):)?\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z0-9\u00a1-\uffff][a-z0-9\u00a1-\uffff_-]{0,62})?[a-z0-9\u00a1-\uffff]\.)+(?:[a-z\u00a1-\uffff]{2,}\.?))(?::\d{2,5})?(?:[/?#]\S*)?/igm, (url) => {
return '<a target="_blank" href="' + url + '">' + url + '</a>'
});
};
document.querySelector(".linkarea").innerHTML = linkify("HACKER LINK: http://www.yȯutube.com/") + " url is unicode but the actual link is http://www.xn--yutube-iqc.com/ which tricks people";
<div class="linkarea">
</div>