
I'm trying to create a WYSIWYG editor. The goal is that when a user pastes or types a link (i.e. on a paste or keyup (space) event), the editor detects it in real time and determines whether it's an image, a video, or something else.

I tried some libraries suggested in an answer to another question, but they insisted on turning all URLs into links or caused other problems.

I was unsure what the best approach would be. I tried looping over the contents of the input field, but I couldn't get that to work with nested elements. So instead I tried converting the HTML contents into a string and replacing the links in that string.

The problem is not matching a link; the internet is full of great regexes. But how do I match only links that are not inside a tag or an attribute of another tag?

I tried adding a negative lookahead, (?!(\<\/a\>|"|')), at the end of the regex (which I know isn't a perfect solution), but apparently it doesn't work the way I thought it would. So I'm completely lost with this.

$(function(){
  document.write(searchLinks("Sample text https://www.google.fi/images/srpr/logo11w.png and http://google.com/ <a href='http://bing.com/'>http://bing.com/</a>"));
});

function searchLinks(string){
 var urlRegex =/\bhttps?:\/\/[a-zA-Z0-9()^=+*@&%#|~?!;:,.-_/]*[-A-Za-z0-9+&@#/%=~_()|](?!(\<\/a\>|"|'))/g;
 console.log(string.match(urlRegex));
 string=string.replace(urlRegex, function(url){
  if(url.match(/\.gifv/)!=null){ //gifv
   return gifvToVideo(url);
  }else if(url.match(/\.(jpeg|jpg|gif|png|svg)/)!=null){ //image
   return "<img src='"+url+"' alt='"+url+"'>";
  }else if(url.match(/\.(mp4|webm)/)!=null){ //video
   return '<video><source src="'+url+'"></video>';
  }else{ //link
   return '<a href="'+url+'" target="_blank">'+url+'</a>';
  }
 });
 return string;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Okku
  • Using a regex to process html is a bad idea... it can always lead to unexpected results... – Arun P Johny May 18 '15 at 04:01
  • Care to elaborate or provide with links explaining why it is so? If not regexing it, how can I find and replace the links? Like I said, I'm running out of ideas as to how to approach this... – Okku May 18 '15 at 04:05
  • Try http://jsfiddle.net/arunpjohny/5z12Lsgp/1/ and see whether it solves your problem – Arun P Johny May 18 '15 at 04:08

2 Answers


I think one option is to build a DOM structure and iterate over only the top-level text nodes, like this:

function searchLinks(html) {
    var $tmp = $('<div />', {
        html: html
    });
    // "-" is placed last in the first class so that ".-_" is not read as a character range
    var urlRegex = /\bhttps?:\/\/[a-zA-Z0-9()^=+*@&%#|~?!;:,._\/-]*[-A-Za-z0-9+&@#\/%=~_()|](?!(\<\/a\>|"|'))/g;
    $tmp.contents().each(function () {
        if (this.nodeType == Node.TEXT_NODE) {
            var string = this.nodeValue;

            string = string.replace(urlRegex, function (url) {
                if (url.match(/\.gifv/) != null) { //gifv
                    return gifvToVideo(url);
                } else if (url.match(/\.(jpeg|jpg|gif|png|svg)/) != null) { //image
                    return "<img src='" + url + "' alt='" + url + "'>";
                } else if (url.match(/\.(mp4|webm)/) != null) { //video
                    return '<video><source src="' + url + '"></video>';
                } else { //link
                    return '<a href="' + url + '" target="_blank">' + url + '</a>';
                }
            });

            $(this).replaceWith(string)
        }
    })

    return $tmp.html();
}

Demo: Fiddle
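
The same idea (only touch text, never markup) can also be sketched without jQuery or a DOM, which may be handy for testing outside the browser. This is not the code from the fiddle: it swaps the DOM walk for a simple split of the string into tags and text, and it assumes reasonably well-formed HTML. The names `linkifyOutsideTags` and `renderUrl` and the simplified URL pattern are assumptions made for illustration, and the gifv branch is left out because `gifvToVideo` is not shown in the question.

```javascript
// Sketch only: linkifyOutsideTags and renderUrl are hypothetical names.
// Assumes reasonably well-formed HTML; a real HTML parser is more robust.

// Simplified URL pattern (an assumption, not the regex from the question).
var URL_RE = /\bhttps?:\/\/[^\s<>"']+/g;

// Turn one URL into markup, with extension checks similar to the question's.
function renderUrl(url) {
    if (/\.(jpeg|jpg|gif|png|svg)$/i.test(url)) { // image
        return "<img src='" + url + "' alt='" + url + "'>";
    }
    if (/\.(mp4|webm)$/i.test(url)) { // video
        return '<video><source src="' + url + '"></video>';
    }
    return '<a href="' + url + '" target="_blank">' + url + '</a>'; // link
}

function linkifyOutsideTags(html) {
    var anchorDepth = 0;
    // split() with a capturing group keeps the tags in the result:
    // even indexes are text, odd indexes are tags.
    return html.split(/(<[^>]*>)/).map(function (part, i) {
        if (i % 2 === 1) {
            if (/^<a[\s>]/i.test(part)) {
                anchorDepth++;
            } else if (/^<\/a\s*>/i.test(part)) {
                anchorDepth = Math.max(0, anchorDepth - 1);
            }
            return part; // tags and their attributes pass through unchanged
        }
        // Text inside an existing <a>...</a> is left alone.
        return anchorDepth > 0 ? part : part.replace(URL_RE, renderUrl);
    }).join('');
}

var input = "Sample text https://www.google.fi/images/srpr/logo11w.png and http://google.com/ " +
    "<a href='http://bing.com/'>http://bing.com/</a>";
console.log(linkifyOutsideTags(input));
```

Because tags and the text inside existing anchors are skipped, running the function again on its own output should leave it unchanged, which matters if it is called on every paste or keyup event.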

Arun P Johny
  • Thank you! I added an else branch so that it also goes through child nodes: [fiddle](http://jsfiddle.net/5z12Lsgp/4/) Now it seems to be working! – Okku May 18 '15 at 05:00
  • That's even better, that should've been obvious. Thanks again:) – Okku May 18 '15 at 05:13

Find URLs outside attributes


You may have links in other HTML elements too.

One option is to search for links that are not inside attributes. The code below isn't bulletproof, but on well-formatted HTML it should work in most cases.

If you suspect that your HTML is not well formatted, tidy it up before using the regex below.

PHP example:

preg_match_all( "/(?<!\"|')(http|https|ftp|ftps)\\:\\/\\/[a-zA-Z0-9\\-\\.]+\\.[a-zA-Z]{2,3}(\\/\\S*)?/", $srcText, $rgxMatches) ;

Regex:

(?<!"|')(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?
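
Since the question is about JavaScript, here is a minimal sketch of the same pattern there. It assumes an engine with lookbehind support (ES2018 or later); `findUrlsOutsideQuotes` is a hypothetical name, and the sample string is the one from the question.

```javascript
// Sketch only: findUrlsOutsideQuotes is a hypothetical name.
// Requires lookbehind support (ES2018+).
function findUrlsOutsideQuotes(text) {
    // Same pattern as the PHP example, with the "g" flag to collect all matches.
    var rgx = /(?<!"|')(http|https|ftp|ftps):\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/\S*)?/g;
    return text.match(rgx) || [];
}

var sample = "Sample text https://www.google.fi/images/srpr/logo11w.png and http://google.com/ " +
    "<a href='http://bing.com/'>http://bing.com/</a>";
console.log(findUrlsOutsideQuotes(sample));
// → [ 'https://www.google.fi/images/srpr/logo11w.png',
//     'http://google.com/',
//     'http://bing.com/</a>' ]
```

The quoted href value is skipped, but the link text inside the existing anchor is still matched, and because `\S*` does not stop at `<` the match runs on into the closing tag. That is the "not bulletproof" part: the lookbehind only rules out quoted attribute values, not text that is already wrapped in a tag.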
D.A.H