1

I have a very good regex that works and is able to replace urls in a string to clickable once.

string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";

Now, how can I tell it to ignore already clickable links and images?

So it ignores below strings:

<a href="http://www.someaddress.com">Some Text</a>

<img src="http://www.someaddress.com/someimage.jpg" />

Example:

The website www.google.com, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" />

Result:

The website <a href="http://www.google.com">www.google.com</a>, once again <a href="http://www.google.com">www.google.com</a>, the logo <img src="http://www.google.com/images/logo.gif" />

Full HTML Parser code:

string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
Regex r = new Regex(regex, RegexOptions.IgnoreCase);

text = r.Replace(text, "<a href=\"$1\" title=\"Click to open in a new window or tab\" target=\"&#95;blank\" rel=\"nofollow\">$1</a>").Replace("href=\"www", "href=\"http://www");

return text;
Cindro
  • 1,055
  • 4
  • 12
  • 23

2 Answers2

2

First I'll post it the obligatory link if no one else will. RegEx match open tags except XHTML self-contained tags


How about using a negative lookahead/behind for " like this:

string regex = @"(?<!"")((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])(?!"")";

Community
  • 1
  • 1
Sam Greenhalgh
  • 5,952
  • 21
  • 37
1

Check out: Detect email in text using regex, just replace the regex for links, it will never replace a link inside a tag, only in contents.

http://htmlagilitypack.codeplex.com/

Something like:


string textToBeLinkified = "... your text here ...";
const string regex = @"((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:@=.+?,##%&amp;~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])";
Regex urlExpression = new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture);

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(textToBeLinkified);

var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]") ?? new HtmlNodeCollection();
foreach (var node in nodes)
{
    node.InnerHtml = urlExpression.Replace(node.InnerHtml, @"<a href=""$0"">$0</a>");
}
string linkifiedText = doc.DocumentNode.OuterHtml;
Community
  • 1
  • 1
jessehouwing
  • 106,458
  • 22
  • 256
  • 341