
How can I convert URLs in plain text into HTML links using Html Agility Pack and C#?

For example: "www.stackoverflow.com is a very cool site."

Expected output:

"<a href="www.stackoverflow.com">www.stackoverflow.com</a>  is a very cool site."
Thabiso Mofokeng
  • _Did you try anything?_ Show your effort first. – Soner Gönül Mar 20 '13 at 14:53
  • I have tried using regex and it works, but I wanted to see whether Html Agility Pack can do this. I've done a lot of research on using Html Agility Pack and haven't found a solution yet. – Thabiso Mofokeng Mar 20 '13 at 15:12
  • Html Agility Pack works on HTML strings. The string you give is not a full HTML fragment. Also, are the quotes important/significant in your example? A piece of real C# code would help. – Simon Mourier Mar 20 '13 at 15:24
  • Can we avoid turning this into a chain of comments? If I were struggling with Html Agility Pack code I had already written, I would have supplied it. All I want to know is whether anyone has tried this and can confirm whether it's possible. Why a +1 to @SonerGönül? – Thabiso Mofokeng Mar 20 '13 at 21:43

2 Answers


Thanks @user1778606 for your answer. I got this working, though it still uses a bit of regex. It works better and is safer: because it only rewrites text nodes that are not inside an <a> element, it never creates hyperlinks inside existing hyperlinks or inside href attributes.

        using System.Text.RegularExpressions;

        // The method name is illustrative; the original snippet was the body of a method.
        public static string ConvertUrlsToLinks(string inputString)
        {
            // Parse the input as HTML
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(inputString);

            // Matches a URL that starts with a scheme (e.g. "http://") or with "www.", followed by an
            // optional path, query string and #fragment. The second alternative also matches
            // user@host strings such as email addresses.
            const string pattern = @"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\\w]*))?)";

            // Build the regex once rather than once per node
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

            // Select every text node that is not inside an <a> element, so existing links are left alone
            var nodes = doc.DocumentNode.SelectNodes("//text()[not(ancestor::a)]") ?? new HtmlAgilityPack.HtmlNodeCollection(null);

            foreach (var node in nodes)
            {
                // Wrap each URL in an <a> tag, then give bare "www." links an http:// scheme
                node.InnerHtml = regex.Replace(node.InnerHtml, "<a href=\"$1\">$1</a>").Replace("href=\"www", "href=\"http://www");
            }

            return doc.DocumentNode.OuterHtml;
        }
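To try the replacement step on its own, without Html Agility Pack, here is a minimal sketch that uses only System.Text.RegularExpressions. UrlLinkifier and Linkify are hypothetical names. It uses the URL pattern from the answer, written with a literal \. after www and with \w allowed in the #fragment. As a design change, a MatchEvaluator prepends http:// only to matches that begin with "www." instead of running a string-wide Replace on href="www. As in the answer's loop, the input is assumed to be the already HTML-encoded text of a single text node.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that isolates the regex step of the answer.
// Input is assumed to be the HTML-encoded text of one text node.
public static class UrlLinkifier
{
    // Scheme-prefixed ("http://") or "www."-prefixed host, then optional
    // path, query string and #fragment. The second alternative also
    // matches user@host strings such as email addresses.
    private static readonly Regex UrlRegex = new Regex(
        @"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www\.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[.\!\/\\\w]*))?)",
        RegexOptions.IgnoreCase);

    public static string Linkify(string html)
    {
        return UrlRegex.Replace(html, m =>
        {
            string url = m.Groups[1].Value;
            // Only bare "www." hosts get a scheme; other matches are used as-is.
            string href = url.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                ? "http://" + url
                : url;
            return "<a href=\"" + href + "\">" + url + "</a>";
        });
    }
}
```

For example, UrlLinkifier.Linkify("www.stackoverflow.com is a very cool site.") returns <a href="http://www.stackoverflow.com">www.stackoverflow.com</a> is a very cool site.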
Thabiso Mofokeng

I'm pretty sure it's possible, although I haven't attempted it.

Here's how to replace a fixed string in a document with links:

Find keyword in text when keyword match certain conditions - C#

Here's how to match URLs with a regular expression:

regular expression for url

Put those together and it should be possible.

Pseudocode:

    select all text nodes
    for each node
        get the inner text
        find URLs in the text (use regex?)
        for each URL found
            replace the text of the URL with a literal link tag (<a href="...">...</a>)
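The pseudocode can be followed almost step for step. Html Agility Pack is a separate NuGet package, so this sketch swaps in LINQ to XML (System.Xml.Linq) from the .NET base library instead. That is a different parser and it only accepts well-formed markup such as XHTML, so treat this as an illustration of the steps rather than a replacement for Html Agility Pack on real-world HTML. The class name XmlLinkifier and the simple URL pattern are assumptions made for the example. Rather than writing link markup into a text node as a string, it replaces each matching text node with new text and <a> element nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: LINQ to XML stands in for Html Agility Pack here.
public static class XmlLinkifier
{
    // Assumed, deliberately simple URL pattern: "http://", "https://" or "www."
    // followed by any run of non-whitespace characters.
    private static readonly Regex UrlRegex =
        new Regex(@"\b(?:https?://|www\.)\S+", RegexOptions.IgnoreCase);

    public static string Linkify(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);

        // Select all text nodes that are not inside an <a> element.
        // ToList() takes a snapshot because the loop modifies the tree.
        List<XText> textNodes = root.DescendantNodes()
            .OfType<XText>()
            .Where(t => !t.Ancestors("a").Any())
            .ToList();

        foreach (XText node in textNodes)
        {
            // Get the (entity-decoded) text and find the URLs in it.
            string text = node.Value;
            MatchCollection matches = UrlRegex.Matches(text);
            if (matches.Count == 0)
            {
                continue;
            }

            // Rebuild the node as a sequence of text and <a> nodes.
            var parts = new List<XNode>();
            int last = 0;
            foreach (Match m in matches)
            {
                if (m.Index > last)
                {
                    parts.Add(new XText(text.Substring(last, m.Index - last)));
                }
                string href = m.Value.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
                    ? "http://" + m.Value
                    : m.Value;
                parts.Add(new XElement("a", new XAttribute("href", href), m.Value));
                last = m.Index + m.Length;
            }
            if (last < text.Length)
            {
                parts.Add(new XText(text.Substring(last)));
            }

            // Swap the original text node for the new nodes.
            node.ReplaceWith(parts);
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

With this simple pattern, trailing punctuation such as the full stop in "see www.example.com." ends up inside the link; a stricter pattern would be needed to avoid that.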

monkeyhouse