C# code to linkify urls in a string

Question

Does anyone have any good C# code (and regular expressions) that will parse a string and "linkify" any URLs that may be in the string?

This seems to be the question with the canonical regular expression-based solution. Perhaps somebody could edit the title to help searchers find it? — JasonSmith, Oct 07 '09 at 11:44

score 47 · Accepted Answer · edited Oct 21 '18 at 05:39

It's a pretty simple task; you can achieve it with Regex and a ready-to-go regular expression from:

http://regexlib.com/

Something like:

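// requires: using System.Text.RegularExpressions;
// No ^/$ anchors, so URLs anywhere in the text match; $0 is the whole match
// ($1 would only be the scheme group).
html = Regex.Replace(html, @"(http|https|ftp)\://[a-zA-Z0-9\-\.]+" +
                           @"\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?" +
                           @"([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*",
                           "<a href=\"$0\">$0</a>");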
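The following sketch is not part of the original answer. It keeps the regex deliberately loose and lets `System.Uri` reject anything that is not a well-formed absolute http/https/ftp URL. `UriLinkifier` and its candidate pattern are illustrative choices. Like the one-liner above, it does not HTML-encode the surrounding text.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UriLinkifier
{
    // Deliberately loose candidate pattern: a scheme followed by anything
    // up to whitespace, a quote or an angle bracket.
    private static readonly Regex Candidate =
        new Regex(@"\b(?:https?|ftp)://[^\s<>""']+", RegexOptions.IgnoreCase);

    public static string Linkify(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        return Candidate.Replace(text, m =>
        {
            // Let System.Uri reject candidates that are not well-formed absolute URLs.
            if (Uri.TryCreate(m.Value, UriKind.Absolute, out Uri uri) &&
                (uri.Scheme == Uri.UriSchemeHttp ||
                 uri.Scheme == Uri.UriSchemeHttps ||
                 uri.Scheme == Uri.UriSchemeFtp))
            {
                return "<a href=\"" + m.Value + "\">" + m.Value + "</a>";
            }
            return m.Value; // anything else is left untouched
        });
    }
}
```

Trailing punctuation, such as a sentence-ending period, is still included in the match.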

You may also be interested not only in creating links but also in shortening URLs. Here is a good article on this subject:

Resolve and shorten URLs in C#

See also:

edited Oct 21 '18 at 05:39

Cœur

answered Apr 16 '09 at 21:30

Konstantin Tarkus

Hi. Great response. Most of the suggestions in your post (and links) seem to work but they all seem to break any existing links in the text being evaluated. – BeYourOwnGod Apr 16 '09 at 22:01
@VSmith, you can try different regular expressions from regexlib.com and find which one works best for you. – Konstantin Tarkus Apr 16 '09 at 22:04
@VSmith: Are you implying that you have a string like "hello there, see: http://www.b.com"; and you only want to linkify the second one? – Zhaph - Ben Duguid Apr 16 '09 at 22:23
hmm, that worked well. thus proving all the points we're making here ;) – Zhaph - Ben Duguid Apr 16 '09 at 22:24
Hi Zhaph, yes, that's definitely what I want to do. Stack Overflow seems to have great "linkifying" code, doesn't it? ;-) – BeYourOwnGod Apr 16 '09 at 22:51
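These comments raise the problem of breaking links that already exist in the text. One way to handle it is sketched below. It assumes the input is HTML whose existing links are ordinary `<a ...>...</a>` elements. An alternative for whole anchor elements comes first in the pattern, so the regex consumes those elements before the bare-URL alternative can see the URLs inside them. `SafeLinkifier` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class SafeLinkifier
{
    // First alternative: an existing <a ...>...</a> element (kept as-is).
    // Second alternative: a bare http/https/ftp URL (wrapped in a new anchor).
    private static readonly Regex AnchorOrUrl = new Regex(
        @"(?<anchor><a\b[^>]*>.*?</a>)|(?<url>\b(?:https?|ftp)://[^\s<>""']+)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Linkify(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        return AnchorOrUrl.Replace(html, m =>
            m.Groups["anchor"].Success
                ? m.Value // already a link: leave it alone
                : "<a href=\"" + m.Value + "\">" + m.Value + "</a>");
    }
}
```

This only protects `<a>` elements. A URL inside another tag's attribute (for example `<img src="...">`) would still be wrapped, and nested or malformed markup is not handled.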
Great stuff! However, to work with the .NET Regex (System.Text.RegularExpressions.Regex), and to match URLs that are in the middle of text lines, I needed to modify the code like this: Regex.Replace(answer, @"((http|https|ftp)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*)", @"$1"); – Marcel Aug 10 '12 at 13:44
I used the following regular expression to account for hostnames which are not fully-qualified: `@"((http|https|ftp)\://[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(:[a-zA-Z0-9]*)?/?([a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~])*)"` – cubetwo1729 Apr 01 '13 at 16:25
This will not work on international URLs that contain non-ASCII characters, e.g. http://www.rødt.no – Rune Nov 29 '16 at 09:52
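Below is a minimal sketch for the internationalized case. It assumes only the host part needs widening. In .NET regular expressions, `\p{L}` and `\p{N}` match any Unicode letter or digit, so a host like `www.rødt.no` is accepted. The class name and the exact pattern are illustrative.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeLinkifier
{
    // Host labels may contain any Unicode letter or digit (\p{L}, \p{N}) plus '-';
    // an optional :port and a path running up to whitespace, a quote or '<' / '>' follow.
    private static readonly Regex Url = new Regex(
        @"\b(?:https?|ftp)://[\p{L}\p{N}\-]+(?:\.[\p{L}\p{N}\-]+)+(?::\d+)?(?:/[^\s<>""']*)?",
        RegexOptions.IgnoreCase);

    public static string Linkify(string text) =>
        string.IsNullOrEmpty(text)
            ? text
            : Url.Replace(text, m => "<a href=\"" + m.Value + "\">" + m.Value + "</a>");
}
```

If the `href` must be pure ASCII, `System.Globalization.IdnMapping.GetAscii` can convert such a host to its Punycode form.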

score 14 · Answer 2 · answered Jul 10 '09 at 21:48

Well, after a lot of research on this, and several attempts to fix cases where:

- people enter both http://www.sitename.com and www.sitename.com in the same post
- URLs are wrapped in parentheses, like (http://www.sitename.com), or contain them, like http://msdn.microsoft.com/en-us/library/aa752574(vs.85).aspx
- URLs are long, like: http://www.amazon.com/gp/product/b000ads62g/ref=s9_simz_gw_s3_p74_t1?pf_rd_m=atvpdkikx0der&pf_rd_s=center-2&pf_rd_r=04eezfszazqzs8xfm9yd&pf_rd_t=101&pf_rd_p=470938631&pf_rd_i=507846

we are now using this HtmlHelper extension. I thought I would share it and get any comments:

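    using System;
    using System.Text.RegularExpressions;
    using System.Web.Mvc;

    // Extension methods must be declared in a static class; the class name is illustrative.
    public static class HtmlHelperExtensions
    {
        // Three alternatives: a URL wrapped in ( ), a URL wrapped in a pair of identical
        // = ~ | _ # characters, or a bare http(s):// or www. URL
        private static readonly Regex regExHttpLinks = new Regex(@"(?<=\()\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\))|(?<=(?<wrap>[=~|_#]))\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|](?=\k<wrap>)|\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        public static string Format(this HtmlHelper htmlHelper, string html)
        {
            if (string.IsNullOrEmpty(html))
            {
                return html;
            }

            // HTML-encode the raw text, then turn line breaks into <br /> tags
            html = htmlHelper.Encode(html);
            html = html.Replace(Environment.NewLine, "<br />");

            // replace periods on numeric values that appear to be valid domain names
            var periodReplacement = "[[[replace:period]]]";
            html = Regex.Replace(html, @"(?<=\d)\.(?=\d)", periodReplacement);

            // create links for matches
            var linkMatches = regExHttpLinks.Matches(html);
            for (int i = 0; i < linkMatches.Count; i++)
            {
                var temp = linkMatches[i].ToString();

                // "www." matches have no scheme, so default to http://
                if (!temp.Contains("://"))
                {
                    temp = "http://" + temp;
                }

                // Periods in the generated anchor are masked so that a later, shorter match
                // (e.g. www.x.com after http://www.x.com) is not replaced again inside it.
                // Note that ToLower() also lowercases the path and query string of the href.
                html = html.Replace(linkMatches[i].ToString(), String.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>", temp.Replace(".", periodReplacement).ToLower(), linkMatches[i].ToString().Replace(".", periodReplacement)));
            }

            // Clear out period replacement
            html = html.Replace(periodReplacement, ".");

            return html;
        }
    }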
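For readers outside ASP.NET MVC, here is a hedged, framework-free sketch of the same idea. It differs from the extension above in a few ways:

- It HTML-encodes with `System.Net.WebUtility.HtmlEncode` instead of `HtmlHelper.Encode`.
- It uses only the last (bare URL) alternative of the pattern above, and prefixes `http://` for `www.` matches.
- Instead of a loop of `String.Replace` calls plus the period placeholder, it makes a single `Regex.Replace` call with a match evaluator, which rewrites each match exactly once.
- It drops `ToLower()`, so URL paths keep their case.

`PlainTextFormatter` is a hypothetical name.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class PlainTextFormatter
{
    // Only the bare-URL alternative of the answer's pattern: http(s):// or www. URLs.
    private static readonly Regex Links = new Regex(
        @"\b(https?://|www\.)[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
        RegexOptions.IgnoreCase);

    public static string Format(string text)
    {
        if (string.IsNullOrEmpty(text)) return text;

        // Encode first, then turn line breaks into <br /> tags.
        string html = WebUtility.HtmlEncode(text).Replace(Environment.NewLine, "<br />");

        // One Regex.Replace pass: each match is rewritten once and the evaluator's
        // output is never re-scanned, so no placeholder is needed.
        return Links.Replace(html, m =>
        {
            string href = m.Value.Contains("://") ? m.Value : "http://" + m.Value;
            return string.Format("<a href=\"{0}\" title=\"{0}\">{1}</a>", href, m.Value);
        });
    }
}
```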

score 8 · Answer 3 · answered Apr 16 '09 at 23:02

protected string Linkify( string SearchText ) {
    // this will find links like:
    // http://www.mysite.com
    // as well as any links with other characters directly in front of it like:
    // href="http://www.mysite.com"
    // you can then use your own logic to determine which links to linkify
    Regex regx = new Regex( @"\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b", RegexOptions.IgnoreCase );
    SearchText = SearchText.Replace( "&nbsp;", " " );
    MatchCollection matches = regx.Matches( SearchText );

    foreach ( Match match in matches ) {
        if ( match.Value.StartsWith( "http" ) ) { // if it starts with anything else, don't linkify -- it may already be linked!
            SearchText = SearchText.Replace( match.Value, "<a href='" + match.Value + "'>" + match.Value + "</a>" );
        }
    }

    return SearchText;
}

We ended up using something very similar, with one modification. We ended up making sure the replacement only occurs once. This means we'll end up missing some links (links that occur more than once) but removes the possibility of garbled links in two cases: 1) When there are two links where one is more detailed than the other. e.g. "http://google.com http://google.com/reader" 2) When there is a mix of HTML links with plain text links. e.g. "http://google.com Google" if (input.IndexOf(match.Value) == input.LastIndexOf(match.Value)) { ... } — Michael Krauklis, Aug 01 '11 at 18:53
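An alternative to skipping repeated links is to do the substitution in a single `Regex.Replace` call with a match evaluator. This is offered as an assumption, not as what either poster used. Each match is rewritten once, in order, and the replacement text is never searched again. So neither a URL that is a prefix of another URL nor a URL that appears twice gets wrapped twice. The pattern and the `StartsWith("http")` check are carried over from the answer; `SinglePassLinkifier` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglePassLinkifier
{
    // Same pattern as the answer; (\S+)? also captures whatever is glued to the
    // front of a URL (e.g. href="), which is how already-linked URLs are spotted.
    private static readonly Regex Regx = new Regex(
        @"\b(((\S+)?)(@|mailto\:|(news|(ht|f)tp(s?))\://)\S+)\b",
        RegexOptions.IgnoreCase);

    public static string Linkify(string searchText)
    {
        searchText = searchText.Replace("&nbsp;", " ");

        // Regex.Replace visits each match once and never re-scans its own output.
        return Regx.Replace(searchText, m =>
            m.Value.StartsWith("http", StringComparison.Ordinal)
                ? "<a href='" + m.Value + "'>" + m.Value + "</a>"
                : m.Value);
    }
}
```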

score 4 · Answer 4 · edited Sep 23 '15 at 13:19

It's not that easy, as you can read in this blog post by Jeff Atwood. It's especially hard to detect where a URL ends.

For example, is the trailing parenthesis part of the URL or not?

http://en.wikipedia.org/wiki/PCTools(CentralPointSoftware)
an URL in parentheses (http://en.wikipedia.org) more text

In the first case, the parentheses are part of the URL. In the second case they are not!
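One common heuristic for this case is sketched below; it is not taken from the blog post. After a greedy pattern has matched, drop trailing `)` characters while the candidate has more closing than opening parentheses. The sketch assumes the opening `(` before the URL was not included in the match. `UrlParens.TrimUnbalancedParen` is a hypothetical helper.

```csharp
using System.Linq;

public static class UrlParens
{
    // Removes trailing ')' characters that have no matching '(' inside the URL,
    // i.e. ones that most likely close a parenthesis in the surrounding text.
    public static string TrimUnbalancedParen(string candidate)
    {
        while (candidate.Length > 0 &&
               candidate[candidate.Length - 1] == ')' &&
               candidate.Count(c => c == ')') > candidate.Count(c => c == '('))
        {
            candidate = candidate.Substring(0, candidate.Length - 1);
        }
        return candidate;
    }
}
```

The caller would then put any removed `)` back into the text after the generated link.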

edited Sep 23 '15 at 13:19

Brian

answered Apr 16 '09 at 21:45

M4N

And as you can see from the linkified URLs in this answer, not everyone gets it right :) – Ray Apr 16 '09 at 21:54
Well in fact, I didn't want the two URLs to be linkified. But it seems this is not supported. – M4N Apr 16 '09 at 21:57
Jeff's regex seems to display badly in my browser; I believe it should be: "\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]" – Zhaph - Ben Duguid Apr 16 '09 at 22:00
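Here is a sketch of how the corrected pattern can be used from C#. It assumes one convention: when a URL is opened by `(` and closed by `)`, both parentheses move outside the link. Parentheses inside a URL that has no leading `(` are kept. This post-processing illustrates that convention and is not a quoted implementation; `ParenAwareLinkifier` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class ParenAwareLinkifier
{
    // The corrected pattern from the comment above. The optional leading \(?
    // shows whether the URL was opened by a parenthesis in the text.
    private static readonly Regex Url = new Regex(
        @"\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    public static string Linkify(string text) =>
        Url.Replace(text, m =>
        {
            string url = m.Value;
            string prefix = "", suffix = "";
            if (url[0] == '(')
            {
                // The '(' belongs to the surrounding text, not to the URL...
                prefix = "(";
                url = url.Substring(1);
                if (url[url.Length - 1] == ')')
                {
                    // ...and so, presumably, does the ')' that closes it.
                    suffix = ")";
                    url = url.Substring(0, url.Length - 1);
                }
            }
            return prefix + "<a href=\"" + url + "\">" + url + "</a>" + suffix;
        });
}
```

The heuristic still fails for a parenthesized URL that itself ends in `)`, such as a Wikipedia link inside parentheses.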

score 1 · Answer 5 · answered Feb 25 '15 at 12:51

Here is a class:

public class TextLink
{
    #region Properties

    public const string BeginPattern = @"((http|https)://)?(www\.)?";

    public const string MiddlePattern = @"([a-z0-9\-]*\.)+[a-z]+(:[0-9]+)?";

    public const string EndPattern = @"(/\S*)?";

    public static string Pattern { get { return BeginPattern + MiddlePattern + EndPattern; } }

    public static string ExactPattern { get { return string.Format("^{0}$", Pattern); } }

    public string OriginalInput { get; private set; }

    public bool Valid { get; private set; }

    private bool _isHttps;

    private string _readyLink;

    #endregion

    #region Constructor

    public TextLink(string input)
    {
        this.OriginalInput = input;

        var text = (input ?? string.Empty).Trim(); // strip leading/trailing whitespace

        Valid = Regex.IsMatch(text, ExactPattern);

        if (Valid)
        {
            _isHttps = Regex.IsMatch(text, "^https:", RegexOptions.IgnoreCase);
            // strip a leading scheme and "www." (anchored with ^ so nothing later in the URL is removed):
            _readyLink = Regex.Replace(text, "^" + BeginPattern, "", RegexOptions.IgnoreCase);
            // HTTPS
            if (_isHttps)
            {
                _readyLink = "https://www." + _readyLink;
            }
            // Default
            else
            {
                _readyLink = "http://www." + _readyLink;
            }
        }
    }

    #endregion

    #region Methods

    public override string ToString()
    {
        return _readyLink;
    }

    #endregion
}

Use it in this method:

public static string ReplaceUrls(string input)
{
    var result = Regex.Replace(input ?? string.Empty, TextLink.Pattern, match =>
    {
        var textLink = new TextLink(match.Value);
        return textLink.Valid ?
            string.Format("<a href=\"{0}\" target=\"_blank\">{1}</a>", textLink, textLink.OriginalInput) :
            textLink.OriginalInput;
    });
    return result;
}

Test cases:

[TestMethod]
public void RegexUtil_TextLink_Parsing()
{
    Assert.IsTrue(new TextLink("smthing.com").Valid);
    Assert.IsTrue(new TextLink("www.smthing.com/").Valid);
    Assert.IsTrue(new TextLink("http://smthing.com").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com/").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com/publisher").Valid);

    // port
    Assert.IsTrue(new TextLink("http://www.smthing.com:80").Valid);
    Assert.IsTrue(new TextLink("http://www.smthing.com:80/").Valid);
    // https
    Assert.IsTrue(new TextLink("https://smthing.com").Valid);

    Assert.IsFalse(new TextLink("").Valid);
    Assert.IsFalse(new TextLink("smthing.com.").Valid);
    Assert.IsFalse(new TextLink("smthing.com-").Valid);
}

[TestMethod]
public void RegexUtil_TextLink_ToString()
{
    // default
    Assert.AreEqual("http://www.smthing.com", new TextLink("smthing.com").ToString());
    Assert.AreEqual("http://www.smthing.com", new TextLink("http://www.smthing.com").ToString());
    Assert.AreEqual("http://www.smthing.com/", new TextLink("smthing.com/").ToString());

    Assert.AreEqual("https://www.smthing.com", new TextLink("https://www.smthing.com").ToString());
}

This works well; however, it matches things like o.context, or other strings that have a period in them. It would be nice to require .com/.org/.net etc. somewhere in the string — Todd Horst, Dec 15 '15 at 20:30
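Below is a minimal sketch of that suggestion. It assumes a small hand-picked allowlist; a complete list would come from IANA's root zone database. It extracts the host from a candidate and accepts the candidate only if the host's last label is a known top-level domain. `TldFilter` and the list contents are illustrative.

```csharp
using System;
using System.Collections.Generic;

public static class TldFilter
{
    // Illustrative allowlist only; a real one would be much longer.
    private static readonly HashSet<string> KnownTlds =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "com", "org", "net", "edu", "gov", "io", "no" };

    // True when the host of a candidate like "www.smthing.com/x" ends in a known TLD.
    public static bool HasKnownTld(string candidate)
    {
        string s = candidate;
        int scheme = s.IndexOf("://", StringComparison.Ordinal);
        if (scheme >= 0) s = s.Substring(scheme + 3);

        // The host ends at the first '/', ':', '?' or '#'.
        int end = s.IndexOfAny(new[] { '/', ':', '?', '#' });
        string host = end >= 0 ? s.Substring(0, end) : s;

        int dot = host.LastIndexOf('.');
        return dot >= 0 && KnownTlds.Contains(host.Substring(dot + 1));
    }
}
```

In `ReplaceUrls` this could be combined with the existing check, e.g. `textLink.Valid && TldFilter.HasKnownTld(match.Value)`.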

score 1 · Answer 6 · edited May 23 '17 at 10:31

I have found the following regular expression: http://daringfireball.net/2010/07/improved_regex_for_matching_urls

To me it looks very good. Jeff Atwood's solution doesn't handle many cases. josefresno's seems to me to handle all cases, but when I tried to understand it (in case of any support requests), my brain boiled.

edited May 23 '17 at 10:31

Community

answered Oct 18 '11 at 14:19

Yauhen.F

score 0 · Answer 7 · answered Apr 11 '19 at 12:03

This works for me:

str = Regex.Replace(str,
                @"((http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)",
                "<a target='_blank' href='$1'>$1</a>");

answered Apr 11 '19 at 12:03

Muhammad Awais
