C#, extracting strings using regex or string splitting

Question

After reading the answers to this question, C# regex pattern to extract urls from given string - not full html urls but bare links as well, I want to know which is the faster way to extract URLs from a document: regex matching or the string Split method.

So, you have a string containing an HTML document and want to extract the URLs from it.

The regex way would be:

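// Match tokens starting with http://, https:// or www. (case-insensitive); the trailing \b drops trailing punctuation.
Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);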
string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
foreach(Match m in linkParser.Matches(rawString))
    MessageBox.Show(m.Value);

And the string split method:

string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
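// Split on tabs, newlines and spaces, then keep tokens that start with a URL prefix.
// Where() is deferred: the filter only runs when the foreach below enumerates links.
var links = rawString.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                     .Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"));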
foreach (string s in links)
    MessageBox.Show(s);

Which of the two is more performant?
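Speed aside, the two snippets do not return the same links for every input, which may matter more than timing for an HTML document. The following is a minimal sketch that wraps the question's regex and split code in two methods and runs them on one sample string. The class and method names (`LinkComparison`, `ExtractWithRegex`, `ExtractWithSplit`) and the sample text are hypothetical, not from the question; the pattern, separators and `StartsWith` checks are copied from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LinkComparison
{
    // Same pattern and options as in the question.
    private static readonly Regex LinkParser =
        new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

    public static List<string> ExtractWithRegex(string text)
    {
        // \S+ is greedy, then backtracks until the trailing \b holds,
        // so punctuation such as "," or ")" after a URL is not part of the match.
        return LinkParser.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
    }

    public static List<string> ExtractWithSplit(string text)
    {
        // Same separators and prefix checks as in the question;
        // punctuation attached to a token stays attached.
        return text.Split("\t\n ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
                   .Where(s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://"))
                   .ToList();
    }

    public static void Main()
    {
        // Hypothetical input, not taken from the question.
        string sample = "see www.example.com, or (https://example.org) today";

        // Prints: Regex: www.example.com | https://example.org
        Console.WriteLine("Regex: " + string.Join(" | ", ExtractWithRegex(sample)));
        // Prints: Split: www.example.com,
        Console.WriteLine("Split: " + string.Join(" | ", ExtractWithSplit(sample)));
    }
}
```

On this sample the split version keeps the trailing comma and misses the URL wrapped in parentheses, and the same would apply to URLs inside HTML attributes such as `href="..."`. The regex is also case-insensitive while the `StartsWith` calls are not, so `WWW.example.com` would only be found by the regex.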

I'm ashamed to admit that my first thought was "is Stopwatch some type of benchmark program" — Alex Krupka, Aug 05 '16 at 14:09
I can't benchmark it because I won't have access to a PC for a few days. — Vlad Radu, Aug 05 '16 at 14:14

score 0 · Accepted Answer · answered Aug 05 '16 at 15:07

Split is faster. Here is some code that you can test with: dotnetfiddle link

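using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public class Program
{
    // Main must be static to serve as the program's entry point.
    public static void Main()
    {
        // Enough iterations that ElapsedMilliseconds is not dominated by timer resolution.
        const int iterations = 100000;
        string rawString = "house home go www.monstermmorpg.com nice hospital http://www.monstermmorpg.com this is incorrect url http://www.monstermmorpg.commerged continue";
        char[] separators = "\t\n ".ToCharArray();
        Func<string, bool> isLink = s => s.StartsWith("http://") || s.StartsWith("www.") || s.StartsWith("https://");

        // Build the regex once, outside the timed loop: constructing it with
        // RegexOptions.Compiled is expensive and is not what is being compared.
        Regex linkParser = new Regex(@"\b(?:https?://|www\.)\S+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        // Warm-up: run each approach once so one-time JIT costs stay out of the timed loops.
        int regexCount = linkParser.Matches(rawString).Count;
        int splitCount = rawString.Split(separators, StringSplitOptions.RemoveEmptyEntries).Count(isLink);

        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < iterations; i++)
        {
            // MatchCollection is filled lazily; reading Count forces every match to be found.
            regexCount = linkParser.Matches(rawString).Count;
        }

        sw.Stop();
        var test1Time = sw.ElapsedMilliseconds;

        sw.Restart();

        for (int i = 0; i < iterations; i++)
        {
            // Where() on its own is deferred and would never run; Count() forces the filter.
            splitCount = rawString.Split(separators, StringSplitOptions.RemoveEmptyEntries).Count(isLink);
        }

        sw.Stop();
        var test2Time = sw.ElapsedMilliseconds;

        Console.WriteLine("Regex Test: " + test1Time + " ms, " + regexCount + " links");
        Console.WriteLine("Split Test: " + test2Time + " ms, " + splitCount + " links");
    }
}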
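If the split approach is kept, two small adjustments may be worth trying. `String.StartsWith(string)` without a `StringComparison` argument does a culture-sensitive comparison, and an ordinal comparison is typically cheaper for fixed ASCII prefixes like these. Splitting only on `"\t\n "` also leaves a trailing `\r` on tokens when the document uses Windows line endings. The following is a minimal sketch under those assumptions; `SplitLinkExtractor` and `ExtractLinks` are hypothetical names, and this variant has not been timed here, so any speed-up would need to be confirmed with a benchmark like the one above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SplitLinkExtractor
{
    // Including '\r' means "\r\n" line endings do not leave a stray '\r' on tokens.
    private static readonly char[] Separators = { ' ', '\t', '\r', '\n' };

    public static List<string> ExtractLinks(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                   .Where(IsLinkStart)
                   .ToList();
    }

    private static bool IsLinkStart(string token)
    {
        // Ordinal comparisons skip culture-aware processing; like the
        // question's original StartsWith calls, they are case-sensitive.
        return token.StartsWith("http://", StringComparison.Ordinal)
            || token.StartsWith("https://", StringComparison.Ordinal)
            || token.StartsWith("www.", StringComparison.Ordinal);
    }

    public static void Main()
    {
        // Hypothetical input with a Windows line ending in the middle.
        string text = "first line http://www.monstermmorpg.com\r\nsecond line www.monstermmorpg.com end";

        foreach (string link in ExtractLinks(text))
            Console.WriteLine(link);
    }
}
```

With the question's `"\t\n "` separators, the first link in this sample would come back as `http://www.monstermmorpg.com\r`.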

Wonderful. Thanks for answering. — Vlad Radu, Aug 05 '16 at 15:14
How about accepting it as the answer? — Aug 05 '16 at 15:18
