
If I have a delimited text file with a basic delimiter (say |, for instance), does it make a difference whether I use String.Split or Regex.Split?

Would I see any performance gains with one versus the other?

I am assuming you would want to use Regex.Split if you have escaped delimiters that you don't want to split on (\| for example).

Are there any other reasons to use Regex.Split vs String.Split?
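A minimal sketch of the escaped-delimiter case from the question, assuming a single backslash is the escape character. A negative lookbehind skips any | that is preceded by \, and the escapes are then removed from each field. The helper name SplitUnescaped is hypothetical, and an escaped backslash followed by a real delimiter (\\|) is not handled by this simple pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: split on '|' only when it is NOT preceded by a backslash,
// then turn each remaining "\|" back into a literal "|".
// Assumption: "\" is the escape character; "\\|" is not handled.
string[] SplitUnescaped(string line) =>
    Regex.Split(line, @"(?<!\\)\|")                 // negative lookbehind skips "\|"
         .Select(field => field.Replace(@"\|", "|"))
         .ToArray();

foreach (var field in SplitUnescaped(@"a|b\|c|d"))
    Console.WriteLine(field);   // a, b|c, d (one per line)
```

String.Split has no equivalent of the lookbehind, which is one concrete reason to reach for Regex.Split here.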

Edited by gotqn; asked by Abe Miessler
  • If speed is important, benchmark the two options. I'd expect String.Split to be more performant, but it's unlikely to matter unless you're dealing with large volumes of data. – Will A Aug 30 '10 at 14:53

4 Answers


Regex.Split is more capable, but for basic delimiting (using a character that does not appear anywhere else in the string), the String.Split function is much easier to work with.
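To illustrate the "easier to work with" point, a small sketch with a made-up input: String.Split takes | literally, while Regex.Split interprets its argument as a pattern, where | means alternation and has to be escaped (Regex.Escape does this).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string line = "a|b|c";   // hypothetical input

// String.Split treats the delimiter literally.
string[] viaString = line.Split('|');

// Regex.Split treats the delimiter as a pattern, so '|' must be escaped.
string[] viaRegex = Regex.Split(line, Regex.Escape("|"));   // same as @"\|"

// Unescaped "|" means "empty OR empty", which matches the empty string at
// every position, so it does NOT split on the pipe character.
string[] wrong = Regex.Split(line, "|");

Console.WriteLine(string.Join(",", viaString));  // a,b,c
Console.WriteLine(string.Join(",", viaRegex));   // a,b,c
Console.WriteLine(wrong.Length);
```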

As far as performance goes, you would have to create a test and try it out. But don't optimize prematurely unless you know that this function will be the bottleneck for some essential process.

Answered by John Fisher

For simple scenarios, string.Split() appears to perform much better. I ran a test with BenchmarkDotNet:

Tested on .NET Core 2.2.6:

Method      Mean       Error     StdDev    Median
RegexSplit  486.47 ns  9.769 ns  24.15 ns  481.72 ns
Split        84.76 ns  4.503 ns  13.21 ns   81.12 ns

Tested on .NET 5.0.101:

Method      Mean       Error     StdDev
RegexSplit  182.10 ns  2.091 ns  1.956 ns
Split        50.29 ns  0.709 ns  0.663 ns

Note: the two runs were not on the same hardware, so compare the relative performance of the two methods across .NET versions rather than the absolute timings.

The Test:

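using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class RegexVsSplit
{
    private readonly string data = "host:7000";

    // Static Regex.Split: the pattern is looked up in Regex's internal cache on each call.
    [Benchmark]
    public string[] RegexSplit() => Regex.Split(data, ":");

    [Benchmark]
    public string[] Split() => data.Split(':');
}

public static class Program
{
    // Run with: dotnet run -c Release (BenchmarkDotNet expects an optimized build)
    public static void Main() => BenchmarkRunner.Run<RegexVsSplit>();
}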
Answered by Tjaart
  • For future readers: .NET Core / .NET 5.0 Preview 6 improved Regex performance (among many other areas), so it's worth running another benchmark like @Tjaart did. – Reza Jul 23 '20 at 11:23
  • Thanks for the tip. I reran the test and there is a big difference, but `string.Split` still seems to be the winner. – Tjaart Feb 19 '21 at 11:31
  • Another test might also be in order because you are recompiling the regex each time. It should likely also be tested with: private static readonly Regex colonRegex = new Regex(@":", RegexOptions.Compiled); – Chris Welton Oct 08 '21 at 07:48

By default I would reach for String.Split unless you have complicated requirements that a regex would help you handle. Of course, as others have mentioned, profile it for your needs. Be sure to profile with and without RegexOptions.Compiled, and understand how that option works. Look at "To Compile or Not To Compile" and "How does RegexOptions.Compiled work?", and search for other articles on the topic.
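A rough sketch of profiling an interpreted and a RegexOptions.Compiled instance of the same pattern with Stopwatch, on a made-up workload. It is only a starting point: there is no warm-up or statistical analysis, so BenchmarkDotNet gives more trustworthy numbers. Both instances produce identical results; the option only changes how the pattern is executed.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern, two execution strategies.
var interpreted = new Regex(@"\|");
var compiled = new Regex(@"\|", RegexOptions.Compiled);  // extra one-time cost to generate IL

// Hypothetical workload: 100,000 copies of a short pipe-delimited record.
string[] lines = Enumerable.Repeat("id|name|email|city", 100_000).ToArray();

// Crude timing helper: splits every line with the given instance.
long Time(Regex regex)
{
    var sw = Stopwatch.StartNew();
    foreach (var l in lines)
        _ = regex.Split(l);
    return sw.ElapsedMilliseconds;
}

Console.WriteLine($"interpreted: {Time(interpreted)} ms");
Console.WriteLine($"compiled:    {Time(compiled)} ms");
```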

One benefit of String.Split is its StringSplitOptions.RemoveEmptyEntries option, which drops empty results where no data exists between delimiters. Regex.Split with the same delimiter keeps those empty entries. It's minor and can be handled by a simple LINQ query that filters out String.Empty results.
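A small sketch of that difference, using a made-up line with an empty field in the middle and a trailing delimiter:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string line = "a||b|";   // empty field between "||" and after the trailing "|"

// String.Split drops the empty entries for you.
string[] viaString = line.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);

// Regex.Split keeps them: "a", "", "b", ""
string[] raw = Regex.Split(line, @"\|");

// Filter out String.Empty with LINQ to get the same result.
string[] viaRegex = raw.Where(s => s != string.Empty).ToArray();

Console.WriteLine(string.Join(",", viaString));  // a,b
Console.WriteLine(string.Join(",", viaRegex));   // a,b
```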

That said, a regex makes it easy to include the delimiters in the results if you need them. Wrap the pattern in parentheses () to make it a capturing group; Regex.Split then includes the captured delimiters in the output array. For example:

using System;
using System.Text.RegularExpressions;

string input = "a|b|c|d|e|f";
foreach (var s in Regex.Split(input, @"\|"))
    Console.WriteLine(s);

Console.WriteLine("Include delimiter...");
// notice () around pattern: the captured "|" appears between the fields
foreach (var s in Regex.Split(input, @"(\|)"))
    Console.WriteLine(s);
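The same capturing-group trick works with a character class when there are several delimiters. A sketch with a hypothetical arithmetic-style input, keeping each operator as its own token:

```csharp
using System;
using System.Text.RegularExpressions;

// Capturing group around a character class: split on '+' or '-' and keep the operator.
string[] tokens = Regex.Split("1+2-3", @"([+\-])");
Console.WriteLine(string.Join(" ", tokens));  // 1 + 2 - 3
```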

You might find this question helpful as well: How do I split a string by strings and include the delimiters using .NET?

Answered by Ahmad Mageed

  1. For a simple separator, use String.Split, for example with comma-separated email addresses.
  2. For a complex separator, use Regex: for example, when a separator inside quotes should not split. A,B gives two tokens (A and B), while "A,B" is one token because the comma inside the quotes is ignored.
  3. To include delimiters in the output, as suggested by Ahmad.
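For point 2, a minimal sketch using a lookahead that only matches commas followed by an even number of double quotes, that is, commas outside a quoted section. It assumes quotes are balanced and never escaped; the lookahead rescans the rest of the line at every comma, so it can be slow on very long lines, and a dedicated CSV parser is more robust.

```csharp
using System;
using System.Text.RegularExpressions;

// Matches a comma only if an even number of '"' characters follow it,
// i.e. the comma is not inside a quoted field.
// Assumptions: balanced quotes, no escaped quotes.
var csvComma = new Regex(@",(?=(?:[^""]*""[^""]*"")*[^""]*$)");

string[] fields = csvComma.Split("A,\"B,C\",D");
foreach (var f in fields)
    Console.WriteLine(f);   // A, "B,C", D (one per line)
```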

Which one is faster depends on the situation. Regex can be faster at execution, but creating a Regex instance costs more because of its compile and setup time. If you create the Regex object once at the beginning and reuse the same instance for every split, it will be faster.
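A sketch of the "create once, reuse" approach on hypothetical input lines. For reference, the static Regex.Split(input, pattern) overload also keeps a small cache of recently used patterns (sized by Regex.CacheSize), so repeated static calls do not re-parse the pattern every time; holding an instance simply makes the reuse explicit.

```csharp
using System;
using System.Text.RegularExpressions;

// Build the Regex once (setup cost paid once), then reuse it for every line.
var delimiter = new Regex(@"\|", RegexOptions.Compiled);

string[] lines = { "a|b|c", "d|e", "f" };   // hypothetical input
foreach (var line in lines)
{
    string[] fields = delimiter.Split(line);
    Console.WriteLine($"{line} -> {fields.Length} fields");
}
```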

String.Split needs no setup time, but it is a purely sequential search, so it will be slower on big text.

Answered by Akash Kava
  • Akash Kava, under what circumstances would Regex.Split be faster than String.Split? How can Regex possibly split faster than O(n)? (And why can't I refer to your name with '@'?) – agentnega Jun 25 '12 at 22:07
  • @agentnega Regex is supposed to use a DFA (http://en.m.wikipedia.org/wiki/Deterministic_finite_automaton), which would be faster than multiple sequential scans. If Regex does not use a DFA, it will be slower, but I guess that depends on the platform. As for referring to my name with @, I have seen that fail for other users as well, because there is a space in my display name. – Akash Kava Jun 26 '12 at 06:47
  • I have clarified that for simple separators string.Split is faster, but for complex separators that might involve lookahead, Regex is the only option. – Akash Kava Jun 26 '12 at 06:51
  • @agentnega by default, comments on an answer are responses to the author so explicitly mentioning the author is redundant (and elided by SO's system). – iheanyi Mar 22 '22 at 16:02