
I have a string that may have whitespace characters around it and I want to check to see whether it is essentially empty.

There are quite a few ways to do this:

1  if (myString.Trim().Length == 0)
2  if (myString.Trim() == "")
3  if (myString.Trim().Equals(""))
4  if (myString.Trim() == String.Empty)
5  if (myString.Trim().Equals(String.Empty))

I'm aware that this would usually be a clear case of premature optimization, but I'm curious and there's a chance that this will be done enough to have a performance impact.

So which of these is the most efficient method?

Are there any better methods I haven't thought of?


Edit: Notes for visitors to this question:

  1. There have been some amazingly detailed investigations into this question - particularly from Andy and Jon Skeet.

  2. If you've stumbled across the question while searching for something, it's well worth your while reading at least Andy's and Jon's posts in their entirety.

It seems that there are a few very efficient methods and the most efficient depends on the contents of the strings I need to deal with.

If I can't predict the strings (which I can't in my case), Jon's IsEmptyOrWhitespace methods seem to be generally faster.

Thanks all for your input. I'm going to select Andy's answer as the "correct" one simply because he deserves the reputation boost for the effort he put in and Jon has like eleventy-billion reputation already.

Damovisa

8 Answers

Edit: New tests:

Result layout (each test lists six timings, in this order):
x. Test name
Ticks: xxxxx //empty string
Ticks: xxxxx //two spaces
Ticks: xxxxx //single letter
Ticks: xxxxx //single letter with trailing space
Ticks: xxxxx //long string
Ticks: xxxxx //long string with trailing space

1. if (myString.Trim().Length == 0)
ticks: 4121800
ticks: 7523992
ticks: 17655496
ticks: 29312608
ticks: 17302880
ticks: 38160224

2.  if (myString.Trim() == "")
ticks: 4862312
ticks: 8436560
ticks: 21833776
ticks: 32822200
ticks: 21655224
ticks: 42358016


3.  if (myString.Trim().Equals(""))
ticks: 5358744
ticks: 9336728
ticks: 18807512
ticks: 30340392
ticks: 18598608
ticks: 39978008


4.  if (myString.Trim() == String.Empty)
ticks: 4848368
ticks: 8306312
ticks: 21552736
ticks: 32081168
ticks: 21486048
ticks: 41667608


5.  if (myString.Trim().Equals(String.Empty))
ticks: 5372720
ticks: 9263696
ticks: 18677728
ticks: 29634320
ticks: 18551904
ticks: 40183768


6.  if (IsEmptyOrWhitespace(myString))  //See Jon Skeet's post for the algorithm
ticks: 6597776
ticks: 9988304
ticks: 7855664
ticks: 7826296
ticks: 7885200
ticks: 7872776

7.  if (string.IsNullOrEmpty(myString.Trim()))  //Cloud's suggestion
ticks: 4302232
ticks: 10200344
ticks: 18425416
ticks: 29490544
ticks: 17800136
ticks: 38161368

And the code used:

public static void Main()
{

    string res = string.Empty;

    for (int j = 0; j <= 5; j++) {

        string myString = "";

        switch (j) {

            case 0:
                myString = "";
                break;
            case 1:
                myString = "  ";
                break;
            case 2:
                myString = "x";
                break;
            case 3:
                myString = "x ";
                break;
            case 4:
                myString = "this is a long string for testing triming empty things.";
                break;
            case 5:
                myString = "this is a long string for testing triming empty things. ";

                break;
        }

        bool result = false;
        Stopwatch sw = new Stopwatch();

        sw.Start();
        for (int i = 0; i <= 100000; i++) {


            result = myString.Trim().Length == 0;
        }
        sw.Stop();


        res += "ticks: " + sw.ElapsedTicks + Environment.NewLine;
    }


    Console.WriteLine(res);  //print the collected timings
    Console.ReadKey();  //or put a breakpoint here to inspect res
}
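
The code above times only the first expression, so each variant has to be swapped in by hand. A minimal sketch of a harness that runs several variants over the same six inputs might look like the following. The class and method names (`TrimTimings`, `Time`) are hypothetical, and the delegate call adds a small constant overhead to every measurement, so the numbers are only comparable with each other, not with the figures above.

```csharp
using System;
using System.Diagnostics;

public static class TrimTimings
{
    // Times one candidate check against one input; returns elapsed Stopwatch ticks.
    public static long Time(Func<string, bool> check, string input, int iterations)
    {
        bool result = false;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = check(input);
        }
        sw.Stop();
        GC.KeepAlive(result); // keep the result observable after the loop
        return sw.ElapsedTicks;
    }

    public static void Main()
    {
        // Same six input shapes as the tests above.
        string[] inputs =
        {
            "", "  ", "x", "x ",
            "this is a long string for testing triming empty things.",
            "this is a long string for testing triming empty things. "
        };

        // Parallel arrays: a label and the check it describes.
        string[] names =
        {
            "Trim().Length == 0",
            "Trim() == \"\"",
            "string.IsNullOrEmpty(Trim())"
        };
        Func<string, bool>[] checks =
        {
            s => s.Trim().Length == 0,
            s => s.Trim() == "",
            s => string.IsNullOrEmpty(s.Trim())
        };

        for (int c = 0; c < checks.Length; c++)
        {
            Console.WriteLine(names[c]);
            foreach (string input in inputs)
            {
                Console.WriteLine("ticks: " + Time(checks[c], input, 100000));
            }
        }
    }
}
```

Running each check once before timing it would also keep JIT compilation cost out of the first measurement.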
Pondidum

(EDIT: See bottom of post for benchmarks on different micro-optimizations of the method)

Don't trim it - that might create a new string which you don't actually need. Instead, look through the string for any characters that aren't whitespace (for whatever definition you want). For example:

public static bool IsEmptyOrWhitespace(string text)
{
    // Avoid creating iterator for trivial case
    if (text.Length == 0)
    {
        return true;
    }
    foreach (char c in text)
    {
        // Could use Char.IsWhiteSpace(c) instead
        if (c==' ' || c=='\t' || c=='\r' || c=='\n')
        {
            continue;
        }
        return false;
    }
    return true;
}

You might also consider what you want the method to do if text is null.
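
As one possible answer to the null question, here is a sketch that treats null the same as an empty string and uses Char.IsWhiteSpace instead of the four hard-coded characters. The name `IsNullEmptyOrWhitespace` and the choice to return true for null are assumptions made for illustration; throwing an ArgumentNullException would be an equally reasonable policy.

```csharp
using System;

public static class StringChecks
{
    // Hypothetical variant: null counts as "empty", and Char.IsWhiteSpace
    // covers Unicode whitespace rather than only space, tab, CR and LF.
    public static bool IsNullEmptyOrWhitespace(string text)
    {
        if (text == null)
        {
            return true;
        }
        for (int i = 0; i < text.Length; i++)
        {
            if (!char.IsWhiteSpace(text[i]))
            {
                return false; // found a character that is not whitespace
            }
        }
        return true;
    }
}
```

Because Char.IsWhiteSpace recognises characters such as the non-breaking space, this variant agrees with Trim() on more inputs than the four-character test does, possibly at some cost in speed.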

Possible further micro-optimizations to experiment with:

  • Is foreach faster or slower than using a for loop like the one below? Note that with the for loop you can remove the "if (text.Length==0)" test at the start.

    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        // ...
    
  • Same as above, but hoisting the Length call. Note that this isn't good for normal arrays, but might be useful for strings. I haven't tested it.

    int length = text.Length;
    for (int i = 0; i < length; i++)
    {
        char c = text[i];
    
  • In the body of the loop, is there any difference (in speed) between what we've got and:

    if (c != ' ' && c != '\t' && c != '\r' && c != '\n')
    {
        return false;
    }
    
  • Would a switch/case be faster?

    switch (c)
    {
        case ' ': case '\r': case '\n': case '\t':
            continue;
        default:
            return false;
    }
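
To show how these fragments fit into a complete method, here is a sketch that combines the plain for loop with a switch. The method name `IsEmptyOrWhitespaceSwitch` is hypothetical, and whether this is any faster than the if chain is exactly the open question, so it would need measuring.

```csharp
public static class WhitespaceSwitch
{
    // Returns true when text is empty or contains only space, tab, CR or LF.
    public static bool IsEmptyOrWhitespaceSwitch(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            switch (text[i])
            {
                case ' ':
                case '\t':
                case '\r':
                case '\n':
                    continue; // whitespace: move on to the next character
                default:
                    return false; // anything else means the string has content
            }
        }
        return true;
    }
}
```

In C#, continue inside a switch applies to the enclosing loop, so no flag variable is needed.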
    

Update on Trim behaviour

I've just been looking into how Trim can be as efficient as this. It seems that Trim will only create a new string if it needs to. If it can return this or "" it will:

using System;

class Test
{
    static void Main()
    {
        CheckTrim(string.Copy(""));
        CheckTrim("  ");
        CheckTrim(" x ");
        CheckTrim("xx");
    }

    static void CheckTrim(string text)
    {
        string trimmed = text.Trim();
        Console.WriteLine ("Text: '{0}'", text);
        Console.WriteLine ("Trimmed ref == text? {0}",
                          object.ReferenceEquals(text, trimmed));
        Console.WriteLine ("Trimmed ref == \"\"? {0}",
                          object.ReferenceEquals("", trimmed));
        Console.WriteLine();
    }
}

This means it's really important that any benchmarks in this question should use a mixture of data:

  • Empty string
  • Whitespace
  • Whitespace surrounding text
  • Text without whitespace

Of course, the "real world" balance between these four is impossible to predict...

Benchmarks

I've run some benchmarks of the original suggestions vs mine, and mine appears to win in everything I throw at it, which surprises me given the results in other answers. However, I've also benchmarked the difference between foreach, for using text.Length, for using text.Length once and then reversing the iteration order, and for with a hoisted length.

Basically the for loop is very slightly faster, but hoisting the length check makes it slower than foreach. Reversing the for loop direction is very slightly slower than foreach too. I strongly suspect that the JIT is doing interesting things here, in terms of removing duplicate bounds checks etc.

Code: (see my benchmarking blog entry for the framework this is written against)

using System;
using BenchmarkHelper;

public class TrimStrings
{
    static void Main()
    {
        Test("");
        Test(" ");
        Test(" x ");
        Test("x");
        Test(new string('x', 1000));
        Test(" " + new string('x', 1000) + " ");
        Test(new string(' ', 1000));
    }

    static void Test(string text)
    {
        bool expectedResult = text.Trim().Length == 0;
        string title = string.Format("Length={0}, result={1}", text.Length, 
                                     expectedResult);

        var results = TestSuite.Create(title, text, expectedResult)
/*            .Add(x => x.Trim().Length == 0, "Trim().Length == 0")
            .Add(x => x.Trim() == "", "Trim() == \"\"")
            .Add(x => x.Trim().Equals(""), "Trim().Equals(\"\")")
            .Add(x => x.Trim() == string.Empty, "Trim() == string.Empty")
            .Add(x => x.Trim().Equals(string.Empty), "Trim().Equals(string.Empty)")
*/
            .Add(OriginalIsEmptyOrWhitespace)
            .Add(IsEmptyOrWhitespaceForLoop)
            .Add(IsEmptyOrWhitespaceForLoopReversed)
            .Add(IsEmptyOrWhitespaceForLoopHoistedLength)
            .RunTests()                          
            .ScaleByBest(ScalingMode.VaryDuration);

        results.Display(ResultColumns.NameAndDuration | ResultColumns.Score,
                        results.FindBest());
    }

    public static bool OriginalIsEmptyOrWhitespace(string text)
    {
        if (text.Length == 0)
        {
            return true;
        }
        foreach (char c in text)
        {
            if (c==' ' || c=='\t' || c=='\r' || c=='\n')
            {
                continue;
            }
            return false;
        }
        return true;
    }

    public static bool IsEmptyOrWhitespaceForLoop(string text)
    {
        for (int i=0; i < text.Length; i++)
        {
            char c = text[i];
            if (c==' ' || c=='\t' || c=='\r' || c=='\n')
            {
                continue;
            }
            return false;
        }
        return true;
    }

    public static bool IsEmptyOrWhitespaceForLoopReversed(string text)
    {
        for (int i=text.Length-1; i >= 0; i--)
        {
            char c = text[i];
            if (c==' ' || c=='\t' || c=='\r' || c=='\n')
            {
                continue;
            }
            return false;
        }
        return true;
    }

    public static bool IsEmptyOrWhitespaceForLoopHoistedLength(string text)
    {
        int length = text.Length;
        for (int i=0; i < length; i++)
        {
            char c = text[i];
            if (c==' ' || c=='\t' || c=='\r' || c=='\n')
            {
                continue;
            }
            return false;
        }
        return true;
    }
}

Results:

============ Length=0, result=True ============
OriginalIsEmptyOrWhitespace             30.012 1.00
IsEmptyOrWhitespaceForLoop              30.802 1.03
IsEmptyOrWhitespaceForLoopReversed      32.944 1.10
IsEmptyOrWhitespaceForLoopHoistedLength 35.113 1.17

============ Length=1, result=True ============
OriginalIsEmptyOrWhitespace             31.150 1.04
IsEmptyOrWhitespaceForLoop              30.051 1.00
IsEmptyOrWhitespaceForLoopReversed      31.602 1.05
IsEmptyOrWhitespaceForLoopHoistedLength 33.383 1.11

============ Length=3, result=False ============
OriginalIsEmptyOrWhitespace             30.221 1.00
IsEmptyOrWhitespaceForLoop              30.131 1.00
IsEmptyOrWhitespaceForLoopReversed      34.502 1.15
IsEmptyOrWhitespaceForLoopHoistedLength 35.690 1.18

============ Length=1, result=False ============
OriginalIsEmptyOrWhitespace             31.626 1.05
IsEmptyOrWhitespaceForLoop              30.005 1.00
IsEmptyOrWhitespaceForLoopReversed      32.383 1.08
IsEmptyOrWhitespaceForLoopHoistedLength 33.666 1.12

============ Length=1000, result=False ============
OriginalIsEmptyOrWhitespace             30.177 1.00
IsEmptyOrWhitespaceForLoop              33.207 1.10
IsEmptyOrWhitespaceForLoopReversed      30.867 1.02
IsEmptyOrWhitespaceForLoopHoistedLength 31.837 1.06

============ Length=1002, result=False ============
OriginalIsEmptyOrWhitespace             30.217 1.01
IsEmptyOrWhitespaceForLoop              30.026 1.00
IsEmptyOrWhitespaceForLoopReversed      34.162 1.14
IsEmptyOrWhitespaceForLoopHoistedLength 34.860 1.16

============ Length=1000, result=True ============
OriginalIsEmptyOrWhitespace             30.303 1.01
IsEmptyOrWhitespaceForLoop              30.018 1.00
IsEmptyOrWhitespaceForLoopReversed      35.475 1.18
IsEmptyOrWhitespaceForLoopHoistedLength 40.927 1.36
Jon Skeet
  • Will the foreach(char c in text) actually do an inline search or will it create a new array of chars? – Damovisa May 01 '09 at 07:04
  • It will create an appropriate IEnumerator associated with the string. It doesn't copy the string. – Jon Skeet May 01 '09 at 07:12
  • I've just tested this method with a million iterations - it's slightly slower than using Trim(). I guess you might save a bit down the line, though, through avoiding GC. – Joe Albahari May 01 '09 at 07:13
  • P.S. It might also depend on how long the (untrimmed) string was. – Joe Albahari May 01 '09 at 07:14
  • @Jon Thanks for that - very good to know. I've been blindly using .ToCharArray() – Damovisa May 01 '09 at 07:15
  • @albahari (are you one of Joe or Ben, btw?) That's completely bizarre. I can't see how it *could* be slower than Trim, unless it's the iterator killing it (see alternative options in edit). What is your test data like, out of interest? – Jon Skeet May 01 '09 at 07:18
  • Looping backwards through the string may also improve matters. I haven't tested this though. – Pondidum May 01 '09 at 07:21
  • Andy: Why? The whitespace could be at the start or the end. Why would it be better to go from the end? – Jon Skeet May 01 '09 at 07:23
  • I've updated the answer with more info about the behaviour of Trim, which is quite interesting and important... – Jon Skeet May 01 '09 at 07:29
  • @Jon: Trim uses substring internally to extract the copy of the "trimmed" string. Substring does not make a copy when asked to duplicate a string, it returns the source string instead. That said, I can't see a reason that "doing it yourself" could be slower than Trim - as long as you replace the foreach with a plain for. – Dan C. May 01 '09 at 07:35
  • @Jon - are you sure you need the "length if" when using foreach? - If the collection is empty the foreach loop will just not execute and go to return true immediately - in this case your code is just bigger and if the collection is not empty it's slower. In general it feels this solution is complicated a bit when simplicity would be all we need - why don't we just foreach with a switch inside? Using a chain of && or != will give you exactly the same performance - in fact a good compiler should generate the same code in all 3 cases... – RnR May 01 '09 at 07:41
  • Also - suggesting the usage of text[i] as an optimization is strange - to get to the i'th element of a table you have to multiply i by the size of the elements and then add that to the beginning of your table to get the address - even if the multiplication will be a left shift (the size is a power of 2) it's always slower than using an iterator approach and just incrementing the current address that foreach surely gives you. – RnR May 01 '09 at 07:44
  • @RnR: he just needs a for loop instead of the foreach, as suggested in the later edit. – Dan C. May 01 '09 at 07:45
  • @RnR: Using foreach involves creating an iterator. Why do that if we know the result already? This optimizes for the "empty" case of course, at the slight cost of the non-empty case. I don't know how much that affects things. – Jon Skeet May 01 '09 at 07:46
  • @RnR: Do you know what the string iterator and string indexer look like internally? Are you absolutely sure they differ significantly? :) – Jon Skeet May 01 '09 at 07:46
  • @DanC: Have you tested a for loop to ensure it's definitely faster? I haven't yet, although I will if I get the time... – Jon Skeet May 01 '09 at 07:47
  • Comparing with a compiled Regex might be interesting too. :) – leppie May 01 '09 at 07:54
  • @Jon: I haven't, it's common knowledge :) (just being ironic, I know one should test before making assumptions). I strongly expect the for to be faster; the foreach involves creating an enumerator object and the MoveNext call is just an indexer access - so in the end, it's just a complicated way of doing char c = myString[i]; – Dan C. May 01 '09 at 07:57
  • sad that you are missing an unsafe version :( – Chad Grant May 01 '09 at 08:01
  • @DanC: The difference is that the iterator version *could* potentially avoid doing the bounds check in both the loop and the indexer. There's enough room for doubt that I wouldn't like to say without testing. – Jon Skeet May 01 '09 at 08:13
  • @Deviant: True. I don't tend to think in terms of unsafe code. That may well squeeze out a bit more speed. – Jon Skeet May 01 '09 at 08:13
  • @Jon: it *could*, but it doesn't. MoveNext checks the string length on every call before getting the character from the string, just as a for would on every loop. But as you say, it should be tested. – Dan C. May 01 '09 at 08:19
  • I would love to be able to use all the tools I have at home, but I am somewhat restricted at work - hence my thrown-together testing. – Pondidum May 01 '09 at 09:45
  • Depending on the length of the string it might be faster to check the string from the back and front, or am I wrong? Just think about a string which has 50 whitespaces at the beginning but one non-whitespace char at the end. – Felix K. Sep 29 '13 at 15:02
  • @FelixK.: But consider the opposite - a string which has 50 whitespaces at the end but one non-whitespace char at the beginning. It *may* be faster to alternate between start and end... although for a long string that could have dire cache effects, perhaps. – Jon Skeet Sep 29 '13 at 16:17
  • That's what I thought, alternate between start and end for long strings. In the worst case the time needed is double the time needed when you just start at the beginning. In the best case it saves you the time to iterate through the complete string. – Felix K. Sep 29 '13 at 17:13
  • @FelixK.: Ah, I misunderstood you before. Right. – Jon Skeet Sep 29 '13 at 19:22
  • @FelixK.: But ultimately, in a string of length N, you're never going to do better than O(N) for the worst case, are you? – Jon Skeet Sep 29 '13 at 19:47
  • @Jon - I found a "good enough" solution to see if a char is whitespace with the check `c < 33` (or `c < '!'`), which is significantly faster. Now, < 33 doesn't necessarily mean a whitespace character, but it includes them plus a bunch of other characters (mostly control characters) that you probably want to ignore anyway. – JulianR Sep 29 '13 at 22:57
  • @JulianR: "Good enough" will depend on your context. It may be good enough for you but not for some others. It also doesn't include all Unicode whitespace characters. For example, `"\u00a0 \u00a0"` trims to an empty string in .NET... – Jon Skeet Sep 30 '13 at 05:50
  • @JonSkeet That's right, but it's the same when you iterate from the start only. But it's not unlikely that my idea is faster in the end because there are a lot of situations where you have only the whitespace at the beginning. – Felix K. Sep 30 '13 at 08:07
  • @FelixK.: Personally I've found it more common to have whitespace just at the end, to be honest. – Jon Skeet Sep 30 '13 at 08:27

I really don't know which is faster, although my gut feeling says number one. But here's another method:

if (String.IsNullOrEmpty(myString.Trim()))
pyrocumulus

myString.Trim().Length == 0 took: 421 ms

myString.Trim() == "" took: 468 ms

myString.Trim().Equals("") took: 515 ms

myString.Trim() == String.Empty took: 484 ms

myString.Trim().Equals(String.Empty) took: 500 ms

string.IsNullOrEmpty(myString.Trim()) took: 437 ms

In my tests, it looks like myString.Trim().Length == 0 and surprisingly, string.IsNullOrEmpty(myString.Trim()) were consistently the fastest. The results above are a typical result from doing 10,000,000 comparisons.

womp
  • @womp: string.IsNullOrEmpty checks the string Length too (besides doing the null check); in this situation, I would very much prefer checking it directly - option 1. – Dan C. May 01 '09 at 07:19

String.IsNullOrWhiteSpace in .NET 4 Beta 2 also plays in this space and doesn't need to be custom-written.
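
For reference, a small usage sketch of the built-in method follows. It assumes .NET 4.0 or later; the sample inputs and the class name `BuiltInCheck` are made up for illustration.

```csharp
using System;

public static class BuiltInCheck
{
    public static void Main()
    {
        string[] samples = { null, "", "   ", "\t\r\n", " x " };
        foreach (string s in samples)
        {
            // No Trim() call is involved, so null input does not throw.
            Console.WriteLine(string.IsNullOrWhiteSpace(s));
        }
    }
}
```

Only the last sample contains a non-whitespace character, so it is the only one for which the method returns false.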

Ruben Bartelink
  • This method was indeed added in .NET 4.0 (in 2010), see [`String.IsNullOrWhiteSpace` Method](http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx). – Jeppe Stig Nielsen Sep 29 '13 at 15:28

Checking the length of a string for being zero is the most efficient way to test for an empty string, so I would say number 1:

if (myString.Trim().Length == 0)

The only way to optimize this further might be to avoid trimming by using a compiled regular expression (Edit: this is actually much slower than using Trim().Length).

Edit: The suggestion to use Length came from an FxCop guideline. I've also just tested it: it's 2-3 times faster than comparing to an empty string. However, both approaches are still extremely fast (we're talking nanoseconds), so it hardly matters which one you use. Trimming is far more of a bottleneck: it's hundreds of times slower than the actual comparison at the end.
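
For anyone who wants to repeat the regular expression comparison, this is a sketch of what a compiled-regex check could look like. The pattern and the names (`RegexCheck`, `OnlyWhitespace`) are assumptions; the regex actually tested for this answer is not shown.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCheck
{
    // Built once and reused. \A and \z anchor the whole input, so the pattern
    // matches only strings that are empty or consist entirely of whitespace.
    private static readonly Regex OnlyWhitespace =
        new Regex(@"\A\s*\z", RegexOptions.Compiled);

    public static bool IsEmptyOrWhitespace(string text)
    {
        return OnlyWhitespace.IsMatch(text);
    }
}
```

Even when compiled, the regex engine does considerably more work per call than a Length check, which is consistent with the slowdown reported above.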

Joe Albahari

Since I just started I can't comment so here it is.

if (String.IsNullOrEmpty(myString.Trim()))

The Trim() call will fail if myString is null, since you can't call methods on a null reference (it throws a NullReferenceException).

So the correct syntax would be something like this:

if (!String.IsNullOrEmpty(myString))
{
    string trimmedString = myString.Trim();
    //do the rest of your code
}
else
{
    //string is null or empty, don't bother processing it
}
glen3b
Oakcool
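public static bool IsNullOrEmpty(this String str, bool checkTrimmed)
{
  // True for null or ""; with checkTrimmed set, also true for whitespace-only strings.
  // The || short-circuits, so Trim() is never called on a null string.
  var b = String.IsNullOrEmpty(str);
  return checkTrimmed ? b || str.Trim().Length == 0 : b;
}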
abatishchev