0

(My apologies, this is a 2nd post to Most efficient way to determine if a string length != 0? but I can't figure out how to reply to people's answers, my reply becomes posted as an 'answer')

Ideally, what I'm looking for is the most efficient algorithm to do the following (which will be called 100million+ times). I'm using C# 4.0

Turn the string: "A B C D E " into the array: string["A","B","C","D","E"]

My algorithm is as follows:

public string[] SplitOnMultiSpaces(string text)
{
  if (string.IsNullOrEmpty(text)) return new string[0];

  var split = text.Split(' ');
  int length = split.Length;

  var data = new string[length];

  int index = 0;
  for (int i = 0; i<length; i++)
  {
    if (split[i].Length != 0)
    {
      data[index++] = split[i];
    }
  }

  return data;
}

My problem is when I profile this against 100,000 strings, it takes 1.04 seconds to execute.

If I comment out the "if (split[i].Length != 0)" check, it takes only 0.2 seconds.

Can anybody tell me why this (simple) query against the string is taking 80% of the TOTAL execution time? (Especially, since I expected other areas to use more CPU) The only idea I've come up w/ is C# is trying to count the string length, which people tell me is not the case (that it's more like VB strings I guess?). But that wouldn't make sense for the time overhead.

I've considered trying to see if split[i][0] exists, but relying on an exception slows things WAAAAAAY down.

P.S. -- My algorithm also suffers in that the returned array is, more often than not, bigger than it needs to be, but that doesn't seem to be too much of an overhead.

Community
  • 1
  • 1
Eric
  • 1
  • Out of curiosity, if you are that concerned with the speed, why are you using C#? C++ or C will likely get you significant speed increases. – riwalk Aug 02 '10 at 18:34
  • When you profile it, make sure Visual Studio is NOT attached. It might suddely become a gazillion times faster. (Or it might not, that really depends.) – Roman Starkov Aug 02 '10 at 18:34
  • 4
    to reply you need to click the little 'add comment' link below a persons anser – µBio Aug 02 '10 at 18:34
  • Which profiler are you using? I found that Ants profiler adds quite a big amount to the time taken compared to Dottrace. – Giorgi Aug 02 '10 at 18:37

4 Answers4

3

Likely to be as fast or faster than what you can do (without going into lower level code aka. C/C++).

// somewhere else
private static readonly char[] splitter =  new []{' '} ;

//
public string[] SplitOnMultiSpaces(string text)
{
    return text.Split(splitter, StringSplitOptions.RemoveEmptyEntries );
}
µBio
  • 10,668
  • 6
  • 38
  • 56
  • 1
    +1 for directly returning the result of Split and saving redundant memory allocations and copies. Suggestion: make the `new [] {' '}` a static array instead of 'newing' it every time. –  Aug 02 '10 at 18:46
2

Have compared performance using the String.Split overload that takes a StringSplitOptions that would make your empty string check unnecessary?

Nicole Calinoiu
  • 20,843
  • 2
  • 44
  • 49
1

You could just replace

var split = text.Split(' ');

with

var split = text.Split(' ', StringSplitOptions.RemoveEmptyEntries);

But this too should be profiled.

H H
  • 263,252
  • 30
  • 330
  • 514
0

When I benchmark this, in either debug or release mode, I get virtually identical runtimes regardless of whether "if (split[i].Length!=0)" is there or not, both corresponding to your quickest times. (Thus supporting the idea that Length is a quick check.) Is there something not shown that could be affecting the performance in some other way?

Having said that, I'd like to agree that StringSplitOptions.RemoveEmptyEntries is the best way to go. But I'm still curious as to why I can't reproduce the original behaviour.

TechNeilogy
  • 1,271
  • 7
  • 13