9

I'm studying string searching algorithms now and wondering what algorithm is used for .NET String.Contains function for example. Reflector shows that this function is used but I have no idea what its name means.

private static extern int InternalFindNLSStringEx(IntPtr handle, string localeName, int flags, string source, int sourceCount, int startIndex, string target, int targetCount);
Hun1Ahpu
  • 3,315
  • 3
  • 28
  • 34
  • possible duplicate of [How does String.Contains work?](http://stackoverflow.com/questions/3769519/how-does-string-contains-work) – Hans Passant Feb 13 '13 at 15:25

3 Answers3

11

It’s just the naïve string search implementation via a nested loop over the text and the pattern, with O(n·m) runtime.

In particular, the MSDN doesn’t specify the performance of this method so it’s not safe to assume a better performance.

Furthermore, most advanced pattern searching methods are quite specialised for certain string types and while there are better general-purpose search algorithms, implementing one in String.IndexOf is somewhat of an unnecessary optimisation.

The reason is simple: if you require an efficient pattern search then you’ll implement your own anyway, custom-tailored to fit your particular data. So there’s no need to implement something fancy in the general-purpose library.


As of 2016 (with the Core CLR source code now available), the implementation is still using a naïve nested loop. This is implemented in NewApis::IndexOfString and NewApis::FastIndexOfString, which are called (via InternalFindNLSStringEx) from the managed String.Contains and String.IndexOf functions.

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • Thank you for reasonable answer. – Hun1Ahpu Apr 06 '10 at 14:17
  • 2
    I could not agree less; why would you implement your own general string search if there is already one in the library? Furthermore, string searching algorithms are exceptions since there is Boyer-Moore which is so nice (all pro no con) that is deemed standard benchmark. What I mean to say is that usually choosing an algorithm for a library can be really hard, but with strings I would not expect anything less to be used in any library worth anything. Please do correct me if I am wrong. – Unreason Apr 06 '10 at 17:05
  • 1
    @Unreason: in most applications it simply doesn’t matter. Using `Find` for anything but trivial tasks results in complicated logic and code bloat anyway. Better use regular expressions or a parser. Where you *do* need to use pattern finding and where it matters, you rarely deal with Unicode. Often it’s highly specialized like DNA sequences. And *of course* you don’t program that yourself – you use ready-made libraries. But these are specialized and don’t belong in a general framework like the .NET framework. – Konrad Rudolph Apr 06 '10 at 18:07
  • 1
    @Unreason: and Boyer-Moore depends critically on the size of the alphabet which is forbidding for Unicode. It’s possible to modify the algorithm to use hashing instead but already the parameters are getting murky. I think few people who work on a general framework like .NET know how to get this right (more efficient than the naive search in the general case and 100% bug-free) without making the implementation expensive. – Konrad Rudolph Apr 06 '10 at 18:11
  • 1
    (cont’d) Finally, Boyer-Moore is by no means the last word; Horspool is often (most of the time?) more efficient (not to mention easier to implement). I’m sure there are modern variants which tweak Horspool even more. And, without having performed any benchmarks, I’ll wager that for most applications (i.e. *relatively* small Unicode strings) naive search outperforms Horspool hands down. – Konrad Rudolph Apr 06 '10 at 18:17
  • + for comment - makes sense :) For completeness sake here's another discussion that coincides with your reasoning http://yedda.com/questions/String_IndexOf_search_algorithm_net_1499176185184/ – Unreason Apr 06 '10 at 18:56
0

Why using reflector when the source code is available?

Muds
  • 4,006
  • 5
  • 31
  • 53
Stéphane
  • 11,755
  • 7
  • 49
  • 63
  • 9
    That still wont show you any native/internal code, as this method does. – leppie Apr 06 '10 at 10:46
  • I was trying to use the source code but it doesn't show me the algorithm. Am I doing something wrong or there is no way to see the real implementation? – Hun1Ahpu Apr 06 '10 at 11:34
  • 1
    Also you should put thin in a comment, not the answare – Marek Jun 16 '16 at 11:20
0

I couldn't find that method stub, but my best guess is that it local-specific case-folding for case-insensitive searches.

leppie
  • 115,091
  • 17
  • 196
  • 297