224

For the following block of code:

For I = 0 To listOfStrings.Count - 1
    If myString.Contains(lstOfStrings.Item(I)) Then
        Return True
    End If
Next
Return False

The output is:

Case 1:

myString: C:\Files\myfile.doc
listOfString: C:\Files\, C:\Files2\
Result: True

Case 2:

myString: C:\Files3\myfile.doc
listOfString: C:\Files\, C:\Files2\
Result: False

The list (listOfStrings) may contain several items (minimum 20) and it has to be checked against a thousands of strings (like myString).

Is there a better (more efficient) way to write this code?

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
user57175
  • 3,284
  • 9
  • 32
  • 26

14 Answers14

492

With LINQ, and using C# (I don't know VB much these days):

bool b = listOfStrings.Any(s=>myString.Contains(s));

or (shorter and more efficient, but arguably less clear):

bool b = listOfStrings.Any(myString.Contains);

If you were testing equality, it would be worth looking at HashSet etc, but this won't help with partial matches unless you split it into fragments and add an order of complexity.


update: if you really mean "StartsWith", then you could sort the list and place it into an array ; then use Array.BinarySearch to find each item - check by lookup to see if it is a full or partial match.

Update: in the recent .Net, Contains has optional StringComparison parameter , that can be used for case-insensitive comparison, e.g. myString.Contains(s,StringComparison.CurrentCultureIgnoreCase);

Michael Freidgeim
  • 26,542
  • 16
  • 152
  • 170
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • 1
    Instead of Contains I'd use StartsWith based on his examples. – tvanfosson Feb 01 '09 at 15:01
  • @tvanfosson - that depends on whether the examples are fully inclusive, but yes, I'd agree. Simple to change, of course. – Marc Gravell Feb 01 '09 at 15:05
  • In how far is this code more efficient on the algorithmic level? It's shorter and faster if the loops in "Any" are faster, but the problem that you have to perform exact matching many times is the same. – Torsten Marek Feb 01 '09 at 15:20
  • You could setup a custom comparator if you are using a set. – Fortyrunner Feb 01 '09 at 15:29
  • The second isn't really more efficient by any measurable difference in practice. – ICR Feb 01 '09 at 16:26
  • @ICR - there isn't a huge amount you can (easily) do for *contains* matching. If it is "starts with", I've added a comment re binary search - that *would* be faster. – Marc Gravell Feb 01 '09 at 20:36
  • Why the later one is more efficient than the lambda one? – ca9163d9 Sep 25 '12 at 23:02
  • @NickW do you mean the middle one? The first one creates a delegate to an anonymous method on a capture-context with a reference to `myString` which then invokes the `string.Contains` method on the `myString` field. The second one simply creates a delegate to the `string.Contains` method with `myString` as the `target` for the delegate - much more direct. – Marc Gravell Sep 26 '12 at 05:59
  • 1
    I find putting this in an extension method with `params` keyword for the array makes it much more readable. `myString.ContainsAny("this", "that", "those");`. – Connell May 29 '15 at 11:23
  • How do you figure out which value has matched ? When i passed S to get the value I get a build error. – kirushan Mar 07 '17 at 19:29
  • @MarcGravell What is the running time complexity? – Jumabek Alikhanov May 28 '17 at 02:09
14

when you construct yours strings it should be like this

bool inact = new string[] { "SUSPENDARE", "DIZOLVARE" }.Any(s=>stare.Contains(s));
Exhausted
  • 1,867
  • 2
  • 23
  • 33
Simi2525
  • 231
  • 3
  • 2
6

I liked Marc's answer, but needed the Contains matching to be CaSe InSenSiTiVe.

This was the solution:

bool b = listOfStrings.Any(s => myString.IndexOf(s, StringComparison.OrdinalIgnoreCase) >= 0))
WhoIsRich
  • 4,053
  • 1
  • 33
  • 19
5

There were a number of suggestions from an earlier similar question "Best way to test for existing string against a large list of comparables".

Regex might be sufficient for your requirement. The expression would be a concatenation of all the candidate substrings, with an OR "|" operator between them. Of course, you'll have to watch out for unescaped characters when building the expression, or a failure to compile it because of complexity or size limitations.

Another way to do this would be to construct a trie data structure to represent all the candidate substrings (this may somewhat duplicate what the regex matcher is doing). As you step through each character in the test string, you would create a new pointer to the root of the trie, and advance existing pointers to the appropriate child (if any). You get a match when any pointer reaches a leaf.

Community
  • 1
  • 1
Zach Scrivena
  • 29,073
  • 11
  • 63
  • 73
3

Old question. But since VB.NET was the original requirement. Using the same values of the accepted answer:

listOfStrings.Any(Function(s) myString.Contains(s))
Luis Lavieri
  • 4,064
  • 6
  • 39
  • 69
3

As I needed to check if there are items from a list in a (long) string, I ended up with this one:

listOfStrings.Any(x => myString.ToUpper().Contains(x.ToUpper()));

Or in vb.net:

listOfStrings.Any(Function(x) myString.ToUpper().Contains(x.ToUpper()))
2
myList.Any(myString.Contains);
WIRN
  • 915
  • 1
  • 16
  • 31
2

Based on your patterns one improvement would be to change to using StartsWith instead of Contains. StartsWith need only iterate through each string until it finds the first mismatch instead of having to restart the search at every character position when it finds one.

Also, based on your patterns, it looks like you may be able to extract the first part of the path for myString, then reverse the comparison -- looking for the starting path of myString in the list of strings rather than the other way around.

string[] pathComponents = myString.Split( Path.DirectorySeparatorChar );
string startPath = pathComponents[0] + Path.DirectorySeparatorChar;

return listOfStrings.Contains( startPath );

EDIT: This would be even faster using the HashSet idea @Marc Gravell mentions since you could change Contains to ContainsKey and the lookup would be O(1) instead of O(N). You would have to make sure that the paths match exactly. Note that this is not a general solution as is @Marc Gravell's but is tailored to your examples.

Sorry for the C# example. I haven't had enough coffee to translate to VB.

tvanfosson
  • 524,688
  • 99
  • 697
  • 795
1

Have you tested the speed?

i.e. Have you created a sample set of data and profiled it? It may not be as bad as you think.

This might also be something you could spawn off into a separate thread and give the illusion of speed!

Fortyrunner
  • 12,702
  • 4
  • 31
  • 54
1

The drawback of Contains method is that it doesn't allow to specify comparison type which is often important when comparing strings. It is always culture-sensitive and case-sensitive. So I think the answer of WhoIsRich is valuable, I just want to show a simpler alternative:

listOfStrings.Any(s => s.Equals(myString, StringComparison.OrdinalIgnoreCase))
Al Kepp
  • 5,831
  • 2
  • 28
  • 48
0

If speed is critical, you might want to look for the Aho-Corasick algorithm for sets of patterns.

It's a trie with failure links, that is, complexity is O(n+m+k), where n is the length of the input text, m the cumulative length of the patterns and k the number of matches. You just have to modify the algorithm to terminate after the first match is found.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Torsten Marek
  • 83,780
  • 21
  • 91
  • 98
0

Slight variation, I needed to find if there were whole words and case insensitive in a string.

myString.Split(' ', StringSplitOptions.RemoveEmptyEntries).Intersect(listOfStrings).Any())

for case insensitive myString and listOfStrings have been converted to uppercase.

XouDo
  • 945
  • 10
  • 19
0

Case -insensitive extension

 public static bool ContainsAnyOfKeys(this string text,List<string> keys)
 {
     bool b = keys.Any(s => text.Contains(s,StringComparison.CurrentCultureIgnoreCase));
     return b;
 }
Michael Freidgeim
  • 26,542
  • 16
  • 152
  • 170
-1

It's not a very big chunk of code, but this is a nice use case for an extension method as a way to provide reusability, readability, and documentation.

public static class StringExtensions
{
    /// <summary>
    /// Tests whether any of the entries in the search list can be found in the source string
    /// </summary>
    public static bool ContainsAny(this string source, IEnumerable<string> search)
    {
        return search.Any(source.Contains);
    }
}

public static class LinqExtensions
{
    /// <summary>
    /// Tests whether the any value in the source list matches any of the values in the search list
    /// </summary>
    public static bool ContainsAny<T>(this IEnumerable<T> source, IEnumerable<T> search)
    {
        return source.Any(search.Contains);
    }
}

Then you can consume like this for strings:

var containsMatch = "Hello World".ContainsAny(new[] { "World", "Earth" }); // true

Or like this for strings:

var list = new [] {"a","b"};
var containsMatch = list.ContainsAny(new [] {"b", "c"}); // true

Demo in CodePen

Similar Questions:

KyleMit
  • 30,350
  • 66
  • 462
  • 664