I have a 2 list of string. Is there a simple way to find if one list contains all the strings of the 2nd list?
(By saying simple, I mean that I don't have explicitly compare for each string in one list to all the strings
I have a 2 list of string. Is there a simple way to find if one list contains all the strings of the 2nd list?
(By saying simple, I mean that I don't have explicitly compare for each string in one list to all the strings
Use Enumerable.Except to find differences between lists. If there is no items in result, then all items from list2 are in list1:
bool containsAll = !list2.Except(list1).Any();
Internally Except
uses Set<T>
to get unique items from list1
and returns only that items from list2
which are not in set. If there is nothing to return, then all items in set.
Try this:
firstList.All(x=>secondList.Contains(x));
Shorter version (method group):
firstList.All(secondList.Contains)
You need to write using for Linq:
using System.Linq;
It ckeckes wheter All items from first list are in second list. Contains
checks if given item is in list. All
gives true if all items of collections are matching predicate. Given predicate is: if item is in second list, so whole expresion checks if all items are in second list <- proved working :)
For larger lists use a HashSet<T>
(which results in linear Big O, as opposed to O(n^2) when just using two lists):
var hash = new HashSet<string>(list2);
bool containsAll = list1.All(hash.Contains);
use LINQ
bool isSublistOf = list1.All(list2.Contains);
The All method returns true if the condition in the lambda is met for every element in the IEnumerable
. The All is passed the Contains method of List2 as the Func<bool,string>
which returns true if the element is found List2
. The net effect is that the statement returns true if ALL of the elements in List1 are found in List2.
Performance Note
Due to the nature of the All operator, it is worst case O(n^2), but will exit at the first chance (any mismatch). Using completely random 8 byte strings, I tried out each of the cases using a rudimentary performance harness.
static void Main(string[] args)
{
long count = 5000000;
//Get 5,000,000 random strings (the superset)
var strings = CreateRandomStrings(count);
//Get 1000 random strings (the subset)
var substrings = CreateRandomStrings(1000);
//Perform the hashing technique
var start = DateTime.Now;
var hash = new HashSet<string>(strings);
var mid = DateTime.Now;
var any = substrings.All(hash.Contains);
var end = DateTime.Now;
Console.WriteLine("Hashing took " + end.Subtract(start).TotalMilliseconds + " " + mid.Subtract(start).Milliseconds + " of which was setting up the hash");
//Do the scanning all technique
start = DateTime.Now;
any = substrings.All(strings.Contains);
end = DateTime.Now;
Console.WriteLine("Scanning took " + end.Subtract(start).TotalMilliseconds);
//Do the Excepting technique
start = DateTime.Now;
any = substrings.Except(strings).Any();
end = DateTime.Now;
Console.WriteLine("Excepting took " + end.Subtract(start).TotalMilliseconds);
Console.ReadKey();
}
private static string[] CreateRandomStrings(long count)
{
var rng = new Random(DateTime.Now.Millisecond);
string[] strings = new string[count];
byte[] bytes = new byte[8];
for (long i = 0; i < count; i++) {
rng.NextBytes(bytes);
strings[i] = Convert.ToBase64String(bytes);
}
return strings;
}
The result ranked them in the following order fairly consistently:
list1.All(list2.Contains)
)The excepting method takes much longer because it requires all the work up front. Unlike the other methods, it will not exit on a mismatch, but continue to process all elements. The Hashing is much faster, but also does significant work up front in setting up the hash. This would be the fastest method if the strings were less random and intersections were more certain.
Disclaimer
All performance tuning at this level is next to irrelevant. This is just a mental exercise only