This could be terribly trivial, but I'm having trouble finding an answer that executes in less than n^2 time. Let's say I have two string arrays and I want to know which strings exist in both arrays. How would I do that, efficiently, in VB.NET or is there a way to do this other than a double loop?
-
What version of .NET? in 3.5 you could use the linq extension for Intersect – Quintin Robinson Mar 23 '09 at 16:26
-
I believe we are using 2.0 at this point. Corporate hosting policy, unfortunately. – willasaywhat Mar 23 '09 at 17:07
4 Answers
The simple way (assuming no .NET 3.5) is to dump the strings from one array in a hashtable, and then loop through the other array checking against the hashtable. That should be much faster than an n^2 search.

- 9,999
- 2
- 44
- 54
-
It was probably best if they were in a hashset, dictionary, or list rather than an array in the first place. – Joel Coehoorn Mar 23 '09 at 16:41
If you sort both arrays, you can then walk through them each once to find all the matching strings.
Pseudo-code:
while(index1 < list1.Length && index2 < list2.Length)
{
if(list1[index1] == list2[index2])
{
// You've found a match
index1++;
index2++;
} else if(list1[index1] < list2[index2]) {
index1++;
} else {
index2++;
}
}
Then you've reduced it to the time it takes to do the sorting.

- 50,583
- 16
- 120
- 115
Sort both lists. Then you can know with certainty that if the next entry in list A is 'cobble' and the next entry in list B is 'definite', then 'cobble' is not in list B. Simply advance the pointer/counter on whichever list has the lower ranked result and ascend the rankings.
For example:
List 1: D,B,M,A,I
List 2: I,A,P,N,D,G
sorted:
List 1: A,B,D,I,M
List 2: A,D,G,I,N,P
A vs A --> match, store A, advance both
B vs D --> B
D vs D --> match, store D, advance both
I vs G --> I>G, advance 2
I vs I --> match, store I, advance both
M vs N --> M
List 1 has no more items, quit.
List of matches is A,D,I
2 list sorts O(n log(n)), plus O(n) comparisons makes this O(n(log(n) + 1)).

- 19,928
- 7
- 68
- 105
If one of the arrays is sorted you can do a binary search on it in the inner loop, this will decrease the time to O(n log n)

- 62,489
- 19
- 106
- 139