
I have two lists of strings, and I need to check whether there are any matches between them. I have to do this at a minimum of sixty times a second, but it can scale up to thousands of times a second.

Right now, the lists are both small: one has three elements, and the other might have a few dozen at most, but the currently small list is probably going to grow.

Would it be faster to do this:

for (int i = 0; i < listA.Count; i++)
{
    for (int j = 0; j < listB.Count; j++)
    {
        if (listA[i] == listB[j])
        {
            // do stuff
        }
    }
}

Or to do this:

var hashSetB = new HashSet<string>(listB.Count);
for (int i = 0; i < listB.Count; i++)
{
    hashSetB.Add(listB[i]);
}

for (int i = 0; i < listA.Count; i++)
{
    if (hashSetB.Contains(listA[i]))
    {
        // do stuff
    }
}

listA and listB will always be lists when they come to me; I have no control over that.

I think the core of my question is that I don't know how long building the `HashSet<string>` itself takes, so I'm not sure whether the change would be good or bad for smaller lists.

d0m1n1c
  • See [Which is faster?](https://ericlippert.com/2012/12/17/performance-rant/) – madreflection Jun 06 '22 at 16:58
  • For these sorts of questions, you should just benchmark the code yourself. But I will say you're comparing linear complexity (hashing) with quadratic (nested loops), so for a large number of elements, I'd expect the hashing version to be faster. – Kirk Woll Jun 06 '22 at 16:58
  • If the lists don't have duplicates, have a look at `Intersect`: `foreach (var item in listA.Intersect(listB)) {...}`; here we loop over all common `item`s in both lists – Dmitry Bychenko Jun 06 '22 at 16:59
  • `var hashSetB = new HashSet<string>(listB);` for shorter code: it creates `hashSetB` and fills it with `listB`'s items – Dmitry Bychenko Jun 06 '22 at 17:01
  • How long is the list? How long does it take to populate the list? Is the list cached? Do you need it to be thread-safe? Lots of questions, almost all of which need you to do your own testing. – DavidG Jun 06 '22 at 17:08
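
The two comment suggestions above (LINQ's `Intersect`, and the `HashSet<string>` constructor that takes a collection) can be sketched as follows. This is a minimal illustration with made-up sample data; note that `Intersect` has set semantics, so it yields each common element only once:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class IntersectSketch
{
    static void Main()
    {
        var listA = new List<string> { "red", "green", "blue" };
        var listB = new List<string> { "blue", "yellow", "red" };

        // Dmitry Bychenko's LINQ suggestion: Intersect builds a set from
        // listB internally and yields the distinct common elements,
        // in the order they appear in listA.
        foreach (var item in listA.Intersect(listB))
        {
            Console.WriteLine(item); // "red", then "blue"
        }

        // The shorter HashSet construction: this overload takes an
        // IEnumerable<string> and fills the set from listB in one step.
        var hashSetB = new HashSet<string>(listB);
        foreach (var item in listA)
        {
            if (hashSetB.Contains(item))
            {
                Console.WriteLine("Match: " + item);
            }
        }
    }
}
```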

1 Answer


I was curious, so here's some code I wrote to test it. From what I got back, the HashSet version was near-instantaneous, whereas the nested loops were slow. That makes sense, as you've essentially taken something that needed lengthA * lengthB operations and simplified it to lengthA + lengthB operations.

        using System;
        using System.Collections.Generic;
        using System.Diagnostics;

        const int size = 20000;

        var listA = new List<int>();
        for (int i = 0; i < size; i++)
        {
            listA.Add(i);
        }

        var listB = new List<int>();
        for (int i = size - 5; i < 2 * size; i++)
        {
            listB.Add(i);
        }

        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < listA.Count; i++)
        {
            for (int j = 0; j < listB.Count; j++)
            {
                if (listA[i] == listB[j])
                {
                    Console.WriteLine("Nested loop match");
                }
            }
        }

        long timeTaken1 = sw.ElapsedMilliseconds;

        sw.Restart();

        var hashSetB = new HashSet<int>(listB.Count);
        for (int i = 0; i < listB.Count; i++)
        {
            hashSetB.Add(listB[i]);
        }

        for (int i = 0; i < listA.Count; i++)
        {
            if (hashSetB.Contains(listA[i]))
            {
                Console.WriteLine("HashSet match");
            }
        }

        long timeTaken2 = sw.ElapsedMilliseconds;

        Console.WriteLine("Time Taken Nested Loop: " + timeTaken1);
        Console.WriteLine("Time Taken HashSet: " + timeTaken2);
        Console.ReadLine();
Will Moffat
  • This is not a valid performance test. You can't just use a Stopwatch and expect valid, repeatable results. There are other things to consider. How many elements are in the lists? How long are the strings in the lists? Are the lists ordered? Did you run this in release mode rather than debug mode? Did you have any warm-up code? There is a reason tools like Benchmark.NET exist. – DavidG Jun 06 '22 at 17:12
  • Admittedly, I threw this together in 2 minutes as I was curious about something. I played around with different sizes and switched the order up to see if it had any significant impact. If you wanted to be more scientific, I guess you could put it in a for loop and run it 1000 times. The main point of differentiation, I'd say, is that I used ints instead of strings, as they're easier to generate quickly. – Will Moffat Jun 06 '22 at 17:15
  • I didn't even notice the `int` vs `string` comparison, that makes this 12.56 times worse. – DavidG Jun 06 '22 at 17:16
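
For reference, a slightly more careful Stopwatch harness along the lines DavidG describes (a warm-up pass, repeated iterations, strings instead of ints) might look something like the sketch below. The list sizes and iteration count are arbitrary choices of mine, and BenchmarkDotNet would still be the more rigorous tool:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class RoughBenchmark
{
    static void Main()
    {
        // 1000 strings each, overlapping on "item500".."item999".
        var listA = Enumerable.Range(0, 1000).Select(i => "item" + i).ToList();
        var listB = Enumerable.Range(500, 1000).Select(i => "item" + i).ToList();

        // Warm-up: run each version once so JIT compilation isn't
        // charged to whichever version happens to run first.
        NestedLoops(listA, listB);
        WithHashSet(listA, listB);

        const int iterations = 100;
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++) NestedLoops(listA, listB);
        Console.WriteLine($"Nested loops: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int n = 0; n < iterations; n++) WithHashSet(listA, listB);
        Console.WriteLine($"HashSet:      {sw.ElapsedMilliseconds} ms");
    }

    // O(lengthA * lengthB): compare every pair.
    static int NestedLoops(List<string> a, List<string> b)
    {
        int matches = 0;
        for (int i = 0; i < a.Count; i++)
            for (int j = 0; j < b.Count; j++)
                if (a[i] == b[j]) matches++;
        return matches;
    }

    // O(lengthA + lengthB): build the set (construction cost included
    // in the measurement), then probe it.
    static int WithHashSet(List<string> a, List<string> b)
    {
        var set = new HashSet<string>(b);
        int matches = 0;
        for (int i = 0; i < a.Count; i++)
            if (set.Contains(a[i])) matches++;
        return matches;
    }
}
```

Counting matches and returning the count keeps Console I/O out of the timed loop, which would otherwise dominate the measurement.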