0

Lookup<string, string> vs List<(string, string)>? Which is faster to loop through with a foreach?

Is it slower to use List of tuples, if you don't need to search for a particular list of key-value pairs (keys could be duplicated)? Does Lookup have any other benefits except its methods?

EDIT:

Really smart comment mentioned this and I should add it to the battle:

Lookup<string, string> vs List<(string, string)> vs Dictionary<string, List<string>>

Nikolai
  • 555
  • 7
  • 17
  • 4
    They're really very different. A `Lookup` is more like a `Dictionary>` than a `List<(string, string)>`. – Jon Skeet Jul 03 '20 at 15:46
  • 4
    [Race your horses](https://ericlippert.com/2012/12/17/performance-rant/). For looking up the value(s) for a key, lookups tend to be faster with more elements, linear searches tend to be faster with fewer elements. "Faster", "more", and "fewer" all depend on the details of your situation, which is why you need to profile. – canton7 Jul 03 '20 at 15:47
  • Could you give an example of input data, and how you're creating the Lookup and List? – canton7 Jul 03 '20 at 15:52
  • And then how you're using it. We have no idea what you're trying to do, which makes it hard to say anything concrete. – Jon Skeet Jul 03 '20 at 15:57
  • @canton7 I have between 5 and 50 IList collections with no more than 50-100 elements. I loop through all of these collections with 2 nested foreach { foreach {} } And I add every element from them to a Dictionary. But I figured it out that there are duplicated keys. After I add every object to the collection (list of tuples for now), I have to foreach this collection to loop through its elements (250-5000 elements). – Nikolai Jul 03 '20 at 15:59
  • 3
    You haven't mentioned anything about using the key *as a key*, in which case there's no point in going through the work of putting something in a collection designed for looking up keys effectively. – Jon Skeet Jul 03 '20 at 16:00
  • Any chance you could put together some code which creates (programatically) some sample inputs, and does the logic you're doing? I want to put together some benchmarks for you, but I want to make sure that they're actually representative of exactly what you're doing. This of course includes the code to look up the elements for a key, which I'm assuming you're doing, but which you haven't talked about. – canton7 Jul 03 '20 at 16:00
  • The Lookup<> has a key and it must be unique. you could check for the existence first. – T McKeown Jul 03 '20 at 16:02
  • @canton7 I appreciate your contribution to my question. You all made me think more about the problem and I think that List of tuples suits me best. It's not something really important, but the discussion about these collections was really good. – Nikolai Jul 03 '20 at 16:04
  • You still think Lookups<> can have duplicate keys? – T McKeown Jul 03 '20 at 16:05
  • @TMcKeown I meant having a key for multiple values. I didn't phrase the sentence well. Sorry – Nikolai Jul 03 '20 at 16:06
  • 1
    You still haven't explained whether you're ever actually performing any lookups on this key, or whether *fundamentally* you only want a sequence of items. – Jon Skeet Jul 03 '20 at 17:02

1 Answers1

0

Which is faster to loop through with a foreach?

To iterate over, a list will be orders of magnitude faster. Just take a gander at the Lookup.GetEnumerator() function and how much work it's doing because it doesn't store the original elements in a list:

   public IEnumerator<IGrouping<TKey, TElement>> GetEnumerator() {
        Grouping g = lastGrouping;
        if (g != null) {
            do {
                g = g.next;
                yield return g;
            } while (g != lastGrouping);
        }
    }

I actually sat down and wrote a benchmark for it:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.17763.1282 (1809/October2018Update/Redstone5)
Intel Core i7-8850H CPU 2.60GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.4.20258.7
  [Host]     : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT  [AttachedDebugger]
  DefaultJob : .NET Core 3.1.4 (CoreCLR 4.700.20.20201, CoreFX 4.700.20.22101), X64 RyuJIT


|     Method |      n |           Mean |        Error |       StdDev |    Gen 0 | Gen 1 | Gen 2 | Allocated |
|----------- |------- |---------------:|-------------:|-------------:|---------:|------:|------:|----------:|
| LookupTest |    100 |     2,733.6 ns |     25.08 ns |     22.23 ns |   0.8583 |     - |     - |    4048 B |
|   ListTest |    100 |       263.6 ns |      1.54 ns |      1.20 ns |        - |     - |     - |         - |
| LookupTest |   1000 |    27,185.6 ns |    245.01 ns |    217.20 ns |   8.4839 |     - |     - |   40048 B |
|   ListTest |   1000 |     2,306.6 ns |     22.53 ns |     19.97 ns |        - |     - |     - |         - |
| LookupTest |  10000 |   274,364.4 ns |  5,041.13 ns |  6,554.89 ns |  84.9609 |     - |     - |  400048 B |
|   ListTest |  10000 |    22,903.7 ns |    178.62 ns |    158.34 ns |        - |     - |     - |         - |
| LookupTest | 100000 | 2,774,880.1 ns | 49,165.10 ns | 58,527.55 ns | 847.6563 |     - |     - | 4000008 B |
|   ListTest | 100000 |   231,208.2 ns |  4,449.43 ns |  4,369.93 ns |        - |     - |     - |         - |
Blindy
  • 65,249
  • 10
  • 91
  • 131
  • I wouldn't be quite so sure, not without profiling. The JIT can do some surprisingly clever things. – canton7 Jul 03 '20 at 15:51
  • You can be as unsure as you wish, any programmer can tell you how much faster iterating over a contiguous block of memory is versus a linked list. – Blindy Jul 03 '20 at 15:52
  • Right, but if there are a relatively small number of groups, and a relatively large number of elements per group, that cost starts to disappear. There might still be a difference, but it probably won't be orders of magnitude. Also objects allocated sequentially will often be in contiguous memory. Granted the object header means that it's not quite as contiguous, but that's fairly small. – canton7 Jul 03 '20 at 15:56
  • 2
    I'd expect both to be sufficiently quick that the only time iterating over a list would be "orders of magnitude faster" is if you're not actually doing anything with each item - in which case you can make it infinitely faster by not doing it at all. – Jon Skeet Jul 03 '20 at 15:58
  • In a tight loop it really adds up (especially cache misses), though even if it didn't, OP specifically asked which is faster, not if it matters relative to the amount of work done on each item. – Blindy Jul 03 '20 at 16:00
  • If the OP is ignoring the cost of iteration compared with the cost of what they're doing, that's a far more important problem they should address IMO. I think the hyperbole of "orders of magnitude faster" really doesn't help the answer here. (It doesn't help that the question you're answering is too vague at the moment IMO.) – Jon Skeet Jul 03 '20 at 16:03
  • The disagreement I have is with the term "orders of magnitude": that means something specific, and I'm fairly sure (although not certain, without having profiled it) that it's incorrect here. – canton7 Jul 03 '20 at 16:03
  • 1
    @JonSkeet You two actually made me second guess myself so I wrote a quick benchmark for it and the results corroborated my gut instinct: over an order of magnitude difference. You should never use linked lists if you have a choice! – Blindy Jul 03 '20 at 16:49
  • Also note the allocations, neither test allocates the actual object tested inside of it, the allocations for the look up are all the nested enumerators. It's inefficient on every kind of measure. – Blindy Jul 03 '20 at 16:50
  • 2
    I don't think I'd describe "just over one order of magnitude" as "orders of magnitude" myself - nor would I say linked lists are always the wrong choice. If you want to be able to delete or insert an element while you navigate, they can be very useful. But I think what's far more important is that both are *really* quick to iterate over, and if you're doing any significant work with each item, that's likely to make the iteration cost irrelevant. I think that's a far more valuable and pragmatic lesson than "You should never use linked lists if you have a choice" myself. – Jon Skeet Jul 03 '20 at 17:01