Why is dictionary so much faster than list?

Question

I am testing the speed of getting data from Dictionary VS list.
I've used this code to test :

    internal class Program
{
    private static void Main(string[] args)
    {
        var stopwatch = new Stopwatch();
        List<Grade> grades = Grade.GetData().ToList();
        List<Student> students = Student.GetStudents().ToList();

        stopwatch.Start();
        foreach (Student student in students)
        {
            student.Grade = grades.Single(x => x.StudentId == student.Id).Value;
        }
        stopwatch.Stop();
        Console.WriteLine("Using list {0}", stopwatch.Elapsed);
        stopwatch.Reset();
        students = Student.GetStudents().ToList();
        stopwatch.Start();
        Dictionary<Guid, string> dic = Grade.GetData().ToDictionary(x => x.StudentId, x => x.Value);
        foreach (Student student in students)
        {
            student.Grade = dic[student.Id];
        }
        stopwatch.Stop();
        Console.WriteLine("Using dictionary {0}", stopwatch.Elapsed);
        Console.ReadKey();
    }
}

public class GuidHelper
{
    public static List<Guid> ListOfIds=new List<Guid>();

    static GuidHelper()
    {
        for (int i = 0; i < 10000; i++)
        {
            ListOfIds.Add(Guid.NewGuid());
        }
    }
}


public class Grade
{
    public Guid StudentId { get; set; }
    public string Value { get; set; }

    public static IEnumerable<Grade> GetData()
    {
        for (int i = 0; i < 10000; i++)
        {
            yield return new Grade
                             {
                                 StudentId = GuidHelper.ListOfIds[i], Value = "Value " + i
                             };
        }
    }
}

public class Student
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public string Grade { get; set; }

    public static IEnumerable<Student> GetStudents()
    {
        for (int i = 0; i < 10000; i++)
        {
            yield return new Student
                             {
                                 Id = GuidHelper.ListOfIds[i],
                                 Name = "Name " + i
                             };
        }
    }
}

There is list of students and grades in memory they have StudentId in common.
In first way I tried to find Grade of a student using LINQ on a list that takes near 7 seconds on my machine and in another way first I converted List into dictionary then finding grades of student from dictionary using key that takes less than a second . enter image description here

Have you tried the other way around (test using dictionary first and then using the list later on)? — Fendy, Jun 07 '13 at 06:44
It's called a [Shlemiel the painter's algorithm](http://www.joelonsoftware.com/articles/fog0000000319.html). The answers explain why that is. — user, Jun 07 '13 at 07:47
Having some knowledge on simple data structures really helps programming so I suggest you to read some information on that. — Alvin Wong, Jun 07 '13 at 12:55
Do you have any idea how LINQ's `Single` method works and how Dictionary stores data? If not, google `C# List.Single` (look for Enumerable.Single on MSDN) and `wiki hash table` (Dictionary is a hash table under the hood). — Emperor Orionii, Jun 07 '13 at 14:29
A more reasonable comparison would be dictionary(key) vs list(index). — GameKyuubi, Nov 07 '16 at 16:40

score 144 · Accepted Answer · answered Jun 07 '13 at 06:38

144

When you do this:

student.Grade = grades.Single(x => x.StudentId == student.Id).Value;

As written it has to enumerate the entire List until it finds the entry in the List that has the correct studentId (does entry 0 match the lambda? No... Does entry 1 match the lambda? No... etc etc). This is O(n). Since you do it once for every student, it is O(n^2).

However when you do this:

student.Grade = dic[student.Id];

If you want to find a certain element by key in a dictionary, it can instantly jump to where it is in the dictionary - this is O(1). O(n) for doing it for every student. (If you want to know how this is done - Dictionary runs a mathematical operation on the key, which turns it into a value that is a place inside the dictionary, which is the same place it put it when it was inserted)

So, dictionary is faster because you used a better algorithm.

answered Jun 07 '13 at 06:38

Patashu

21,443
3
45
53

98

Also, because `Single` was used, it will keep iterating until the end to ensure that only one element exists (or throw an exception if one is found). – DaveShaw Jun 07 '13 at 08:49
http://pastebin.com/1B22JSPz <-- Dictionary uses FindKey(TKey key) to located the index of the entry. It splits all they keys in to a series of small buckets which can be searched much quicker than the entire dictionary. – Uatec Jun 07 '13 at 10:06
2

Is this a cryptic way to describe a balanced tree, or are those separate algorithms? Nut sure if I confuse things, but b-tree would mean O(log n) instead of O(1). – Zsolt Szilagyi Jun 07 '13 at 12:37
1

Finding a element in a Dictionary is not O(1), the average case is likely O(log n). – Tim B Jun 07 '13 at 12:53
1

@TimB http://en.wikipedia.org/wiki/Hash_table says it is O(1). The performance drop from having high load on your hash table is mitigated by making it larger. – Patashu Jun 07 '13 at 13:16
@Patashu Hmm, I never did do very well in algorithms class at university. I always thought a hashtable had similar search performance to a binary search tree. – Tim B Jun 07 '13 at 13:24
3

@TimB A tree is O(log(n)) because it is as deep as log(n) where n is the number of elements. In a hash table you look in the table once and there it is. Of course if it's congested you might have to check the second, or the third, but the thing is that how many 'misses' you get is not a function of n, it's a function of how much space you allocated for the hash table. – Patashu Jun 07 '13 at 13:42
9

just to provide the terminology, a hashtable is said to have *amortized cost* of O(1). Not exactly O(1). due to the reasons Patashu gave – im so confused Jun 07 '13 at 14:21
@DaveShaw Woah. you just blew my mind. I've always assumed that single() just called first() and checked length of returned collection. and so it was rather speedy. – Doug Chamberlain Aug 23 '16 at 18:31
@DougChamberlain you can see the source of `Single` [here](https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,226beadd4e521229) – Scott Chamberlain Mar 01 '17 at 23:10
1

And Single searches the entire list to match if there's ONLY ONE item that matches the entire list, and throws a exception if there's more than one item in the list. Eg. It doesn't stop if the matched element is the first one in the list. – jefissu Jun 11 '19 at 17:20

score 18 · Answer 2 · answered Jun 07 '13 at 06:43

The reason is because a dictionary is a lookup, while a list is an iteration.

Dictionary uses a hash lookup, while your list requires walking through the list until it finds the result from beginning to the result each time.

to put it another way. The list will be faster than the dictionary on the first item, because there's nothing to look up. it's the first item, boom.. it's done. but the second time the list has to look through the first item, then the second item. The third time through it has to look through the first item, then the second item, then the third item.. etc..

So each iteration the lookup takes more and more time. The larger the list, the longer it takes. While the dictionary is always a more or less fixed lookup time (it also increases as the dictionary gets larger, but at a much slower pace, so by comparison it's almost fixed).

score 11 · Answer 3 · answered Jun 07 '13 at 06:37

When using Dictionary you are using a key to retrieve your information, which enables it to find it more efficiently, with List you are using Single Linq expression, which since it is a list, has no other option other than to look in entire list for wanted the item.

score 9 · Answer 4 · answered Jun 07 '13 at 06:38

9

Dictionary uses hashing to search for the data. Each item in the dictionary is stored in buckets of items that contain the same hash. It's a lot quicker.

Try sorting your list, it will be a a bit quicker then.

answered Jun 07 '13 at 06:38

Quinton Bernhardt

4,773
19
28

You have to say something about how sorting the list will only help if you use something like [`List.BinarySearch()`](https://msdn.microsoft.com/en-us/library/ftfdbfx6%28v=vs.110%29.aspx) ;-). (`list[list.BinarySearch(new Grade { StudentId = student.Id, }, Comparer.Create((a, b) => a.StudentId.CompareTo(b.StudentId)))]`) – binki Nov 01 '16 at 14:25

score 9 · Answer 5 · answered Jun 07 '13 at 12:13

A dictionary uses a hash table, it is a great data structure as it maps an input to a corresponding output almost instantaneously, it has a complexity of O(1) as already pointed out which means more or less immediate retrieval.

The cons of it is that for the sake of performance you need lots of space in advance (depending on the implementation be it separate chaining or linear/quadratic probing you may need at least as much as you're planning to store, probably double in the latter case) and you need a good hashing algorithm that maps uniquely your input ("John Smith") to a corresponding output such as a position in an array (hash_array[34521]).

Also listing the entries in a sorted order is a problem. If I may quote Wikipedia:

Listing all n entries in some specific order generally requires a separate sorting step, whose cost is proportional to log(n) per entry.

Have a read on linear probing and separate chaining for some gorier details :)

score 3 · Answer 6 · answered Jun 07 '13 at 06:38

3

Dictionary is based on a hash table which is a rather efficient algorithm to look up things. In a list you have to go element by element in order to find something.

It's all a matter of data organization...

answered Jun 07 '13 at 06:38

JeffRSon

10,404
4
26
51

score 2 · Answer 7 · edited May 17 '20 at 15:41

2

When it comes to lookup of data, a keyed collection is always faster than a non-keyed collection. This is because a non-keyed collection will have to enumerate its elements to find what you are looking for. While in a keyed collection you can just access the element directly via the key.

These are some nice articles for comparing list to dictionary.

Here. And this one.

edited May 17 '20 at 15:41

Nimantha

6,405
6
28
69

answered Jun 07 '13 at 07:00

aiapatag

3,355
20
24

Arrays and lists are also "keyed collections". Index is a key from [0, n>. Dictionary keys simply have looser constrains on a domain. – Emperor Orionii Jun 08 '13 at 14:55
1

@emperor: so, if i know the index, then retrieving data from a list is ...exactly as fast as retrieving it from a dictionary (by key)? – mike Aug 11 '15 at 20:19
2

@mike In fact it's faster. To access 105th element of an array (lists use arrays internally) CPU simply has to add `(105 - 1) * element size` to an address of the first element to get it's location in memory. With dictionaries it has to calculate hash code of the key, make "bucket" index from it (`hash code % bucket count`) and check elements in the bucket one by one to find the one with the given key. How dictionary distributes elements to buckets affects performance but in average case it's good enough to minimize having too many items it the same one. – Emperor Orionii Aug 13 '15 at 07:26
@emperor: thanks for the clarification, that's what i would've expected; your previous comment confused me a bit ;o) – mike Aug 15 '15 at 08:08

howserss · Answer 8 · 2018-08-16T13:01:25.117

From MSDN - Dictionary mentions close to O(1) but I think it depends on the types involved.

The Dictionary(TKey,TValue) generic class provides a mapping from a set of keys to a set of values. Each addition to the dictionary consists of a value and its associated key. Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table.

Note: The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.

List(TValue) does not implement a hash lookup so it is sequential and the performance is O(n). It also depends on the types involved and boxing/unboxing needs to be considered.

Why is dictionary so much faster than list?

8 Answers8

Linked