115

The python wiki says: "Membership testing with sets and dictionaries is much faster, O(1), than searching sequences, O(n). When testing "a in b", b should be a set or dictionary instead of a list or tuple."

I've been using sets in place of lists whenever speed is important in my code, but lately I've been wondering why sets are so much faster than lists. Could anyone explain, or point me to a source that would explain, what exactly is going on behind the scenes in python to make sets faster?

codeforester
  • 39,467
  • 16
  • 112
  • 140

8 Answers8

239

list: Imagine you are looking for your socks in your closet, but you don't know in which drawer your socks are, so you have to search drawer by drawer until you find them (or maybe you never do). That's what we call O(n), because in the worst scenario, you will look in all your drawers (where n is the number of drawers).

set: Now, imagine you're still looking for your socks in your closet, but now you know in which drawer your socks are, say in the 3rd drawer. So, you will just search in the 3rd drawer, instead of searching in all drawers. That's what we call O(1), because in the worst scenario you will look in just one drawer.

Dominic Rodger
  • 97,747
  • 36
  • 197
  • 212
juliomalegria
  • 24,229
  • 14
  • 73
  • 89
  • 15
    Using real time examples are the best way to understand or teach anything. Well done! – Workonphp Aug 06 '15 at 12:39
  • @juliomalegria Do the sock drawers represent elements in the `list` / `set` or memory locations for the elements? – Alex Jan 18 '17 at 17:54
  • 1
    So is the reason that we "know the socks are in the 3rd drawer" that the value to lookup gets hashed and we can go directly to the corresponding memory location to look for the data? – Alex Jan 19 '17 at 07:56
  • This is a perfect ELI5 for the problem! – Alan Kavanagh Aug 28 '17 at 12:30
  • 2
    What a vivid, concise example to illustrate big-O and the value of hashsets! Kudos. – kghastie Sep 07 '17 at 14:32
  • Half of the answer is incorrect. Set lookups are not O(1) in the worst case. – omerfarukdogan Jan 23 '18 at 09:44
  • @AlexG The socks represent the element and the sock drawer represents the location. – Krithick S Sep 05 '21 at 17:27
  • This assumes that there are roughly the same number of items in each drawer. If every other drawer is empty, it doesn't help to know that the socks are in the 3rd drawer. That's why it's important to spread items "evenly" throughout the hash table that backs the set. – chepner Sep 26 '21 at 14:44
  • @chepner good to understand, but very unusual to have that many collisions. – Kye Mar 22 '22 at 05:50
  • @KyeRussell Only when you have a good hash function (which is what *makes* it a good hash function). – chepner Mar 22 '22 at 12:10
  • note that saying that `set` lookups are `O(1)` is misleading: it ignores the complexity of computing the hash (which is polynomial in the size of the element, which itself is `\Omega(log n)` on average) – tbrugere Jun 09 '22 at 07:26
173

Sets are implemented using hash tables. Whenever you add an object to a set, the position within the memory of the set object is determined using the hash of the object to be added. When testing for membership, all that needs to be done is basically to look if the object is at the position determined by its hash, so the speed of this operation does not depend on the size of the set. For lists, in contrast, the whole list needs to be searched, which will become slower as the list grows.

This is also the reason that sets do not preserve the order of the objects you add.

Note that sets aren't faster than lists in general -- membership test is faster for sets, and so is removing an element. As long as you don't need these operations, lists are often faster.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
16

I think you need to take a good look at a book on data structures. Basically, Python lists are implemented as dynamic arrays and sets are implemented as a hash tables.

The implementation of these data structures gives them radically different characteristics. For instance, a hash table has a very fast lookup time but cannot preserve the order of insertion.

André Caron
  • 44,541
  • 12
  • 67
  • 125
5

While I have not measured anything performance related in python so far, I'd still like to point out that lists are often faster.

Yes, you have O(1) vs. O(n). But always remember that this gives information only about the asymptotic behavior of something. That means if your n is very high O(1) will always be faster - theoretically. In practice however n often needs to be much bigger than your usual data set will be.

So sets are not faster than lists per se, but only if you have to handle a lot of elements.

Mene
  • 3,739
  • 21
  • 40
4

Python uses hashtables, which have O(1) lookup.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
2

Basically, Depends on the operation you are doing …

*For adding an element - then a set doesn’t need to move any data, and all it needs to do is calculate a hash value and add it to a table. For a list insertion then potentially there is data to be moved.

*For deleting an element - all a set needs to do is remove the hash entry from the hash table, for a list it potentially needs to move data around (on average 1/2 of the data.

*For a search (i.e. an in operator) - a set just needs to calculate the hash value of the data item, find that hash value in the hash table, and if it is there - then bingo. For a list, the search has to look up each item in turn - on average 1/2 of all of the terms in the list. Even for many 1000s of items a set will be far quicker to search.

Harshal SG
  • 403
  • 3
  • 7
  • *For adding element to set, it needs to check for equality, I guess that degrades the performance when compared to list. – rishu1992 May 10 '21 at 06:27
1

Actually sets are not faster than lists in every scenario. Generally the lists are faster than sets. But in the case of searching for an element in a collection, sets are faster because sets have been implemented using hash tables. So basically Python does not have to search the full set, which means that the time complexity in average is O(1). Lists use dynamic arrays and Python needs to check the full array to search. So it takes O(n).

So finally we can see that sets are better in some case and lists are better in some cases. Its up to us to select the appropriate data structure according to our task.

Kye
  • 4,279
  • 3
  • 21
  • 49
0

A list must be searched one by one, where a set or dictionary has an index for faster searching.

IanH
  • 3,968
  • 2
  • 23
  • 26