In Python, which data structure is more efficient/speedy? Assuming that order is not important to me and I would be checking for duplicates anyway, is a Python set slower than a Python list?
-
I have an immediate follow-up that might be worth answering in this question: if sets are indeed optimized for search (`x in s`), what about construction + search? It's relevant, for instance, in this case: which is faster, `x in [a, b, c]` or `x in {a, b, c}`? I.e. which should be our default go-to when constructing a set just to do a one-time search operation on it (as in an if clause)? – LoneCodeRanger Dec 08 '22 at 10:32
11 Answers
It depends on what you are intending to do with it.
Sets are significantly faster when it comes to determining if an object is present in the set (as in `x in s`), but their elements are not ordered, so you cannot access items by index as you would in a list. Sets are also somewhat slower to iterate over in practice.
You can use the timeit module to see which is faster for your situation.
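For example, a minimal `timeit` sketch (sizes and repeat counts here are arbitrary; absolute timings vary by machine, only the relative difference matters):

```python
from timeit import timeit

# Membership test against a 10,000-element set vs. the equivalent list.
# 9999 is the worst case for the list: a full scan to the last element.
set_time = timeit("9999 in s", setup="s = set(range(10000))", number=1000)
list_time = timeit("9999 in l", setup="l = list(range(10000))", number=1000)

print(f"set:  {set_time:.6f}s")
print(f"list: {list_time:.6f}s")
```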

-
For your point: "Sets are significantly faster", what is the underlying implementation that makes it faster? – overexchange Jul 13 '15 at 13:50
-
Scripting languages like to hide the underlying implementations, but this apparent simplicity is not always a good thing, you do need some 'data structure' awareness when you design a piece of software. – Christophe Roussy Aug 02 '16 at 14:40
-
Sets and lists both have linear time iteration. To say that one is "slower" than the other is misguided and has confused new programmers who read this answer. – habnabit Mar 16 '18 at 01:34
-
@habnabit If you are saying that they both have linear-time iteration, does this mean they have the same iteration time? What is the difference then? – Mohammed Noureldin May 05 '20 at 14:37
-
They both have a running [time complexity](https://en.wikipedia.org/wiki/Time_complexity) of O(n) when iterated, but the [average-case complexity](https://en.wikipedia.org/wiki/Average-case_complexity) of iterating sets is [~28%](https://stackoverflow.com/a/17945009/3936044) greater (slower) than iterating lists – Mandera Dec 27 '20 at 08:25
-
To answer 'For your point: "Sets are significantly faster", what is the underlying implementation that makes it faster?': Sets use hash functions to determine if an element is in them (if the hash function is good, i.e. collisions are not common, O(1) complexity), while to determine if an element is in a list, an iteration through the list is necessary (O(n) complexity). [Check this out for the time complexity of the methods on Python default data structures](https://wiki.python.org/moin/TimeComplexity). – Hemerson Tacon Feb 02 '22 at 12:01
Lists are slightly faster than sets when you just want to iterate over the values.
Sets, however, are significantly faster than lists if you want to check if an item is contained within it. They can only contain unique items though.
It turns out tuples perform in almost exactly the same way as lists, except for their immutability.
Iterating
>>> def iter_test(iterable):
...     for i in iterable:
...         pass
...
>>> from timeit import timeit
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = set(range(10000))",
...     number=100000)
12.666952133178711
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = list(range(10000))",
...     number=100000)
9.917098999023438
>>> timeit(
...     "iter_test(iterable)",
...     setup="from __main__ import iter_test; iterable = tuple(range(10000))",
...     number=100000)
9.865639209747314
Determine if an object is present
>>> def in_test(iterable):
...     for i in range(1000):
...         if i in iterable:
...             pass
...
>>> from timeit import timeit
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = set(range(1000))",
...     number=10000)
0.5591847896575928
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = list(range(1000))",
...     number=10000)
50.18339991569519
>>> timeit(
...     "in_test(iterable)",
...     setup="from __main__ import in_test; iterable = tuple(range(1000))",
...     number=10000)
51.597304821014404

-
I have found that (Initializing set -> 5.5300979614257812) (Initializing list -> 1.8846848011016846) (Initializing tuple -> 1.8730108737945557) for items of size 10,000 on my Intel Core i5 quad core with 12GB RAM. This should also be taken into consideration. – ThePracticalOne Sep 29 '14 at 17:39
-
I've updated the code to remove the object creation now. The setup phase of the timeit loops is only called once (https://docs.python.org/2/library/timeit.html#timeit.Timer.timeit). – Ellis Percival Sep 30 '14 at 10:09
`Set` wins due to near-instant 'contains' checks: https://en.wikipedia.org/wiki/Hash_table
List implementation: usually an array, low level, close to the metal, good for iteration and random access by element index.
Set implementation: a hash table, so it does not iterate over a list but finds the element by computing a hash from the key; performance therefore depends on the nature of the key elements and the hash function. Similar to what is used for `dict`. I suspect `list` could be faster if you have very few elements (< 5); the larger the element count, the better the `set` will perform for a contains check. It is also fast for element addition and removal. Also, always keep in mind that building a set has a cost!
NOTE: If the `list` is already sorted, searching the `list` could be quite fast on small lists, but with more data a `set` is faster for contains checks.
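As a sketch of that NOTE: a pre-sorted list can approach set-like lookup using binary search via the stdlib `bisect` module, though it is more code than a plain `in` check (the helper name here is illustrative):

```python
from bisect import bisect_left

def in_sorted_list(sorted_items, x):
    """O(log n) membership test on an already-sorted list."""
    i = bisect_left(sorted_items, x)
    return i < len(sorted_items) and sorted_items[i] == x

data = list(range(0, 1000, 2))  # sorted even numbers
print(in_sorted_list(data, 500))  # True
print(in_sorted_list(data, 501))  # False
```

This still loses to a set's O(1) average lookup on large data, but it avoids the memory cost of building a second structure.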

-
Close to the metal? What does that even mean in the context of Python? How is a list closer to the metal than a set? – roganjosh Jun 21 '18 at 18:32
-
@roganjosh Python still runs on a machine, and some implementations like list-as-array are closer to what the hardware is good at: https://stackoverflow.com/questions/176011/python-list-vs-array-when-to-use. But it always depends on what you want to achieve; it is good to know a bit about the implementations, not just the abstractions. – Christophe Roussy May 14 '19 at 16:46
-
"If the `list` is already sorted, searching the `list` could be quite fast on small lists, but with more data a `set` is faster for contains checks." To avoid confusion, you should probably make it clear that sorting only helps if you take advantage of the sorted order with something like the `bisect` module; a plain `in` check on a `list` is `O(n)` regardless of whether or not it's sorted, while `in` checks on `set` are `O(1)`. The `bisect` module can get the test down to `O(log n)` on a pre-sorted `list`, but it's more complicated to use than a simple `in` check. – ShadowRanger Sep 15 '21 at 05:11
-
@roganjosh A list maps more directly to instruction sets provided by modern CPUs. Ultimately all Python code executes some machine instructions. A hash algorithm is coded to be fast, but it is more instructions than a list traversal on a modern CPU. When people consider **big-O**, they forget the constant factor. If 'n' is very small, then the constant can also dominate. (Note, though, that a CPython list really is an **array** under the hood, not a linked list.) – artless noise Jun 03 '22 at 14:58
-
Wouldn't it make sense to have a special set which only takes integers, so there would be no need for hashes? Would that kind of set be faster? – PythoNic Jun 19 '22 at 12:30
List performance:
>>> import timeit
>>> timeit.timeit(stmt='10**6 in a', setup='a = list(range(10**6))', number=1000)
15.08
Set performance:
>>> timeit.timeit(stmt='10**6 in a', setup='a = set(range(10**6))', number=1000)
3.90e-05
You may want to consider Tuples as they're similar to lists but can’t be modified. They take up slightly less memory and are faster to access. They aren’t as flexible but are more efficient than lists. Their normal use is to serve as dictionary keys.
Sets are also collection structures, but with two differences from lists and tuples. Although sets do have an order, that order is arbitrary and not under the programmer's control. The second difference is that the elements in a `set` must be unique by definition. [python | wiki].
>>> x = set([1, 1, 2, 2, 3, 3])
>>> x
{1, 2, 3}

-
First off, you should update to the `set` built-in type link (http://docs.python.org/2/library/stdtypes.html#set) not the deprecated `sets` library. Second, "Sets are also sequence structures", read the following from the built-in type link: "Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior." – Seaux Feb 05 '14 at 02:25
-
`range` is not `list`. `range` is a special class with a custom `__contains__` magic method. – Ryne Wang Apr 01 '18 at 07:40
-
@RyneWang this is true, but only for Python 3. In Python 2, range returns a normal list (that's why horrible things like `xrange` exist) – Manoel Vilela Dec 11 '18 at 17:55
-
This comparison is wrong. I just replicated it, and the first `timeit` is clearly done using Python 3 with the `range` object's `__contains__` method, as mentioned by @RyneWang. Updated with an apt comparison. – Alex Nov 14 '22 at 14:24
tl;dr
Data structures (DS) are important because they are used to perform operations on data, which basically means: take some input, process it, and give back the output.
Some data structures are more useful than others in particular cases. Therefore, it is quite unfair to ask which DS is more efficient/speedy. It is like asking which tool is more efficient between a knife and a fork. It all depends on the situation.
Lists
A list is a mutable sequence, typically used to store collections of homogeneous items.
Sets
A set object is an unordered collection of distinct hashable objects. It is commonly used to test membership, remove duplicates from a sequence, and compute mathematical operations such as intersection, union, difference, and symmetric difference.
Usage
From some of the answers, it is clear that a list is faster than a set when iterating over the values. On the other hand, a set is faster than a list when checking whether an item is contained within it. Therefore, the only thing you can say is that a list is better than a set for some particular operations, and vice versa.
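As a concrete example of picking the structure per operation (the names here are purely illustrative): when the same collection will be probed many times, it usually pays to convert the list to a set once, since the one-time construction cost is amortized over all the lookups.

```python
allowed = ["red", "green", "blue"] * 1000   # a list with many duplicates
queries = ["green", "purple"] * 5000

# Build the set once, then run every membership check against it.
allowed_set = set(allowed)
hits = sum(1 for q in queries if q in allowed_set)
print(hits)  # 5000: every "green" matches, no "purple" does
```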

I was interested in the results when checking, with CPython, whether a value is one of a small number of literals. `set` wins in Python 3 vs `tuple`, `list` and `or`:
from timeit import timeit

def in_test1():
    for i in range(1000):
        if i in (314, 628):
            pass

def in_test2():
    for i in range(1000):
        if i in [314, 628]:
            pass

def in_test3():
    for i in range(1000):
        if i in {314, 628}:
            pass

def in_test4():
    for i in range(1000):
        if i == 314 or i == 628:
            pass

print("tuple")
print(timeit("in_test1()", setup="from __main__ import in_test1", number=100000))
print("list")
print(timeit("in_test2()", setup="from __main__ import in_test2", number=100000))
print("set")
print(timeit("in_test3()", setup="from __main__ import in_test3", number=100000))
print("or")
print(timeit("in_test4()", setup="from __main__ import in_test4", number=100000))
Output:
tuple
4.735646052286029
list
4.7308746771886945
set
3.5755991376936436
or
4.687681658193469
For 3 to 5 literals, `set` still wins by a wide margin, and `or` becomes the slowest.
In Python 2, `set` is always the slowest. `or` is the fastest for 2 to 3 literals, and `tuple` and `list` are faster with 4 or more literals. I couldn't distinguish the speed of `tuple` vs `list`.
When the values to test were cached in a global variable outside the function, rather than created as a literal within the loop, `set` won every time, even in Python 2.
These results apply to 64-bit CPython on a Core i7.
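The cached-global variant described above can be sketched like this (identifiers are illustrative; note that `timeit` also accepts a callable directly, which sidesteps the `__main__` import in the setup string):

```python
from timeit import timeit

VALID = {314, 628}  # built once at module level, not per loop iteration

def in_test_cached():
    for i in range(1000):
        if i in VALID:
            pass

# Passing the function object itself avoids a setup string entirely.
t = timeit(in_test_cached, number=1000)
print(t)
```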

-
Your test is depending on implementation details here (and being messed with by them). By the natural rules of the language, the `list` and `set` cases would need to be rebuilt on every test (which would destroy their performance), and on older Python (definitely 2.x, not sure if older 3.x omitted the optimization) it does in fact rebuild the `set` literal on every pass, making it slower (Python 3 caches it as a constant `frozenset` to avoid the work). On both versions, your `list` test is actually being optimized to a `tuple` constant, so it's identical to the `tuple` case. – ShadowRanger Sep 15 '21 at 05:14
-
@ShadowRanger Of course it depends on implementation details; that's the point of a benchmark, to check the performance of an implementation. This was a practical test to help deciding how to write these kinds of comparisons with CPython, which I've often run into. – Pedro Gimeno Sep 15 '21 at 23:31
Sets are faster; moreover, you get more functionality with sets. For example, say you have two sets:
set1 = {"Harry Potter", "James Bond", "Iron Man"}
set2 = {"Captain America", "Black Widow", "Hulk", "Harry Potter", "James Bond"}
We can easily join two sets:
set3 = set1.union(set2)
Find out what is common in both:
set3 = set1.intersection(set2)
Find out what is different in both:
set3 = set1.difference(set2)
And much more! Just try them out, they are fun! Moreover, if you have to work with the values that differ between two lists, or the values common to both, I prefer to convert the lists to sets, and many programmers do it that way. Hope it helps you :-)

I would recommend a set implementation where the use case is limited to referencing or searching for existence, and a tuple implementation where the use case requires iteration. A list is a lower-level implementation and carries significant memory overhead.
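The memory claim can be spot-checked with `sys.getsizeof` (this measures the container itself, not the elements it references; exact figures vary by CPython version). In CPython a tuple is typically slightly smaller than the equivalent list, while a set's hash table costs noticeably more memory, which it trades for fast lookups:

```python
import sys

items = list(range(1000))

# getsizeof reports only the container's own footprint, not the
# 1000 int objects it points to.
print("tuple:", sys.getsizeof(tuple(items)))
print("list :", sys.getsizeof(list(items)))
print("set  :", sys.getsizeof(set(items)))
```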
-
Indeed, the proper distinction between when to use sets and when to use tuples is of utmost importance. I wouldn't worry about the memory overheads and footprints unless I am scripting a lower-level API. – Aug 11 '19 at 22:31
In the same vein as @Ellis Percival's tests, I'd like to add that lists perform in a similar way to sets when it comes to adding an element.
Adding an element
>>> def add_test_set(iterable):
...     for i in range(10000):
...         iterable.add(i)
...
>>> def add_test_list(iterable):
...     for i in range(10000):
...         iterable.append(i)
...
>>> from timeit import timeit
>>> timeit("add_test_set(iterable)",
...     setup="from __main__ import add_test_set; iterable = set()",
...     number=10000)
7.073143866999999
>>> timeit("add_test_list(iterable)",
...     setup="from __main__ import add_test_list; iterable = list()",
...     number=10000)
6.80650725000001
(I would have edited his post to include this but the edit queue was full)

from datetime import datetime

listA = list(range(10000000))
setA = set(listA)
tupA = tuple(listA)

# Source code
def calc(value, container):
    # Time a single membership check against the given container.
    start = datetime.now()
    if value in container:
        pass
    end = datetime.now()
    print(end - start)

calc(9999, listA)
calc(9999, tupA)
calc(9999, setA)
Output after comparing 10 iterations for all 3 (results were shown as an image).

Sets are very fast at finding whether a value is present, compared to lists.
A = list(range(1, 1000000))
A.pop(84559)  # remove one element so there is a missing value to find

def find_in_set(A):
    maxA = max(A)
    setA = set(A)  # build the set once, outside the loop
    for i in range(1, maxA + 1):
        if i not in setA:
            return i
The find_in_list function below takes almost ages to find the missing value in list 'A'. However, the function find_in_set above takes just a couple of seconds.
def find_in_list(A):
    maxA = max(A)
    for i in range(1, maxA + 1):
        if i not in A:
            return i
Here you can see the clear difference between sets and lists.
