5

I was interested in finding the fastest way to initialize an empty set in Python (I'm using 3.8). You cannot instantiate an empty set as {} as that creates a dict, so what is generally recommended is to use the set() constructor. I noticed the other day that there is another way to instantiate an empty set: you can unpack an empty tuple () into the {...} syntax for sets as follows: {*()}. Timing this with the timeit module in ipython gives the following results:

%timeit {*()}
67.7 ns ± 1.68 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit set()
84.5 ns ± 2.57 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

I find this pretty peculiar - the more elegant set() constructor takes 25% more time relative to {*()}. The same observations have been made in the past with, e.g., [] vs. list() and {} vs. dict().

%timeit []
17.8 ns ± 0.791 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
%timeit list()
81 ns ± 1.56 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit {}
18.6 ns ± 0.575 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
%timeit dict()
98.6 ns ± 5.09 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

It is rather easy to switch to the primitive syntax {} and [] for lists and dicts, but what I've found for sets is obviously not as clean. I'm curious as to insights into this (in general).

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
craymichael
  • 4,578
  • 1
  • 15
  • 24
  • 1
    This performance differences are so small that for all intents and purposes, just use what you find the most readable. But this is likely the result of function-call overhead, which is rather large for Python. – juanpa.arrivillaga Dec 19 '20 at 21:57
  • In general I completely agree. In certain algorithms, I can see nontrivial speedup - but this is more about my curiosity about the internals than anything else. – craymichael Dec 20 '20 at 02:13

4 Answers4

3
>>> import dis
>>> dis.dis("{*()}")
  1           0 LOAD_CONST               0 (())
              2 BUILD_SET_UNPACK         1
              4 RETURN_VALUE
>>> dis.dis("set()")
  1           0 LOAD_NAME                0 (set)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

Two things are faster, here.

First, set requires a global lookup, LOAD_NAME which is usually two hash-lookups (one in the global namespace and one in the builtins namespace). Then there is the function call, which is expensive in Python.

Compare this to the LOAD_CONST which is a quick, single, array indexing operation, and BUILD_SET_UNPACK which is a specialized, set-building instruction. No doubt it does what the set constructor does eventually, but basically here you end up using a byte-code level short-cut).

In any case, for all intents in purposes, stick with the more readable set().

juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172
2

I find that staring at the details that close tends to result in misleading conclusions.

If you actually do something with the result, set() is faster.

For example:

from timeit import timeit


def do_literal():
    s = {*()}
    s.add(1)


def do_function():
    s = set()
    s.add(1)


print(timeit(do_literal, number=100000000))
print(timeit(do_function, number=100000000))

Results:

14.3875497
13.313828099999998

Why the difference occurs is another interesting question, but perhaps not for SO (and I certainly don't know the answer).

Grismar
  • 27,561
  • 4
  • 31
  • 54
  • 1
    Very good point. Although, I suspect, if you want to do *a lot of little things* the performance will be about equal. – juanpa.arrivillaga Dec 19 '20 at 22:10
  • I suspect this has to do with the intial capacity of the resulting set, the set constructor is tuned for a capacity that will allow several additions efficiently, whereas the literal version is tuned for allowing the smallest excess capacity. Maybe try comparing `set()` vs `set(())`. I'd be interested to see the results. – juanpa.arrivillaga Dec 19 '20 at 22:11
  • A similar example, `[None]*n` will create a list where there is no overallocation of the inital buffer, so the first `.append` to it will be slow, because it will require a re-allocation of that buffer. However, `[None for _ in range(n)]` will likely result in a buffer that is over-allocated, so `.append` will be very fast. – juanpa.arrivillaga Dec 19 '20 at 22:13
  • This is a very interesting test! I am getting at what @juanpa.arrivillaga is saying - there are many cases where empty containers exist for the purpose of, e.g. valid comparisons, placeholders, etc., and may not ever be populated with items - so no pre-allocation is necessary – craymichael Dec 20 '20 at 02:16
1

set() is a function call that requires a lookup into the symbol table, while a set construction literal is an "artifact of the syntax". It's especially visible with a peer into the byte code.

See here for a more in-depth answer

wellinthatcase
  • 120
  • 2
  • 8
1

I am speculating that, similarly to the case of dict() vs {} and list vs [] it has a different structure that involves function calls. See the output of dis:

import dis
>>> dis.dis('set()')
  1           0 LOAD_NAME                0 (set)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('{*()}')
  1           0 LOAD_CONST               0 (())
              2 BUILD_SET_UNPACK         1
              4 RETURN_VALUE

Function calls in Python are expensive as they have high extra overhead.

Arn
  • 1,898
  • 12
  • 26