13

So I got these examples from the official documentation. https://docs.python.org/2/library/timeit.html

What exactly makes the first example (generator expression) slower than the second (list comprehension)?

>>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000)
0.8187260627746582
>>> timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000)
0.7288308143615723
Kevin
  • 977
  • 1
  • 10
  • 17
  • See Ray Hettinger's answer here http://stackoverflow.com/questions/9060653/list-comprehension-without-python – Bhargav Rao Jun 13 '16 at 05:43
  • @BhargavRao -- While I agree that Raymond's answer does address this question, this question is fundamentally different than _that_ question. (Here, OP wants to know why list-comp is faster -- In the other question, OP didn't even know the difference between a generator or a list-comp...). I guess I'm not sure what the policy should be for bringing the dupe-hammer down in cases like these ... – mgilson Jun 13 '16 at 06:04
  • 1
    @mgilson The other question is a super set of this question. There was a post on meta that stated that we can vote to close as a duplicate of a broader question. Like for e.g, closing *how do I convert this to a list comprehension* as a duplicate of *what is a list comp, how does it work*. There are attempts at creating broader questions to help Ops (have a look at [cannon](http://sopython.com/canon/)). All in all, if a particular question has been answered somewhere else, then we close as dupe. (I'm against the description that says*exact dupe* for hammers instead of the normal has an answer) – Bhargav Rao Jun 13 '16 at 06:24
  • If need be, we can re-close it as a dupe of [this](http://stackoverflow.com/questions/35138661/my-list-comprehension-is-faster-than-equivalent-code-with-generator-expression) which again is the dupe of the original post. – Bhargav Rao Jun 13 '16 at 06:35

1 Answers1

27

The str.join method converts its iterable parameter to a list if it's not a list or tuple already. This lets the joining logic iterate over the items multiple times (it makes one pass to calculate the size of the result string, then a second pass to actually copy the data).

You can see this in the CPython source code:

PyObject *
PyUnicode_Join(PyObject *separator, PyObject *seq)
{
    /* lots of variable declarations at the start of the function omitted */

    fseq = PySequence_Fast(seq, "can only join an iterable");

    /* ... */
}

The PySequence_Fast function in the C API does just what I described. It converts an arbitrary iterable into a list (essentially by calling list on it), unless it already is a list or tuple.

The conversion of the generator expression to a list means that the usual benefits of generators (a smaller memory footprint and the potential for short-circuiting) don't apply to str.join, and so the (small) additional overhead that the generator has makes its performance worse.

Blckknght
  • 100,903
  • 11
  • 120
  • 169