
I was writing an answer to this question, and when I timed my solution I got a result that contradicted what I thought was correct.

The person who asked the question wanted a way to count how many different lists are contained within another list (for more details, you can check the question).

My answer was basically this function:

def how_many_different_lists(lists):
    s = set(str(list_) for list_ in lists)
    return len(s)

The issue appeared when I measured how long it takes to run and compared it against basically the same function, but passing a list comprehension instead of a generator expression to set():

def the_other_function(lists):
    s = set([str(list_) for list_ in lists])
    return len(s)

This is the decorator I use for testing functions:

import time

def timer(func):
    def func_decorated(*args):
        # time.perf_counter() replaces time.clock(), which was removed in Python 3.8
        start_time = time.perf_counter()
        result = func(*args)
        print(time.perf_counter() - start_time, "seconds")
        return result
    return func_decorated
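
For the timings below, both functions are wrapped with it (the same as putting @timer above each def):

# Equivalent to decorating the definitions with @timer
how_many_different_lists = timer(how_many_different_lists)
the_other_function = timer(the_other_function)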

And these were the results for the given input:

>>> list1 = [[1,2,3],[1,2,3],[1,2,2],[1,2,2]]
>>> how_many_different_lists(list1)
6.916326725558974e-05 seconds
2
>>> the_other_function(list1)
3.882067261429256e-05 seconds
2

Even for larger lists:

# (52 elements)
>>> list2= [[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2],[1,2,3],[1,2,3],[1,2,2],[1,2,2]]
>>> how_many_different_lists(list2)
0.00023560132331112982 seconds
2
>>> the_other_function(list2)
0.00021329059177332965 seconds
2

Now, my question is: why is the second function faster than the first one? Aren't generators supposed to be faster, since they produce the elements "on demand"? I used to think that building a list and then iterating through it was slower.

PS: I have tested this many times and get basically the same results.

revliscano
  • Calling `the_other_function` first appears to make that one run "faster", so I'd imagine there are other Python optimisations at play – [example](https://repl.it/repls/WellinformedBluevioletRuntimeerror). Call them both a second time and the time difference is negligible – Sayse Jan 30 '20 at 22:15
  • Why are you rolling your own timing function instead of using `timeit`? – Barmar Jan 30 '20 at 22:20
  • Oh, yes, sorry about that. I have been playing around with `timeit` but I'm still not that familiar with it. Do you think it could make a huge difference here? (A big fan here of your answers, btw ;-) ) – revliscano Jan 30 '20 at 22:24
  • @Sayse Okay, so we could say that passing a list or a generator as a parameter are virtually the same thing as regards to performance? – revliscano Jan 30 '20 at 22:33
  • Bit of a side note, but why pass any list/generator comprehension to `set()` when set has its own comprehension? `{str(list_) for list_ in lists}` ;) – Ondrej K. Jan 30 '20 at 22:49
  • @OndrejK. Oh, how embarrassing haha. Yes, you're right, thanks. But I still wonder why passing a list performs (even slightly) better than passing a generator... Maybe someday I'll find the answer! – revliscano Jan 30 '20 at 23:04
  • Not immediately sure about the exact mechanics, but having disassembled it, the byte code using the generator expression is one instruction longer (see the sketch after these comments) – Ondrej K. Jan 31 '20 at 01:40
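
A minimal sketch of the disassembly mentioned in that last comment (`dis` is in the standard library; `data` is just a placeholder name, and the exact opcodes vary by Python version):

import dis

# Compile (but don't run) both expressions and compare the instruction sequences;
# per the comment above, the generator-expression version comes out slightly longer.
dis.dis("set(str(list_) for list_ in data)")
print("----")
dis.dis("set([str(list_) for list_ in data])")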

1 Answer


I have been benchmarking your functions:

[Benchmark plot: run time of both functions against the number of lists, produced by the code below]

from simple_benchmark import BenchmarkBuilder
from random import choice

b = BenchmarkBuilder()


@b.add_function()
def how_many_different_lists(lists):
    s = set(str(list_) for list_ in lists)
    return len(s)


@b.add_function()
def the_other_function(lists):
    s = set([str(list_) for list_ in lists])
    return len(s)


@b.add_arguments('Number of lists in the list')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp

        yield size,  [list(range(choice(range(100)))) for _ in range(size)]


r = b.run()
r.plot()

Generators are lazy: a generator expression creates the items on the fly, whereas a list comprehension builds the entire list in memory first. You can read more here: Generator Expressions vs. List Comprehension
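
A minimal sketch of that difference (`sys.getsizeof` is only there to show that the list comprehension builds a real intermediate list while the generator object stays small; exact sizes vary by Python version):

import sys

lists = [[1, 2, 3], [1, 2, 3], [1, 2, 2], [1, 2, 2]]

as_list = [str(list_) for list_ in lists]   # all the strings are built up front
as_gen = (str(list_) for list_ in lists)    # strings are produced one at a time

print(sys.getsizeof(as_list))  # grows with the number of sublists
print(sys.getsizeof(as_gen))   # small, constant-size generator object

print(set(as_list) == set(as_gen))  # True: set() ends up with the same contents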

As you can see from the benchmark, there is not such a big difference between them.
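
If you want to double-check without simple_benchmark, the standard library's timeit averages many runs (a minimal sketch; the absolute numbers depend on your machine):

from timeit import timeit

lists = [[1, 2, 3], [1, 2, 3], [1, 2, 2], [1, 2, 2]] * 13  # 52 sublists, as in the question

gen_time = timeit(lambda: set(str(list_) for list_ in lists), number=100_000)
list_time = timeit(lambda: set([str(list_) for list_ in lists]), number=100_000)

print("generator expression:", gen_time)
print("list comprehension:  ", list_time)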

kederrac
  • Awesome analysis. So basically the answer is that there's no such thing as one being faster than the other in practice, right? They perform almost the same overall? – revliscano Feb 23 '20 at 04:54