
When comparing the performance of Python generators vs. lists, I read that generators are faster to create than lists, but that iterating over a list is faster than iterating over a generator. However, when I coded an example to test this with a small and a large sample of data, the results contradicted each other.

When I test iteration speed over a generator and a list built from `range(1_000_000_000)` (so the sequence actually contains 500,000,000 even numbers), the generator iteration comes out faster than the list:

from time import time

my_generator = (i for i in range(1_000_000_000) if i % 2 == 0)

start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)
my_list = [i for i in range(1_000_000_000) if i % 2 == 0]

start = time()
for i in my_list:
    pass
print("Time for List iteration - ", time() - start)

And the output is:

Time for Generator iteration -  67.49345350265503
Time for List iteration - 89.21837282180786

But if I use a smaller input, 10_000_000 instead of 1_000_000_000, list iteration is faster than generator iteration:

from time import time

my_generator = (i for i in range(10_000_000) if i % 2 == 0)

start = time()
for i in my_generator:
    pass
print("Time for Generator iteration - ", time() - start)

my_list = [i for i in range(10_000_000) if i % 2 == 0]

start = time()
for i in my_list:
    pass
print("Time for list iteration - ", time() - start)

The output is:

Time for Generator iteration -  1.0233261585235596
Time for list iteration -  0.11701655387878418

Why is this behaviour happening?

Vijeth Kashyap
    To have reliable results use the standard library `timeit` module – gimix Jul 23 '22 at 10:02
  • The timings on my machine were completely different: 67 for the generator, 21 for the list. They match my expectations, because iterating over the list does fewer operations than the generator counterpart. When you are iterating over the list, it has already filtered all the values – Dani Mesejo Jul 23 '22 at 10:08
  • @DaniMesejo Is it the same code as my first snippet, with 1_000_000_000 items for the generator and list? – Vijeth Kashyap Jul 23 '22 at 10:15
  • Yes, copied and pasted. The iteration over the list covers 500_000_000 elements and does no work per element, while the generator covers 1_000_000_000 elements and performs a division (remainder) operation each time – Dani Mesejo Jul 23 '22 at 10:17
  • Right, generator iteration is similar to function calls, so it has to be slower. But I am still getting more time for list iteration than for generator iteration – Vijeth Kashyap Jul 23 '22 at 10:19
  • @gimix `timeit` is giving me the expected results, where list iteration is now faster!! Is the `time` module not as reliable as `timeit` in Python? I have been using the `time` module for a long time – Vijeth Kashyap Jul 23 '22 at 10:22
    `time` gives you the total time spent - so if something else happens meanwhile on your system (garbage collection, or perhaps another program doing its job) your results will be affected. `timeit` "avoids a number of common traps for measuring execution times", for instance it disables GC by default – gimix Jul 23 '22 at 10:34
  • Does this answer your question? [Speed to iterate multiple times over a generator compared to a list](https://stackoverflow.com/questions/40807277/speed-to-iterate-multiple-times-over-a-generator-compared-to-a-list) – Daniel Hao Jul 23 '22 at 11:04
  • @DanielHao I feel this is a different question, about how generators are exhausted after a single iteration, and it does not relate to this question – Vijeth Kashyap Jul 23 '22 at 11:30
    OK. just for reference then. – Daniel Hao Jul 23 '22 at 11:33

2 Answers


After understanding the points made by @gimix and @Dani Mesejo, I found the answer. Indeed, list iteration is faster than generator iteration.

In the case of the generator, each iteration resumes the generator (much like a function call) and also performs the remainder operation (modulus), which makes every step slower. In the case of the list, all of that work is done during creation, so iteration itself is cheap. Thus creating the list may be slower than creating the generator, but iterating over the list is definitely faster than iterating over the generator.
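The per-item resumption cost can be seen in isolation with a small sketch (a hypothetical micro-benchmark, not from the discussion above): iterate the *same* pre-built data once directly as a list and once through a generator that merely wraps it, so the only difference measured is the cost of resuming the generator for each element.

```python
import timeit

# Pre-built data shared by both measurements, so neither loop pays
# any creation or filtering cost inside the timed statement.
data = list(range(1_000_000))

# Plain list iteration.
list_time = timeit.timeit("for i in data: pass",
                          globals={"data": data}, number=10)

# Same elements, but each one is delivered by resuming a generator.
gen_time = timeit.timeit("for i in (x for x in data): pass",
                         globals={"data": data}, number=10)

print(f"list: {list_time:.3f}s  generator wrapper: {gen_time:.3f}s")
```

Even with no modulus operation at all, the generator wrapper is slower, which supports the explanation that the per-element overhead comes from resuming the generator frame.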

The above code uses the `time` module, which is not reliable for benchmarking!! Now I used `timeit` for 1_000_000 and for 1_000_000_000 items, and in both cases list iteration was faster:

import timeit

mysetup = '''my_generator = (i for i in range(10_000_000) if i % 2 == 0)
'''

mycode = '''
for i in my_generator:
    pass
'''

mysetup1 = '''my_list = [i for i in range(10_000_000) if i % 2 == 0]'''

mycode1 = '''
for i in my_list:
    pass
'''
print(timeit.timeit(setup=mysetup, stmt=mycode, number=1))
print(timeit.timeit(setup=mysetup1, stmt=mycode1, number=1))
Vijeth Kashyap

For a better understanding of the efficiency benefit of generators, suppose you want to read a file with 10M rows. First, read it with a regular method like the one below:

from time import time

first_ts = time()

def regular_file_reader(filename):
    # readlines() loads the entire file into memory before returning
    with open(filename, "r") as file_:
        return file_.readlines()

for row in regular_file_reader("sample_file.csv"):
    print(row)
    second_ts = time()
    break
print(second_ts - first_ts)

As you can see, we break out of the loop after reading the first line of the file, because that is where generators make the difference: they produce just the first element on demand. For iterating over all of the elements, a generator may even be less efficient.

def generator_file_reader(filename):
    # yields one row at a time; the whole file is never loaded at once
    with open(filename, "r") as f:
        for row in f:
            yield row


first_ts = time()
for row in generator_file_reader("sample_file.csv"):
    print(row)
    second_ts = time()
    break

print(second_ts - first_ts)

In this case, since the generator only reads the first line rather than the whole file, using the generator is much faster.
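The same laziness also shows up in memory usage, which is the other half of the trade-off. A minimal sketch (hypothetical sizes will vary by Python version and platform):

```python
import sys

# A list comprehension materialises every element up front,
# while a generator expression only stores a small frame object.
n = 1_000_000
as_list = [i for i in range(n) if i % 2 == 0]
as_gen = (i for i in range(n) if i % 2 == 0)

print("list size:", sys.getsizeof(as_list), "bytes")       # megabytes
print("generator size:", sys.getsizeof(as_gen), "bytes")   # ~100 bytes
```

So the practical rule of thumb matches the accepted answer: use a list when you will iterate the data repeatedly and it fits in memory; use a generator when the data is large or you only need the first few elements.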

amir jj