
I have a relatively long (20,000 rows) CSV file and a simple function I wrote to open it:

import csv
import timeit

def read_prices():
    with open('sp500.csv', 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield float(row['Adj Close'].strip())

When I time it as is, it takes 3e-05s:

print(timeit.timeit(lambda: read_prices(), number=100))

When I time the same function wrapped in tuple(...), it takes a whopping 27s:

print(timeit.timeit(lambda: tuple(read_prices()), number=100))

Is this normal for tuple()? Why might this be? I'm a beginner so ELI5 explanations welcome:)

Tomerikoo
ilmoi

2 Answers


That happens because read_prices is not an ordinary function: it is a generator function, because its body contains the yield keyword.

As explained in the Functional Programming HOWTO:

Any function containing a yield keyword is a generator function; this is detected by Python’s bytecode compiler which compiles the function specially as a result.

When you call a generator function, it doesn’t return a single value; instead it returns a generator object that supports the iterator protocol.

So the first version, read_prices(), only creates a generator object. None of the function body runs yet (the file is not even opened); the object just waits to be asked for elements.

In the second version, tuple(read_prices()), you create the generator object as before, but tuple() then exhausts it: it pulls every element out, one after another, so all 20,000 rows are read and parsed on each of the 100 timed runs.
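One way to convince yourself that calling a generator function runs none of its body is the sketch below. It assumes a variant of read_prices that takes the path as a parameter (the question hard-codes it), and the file name is hypothetical and assumed not to exist. The missing file is only noticed on the first next(), because that is when open() is finally reached.

```python
import csv

def read_prices(path):
    # Same shape as the function in the question, with the path as a parameter
    with open(path, 'r') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield float(row['Adj Close'].strip())

# Hypothetical path, assumed not to exist
gen = read_prices('no_such_file_12345.csv')  # no error: the body has not started

try:
    next(gen)  # the body now runs up to the first yield, so open() is called
except FileNotFoundError:
    print('open() only ran on the first next()')
```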


A simple demonstration:

>>> def yielder():
...     yield from [1, 2, 3]
...     
>>> y = yielder()
>>> y
<generator object yielder at 0x2b5604090de0>
>>> next(y)
1
>>> list(y)
[2, 3]
>>> tuple(yielder())
(1, 2, 3)
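The timing gap from the question can be reproduced without any file at all. The following is a minimal sketch in which numbers() is a hypothetical stand-in for read_prices(); absolute timings will differ from machine to machine, but creating the generator should be far cheaper than exhausting it.

```python
import timeit

def numbers(n=20_000):
    # Stand-in for read_prices(): yields n floats without touching a file
    for i in range(n):
        yield float(i)

# Only builds the generator object; the loop body never runs
create_time = timeit.timeit(lambda: numbers(), number=100)

# Builds the generator and pulls all n values out of it
consume_time = timeit.timeit(lambda: tuple(numbers()), number=100)

print(f'create:  {create_time:.6f}s')
print(f'consume: {consume_time:.6f}s')
```

So the first measurement in the question timed 100 generator creations, while the second timed 100 full passes over the CSV file.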
Tomerikoo

This is because read_prices() returns a generator, which does pretty much nothing when it is merely called like this.

However, when you write tuple(read_prices()), the generator is driven to completion and all of its values are produced.

A generator is an iterable, and it only does work when something consumes it, such as:

  • a for loop
  • next()
  • building a collection with tuple() (as you noted) or list()

among other operations that consume iterables.
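As a rough illustration of the three ways of consuming a generator listed above, here is a short sketch; countdown() is a hypothetical generator made up for this example.

```python
def countdown(n):
    # Yields n, n-1, ..., 1
    while n > 0:
        yield n
        n -= 1

# 1. A for loop asks for values until the generator is exhausted
looped = []
for value in countdown(3):
    looped.append(value)

# 2. next() asks for exactly one value
gen = countdown(3)
first = next(gen)

# 3. list() / tuple() pull out every value that is still left
rest = list(gen)
everything = tuple(countdown(3))

print(looped, first, rest, everything)
# prints: [3, 2, 1] 3 [2, 1] (3, 2, 1)
```

Note that rest only holds [2, 1]: a generator remembers where it stopped, so the value already taken by next() is not produced again.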

Here is a more concrete example of a generator:

def f():
    print("First value:")
    yield "first"
    print("Second value:")
    yield "second"

Here it is in action:

### Nothing prints when called (analogous to your first timeit without tuple)

In [2]: v = f()

In [3]:

### However, when I call `next`, the first value is provided:

In [3]: next(v)
First value:
Out[3]: 'first'

## etc., until there are no more values and a `StopIteration` exception is raised:

In [4]: next(v)
Second value:
Out[4]: 'second'

In [5]: next(v)
------------------------------------
...

StopIteration:

## By building a tuple, the `StopIteration` exception is handled for you
## and all the values are collected in one go
## (like your timeit using tuple):

In [6]: tuple(f())
First value:
Second value:
Out[6]: ('first', 'second')
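To make the "StopIteration is handled" part concrete, here is a rough pure-Python picture of what tuple() does with an iterator. The collect() helper is hypothetical and written only for illustration; the real tuple() is implemented in C inside the interpreter, but the observable behaviour is the same for this case.

```python
def f():
    print("First value:")
    yield "first"
    print("Second value:")
    yield "second"

def collect(iterable):
    # Hypothetical helper: keeps calling next() until the iterator runs out
    iterator = iter(iterable)
    items = []
    while True:
        try:
            items.append(next(iterator))
        except StopIteration:
            break  # exhaustion is the normal end of the loop, not an error
    return tuple(items)

result = collect(f())
print(result)
```

Every value costs one next() call, which is why exhausting a generator over 20,000 rows takes real time while merely creating it does not.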
salparadise