8

I don't quite understand how iterators have memory in Python.

>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [2, 3, 4, 5, 6, 7]
>>> iz = izip(l1, l2)

We still require O(min(len(l1), len(l2))) memory, as we need to load the lists l1 and l2 into memory.

I thought one of the main uses of iterators was to save memory - yet it does not seem to be useful here.
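For reference, the `izip` object itself is O(1): it hands out one pair at a time rather than building a list of pairs. A sketch (in Python 3, where the built-in `zip` behaves like Python 2's `izip`):

```python
# Python 3 sketch: the built-in zip() behaves like Python 2's
# itertools.izip -- it yields one pair at a time instead of
# building a full list of pairs.
l1 = [1, 2, 3, 4, 5, 6]
l2 = [2, 3, 4, 5, 6, 7]

iz = zip(l1, l2)   # no pairs have been materialized yet
first = next(iz)   # pulls exactly one pair from the two lists
print(first)       # (1, 2)
```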

Similarly, the code below is unclear to me:

>>> l1 = ( n for n in [1, 2, 3, 4, 5, 6] )
>>> l2 = ( n for n in [2, 3, 4, 5, 6, 7] )
>>> iz = izip(l1, l2)

We need to load the lists before converting them into generators, right? That means we waste memory anyway. So what is the point of generators here?

This is the only case that makes sense to me:

def build_l1():
    for n in xrange(1, 7):
        yield n

def build_l2():
    for n in xrange(2, 8):
        yield n

l1 = build_l1()
l2 = build_l2()
iz = izip(l1, l2)

Neither sequence is ever loaded into memory, so we're using O(1) memory.
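For reference, the same O(1) pipeline can be sketched in Python 3, where `range` (like `xrange`) stores only start/stop/step and `zip` is lazy like `izip`:

```python
# Python 3 sketch: range, like Python 2's xrange, stores only
# (start, stop, step), and zip pairs items lazily, so no sequence
# is ever fully materialized.
l1 = range(1, 7)   # 1..6 in O(1) memory
l2 = range(2, 8)   # 2..7 in O(1) memory

# Consume the pairs one at a time; memory stays constant.
total = sum(a * b for a, b in zip(l1, l2))
print(total)       # 112
```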

How does the memory usage of iterators in Python work? The first two cases seem to use O(min(len(l1), len(l2))) memory, which makes them look useless if the main point of iterators is to save memory.

  • 7
    If you iterate over a list, it doesn't save memory. The point is, often you can avoid creating that list in the first place. Also, it doesn't only make sense to save memory when you can save it asymptotically. – L3viathan Feb 01 '16 at 13:26
  • 2
    Your `build_l1` and `build_l2` don't make much sense, `xrange` already stores just `(from, to, step)` – Kos Feb 01 '16 at 13:28

2 Answers

14

Your examples are too simplistic. Consider this:

nums = [1, 2, 3, 4, 5, 6]
nums_it = (n for n in nums)

nums_it is a generator that yields all the items of nums unmodified. Clearly there is no advantage here. But consider this:

squares_it = (n ** 2 for n in nums)

and compare it with:

squares_lst = [n ** 2 for n in nums]

With squares_it, we are generating the squares of nums on the fly only when requested. With squares_lst, we are generating all of them at once and storing them in a new list.
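One way to see the difference is the size of the objects themselves (a Python 3 sketch; exact byte counts vary by interpreter, so treat the printed numbers as illustrative):

```python
import sys

nums = list(range(1000))

squares_it = (n ** 2 for n in nums)    # generator: constant size
squares_lst = [n ** 2 for n in nums]   # list: grows with the input

# The generator object stays tiny no matter how long nums is;
# the list's size is proportional to len(nums).
print(sys.getsizeof(squares_it))
print(sys.getsizeof(squares_lst))
```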

So, when you do:

for n in squares_it:
    print(n)

it's as if you were doing:

for n in nums:
    print(n ** 2)

But when you do:

for n in squares_lst:
    print(n)

it's as if you were doing:

squares_lst = []
for n in nums:
    squares_lst.append(n ** 2)
for n in squares_lst:
    print(n)

If you don't need (or don't have) the list nums, then you can save even more space by using:

squares_it = (n ** 2 for n in xrange(1, 7))

Generators and iterators also provide another significant advantage (which may actually be a disadvantage, depending on the situation): they are evaluated lazily.

Also, generators and iterators may yield an infinite number of elements. An example is itertools.count(), which yields 0, 1, 2, 3, ... without ever ending.
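For example, you can combine an infinite generator with itertools.islice to consume only as much as you need (a Python 3 sketch):

```python
from itertools import count, islice

# count() is an infinite iterator: 0, 1, 2, 3, ...
evens = (n for n in count() if n % 2 == 0)

# islice takes a finite prefix without ever building
# the (infinite) rest.
first_five = list(islice(evens, 5))
print(first_five)   # [0, 2, 4, 6, 8]
```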

Andrea Corbellini
  • Thanks for the reply. The generator version of `squares_lst` is `O(n)` in memory, though I do agree that the constant factor is minimized. Is there a way to achieve `O(1)` memory using iterators and a pre-defined list? –  Feb 01 '16 at 14:05
  • @zero: `squares_it` is `O(1)` memory. Or are you considering the memory for `nums` too? – Andrea Corbellini Feb 01 '16 at 14:10
  • The entire algorithm to square those numbers would be `O(n)`, if I am correct? –  Feb 01 '16 at 14:25
  • @zero: yes. But change it to `(n ** 2 for n in xrange(1, 7))` and then you have a `O(1)` space algorithm – Andrea Corbellini Feb 01 '16 at 14:27
  • So - we cannot achieve `O(1)` without using an explicit generator like `xrange` or `build_l1`/`build_l2`? –  Feb 01 '16 at 14:38
  • @zero: once you have built a list, the list is there and takes its space. You can't turn a list into an iterator and pretend that the solution is `O(1)`. So yes: you must use only generators like `xrange` – Andrea Corbellini Feb 01 '16 at 14:42
  • 1
    @zero: When considering space complexity, you *only* consider memory *in addition to* the input. – chepner Feb 01 '16 at 15:00
  • @chepner said an important thing. If `nums` or `l1` is your input, then you can't save space it took, because you have no control over it. – Andrea Corbellini Feb 01 '16 at 15:03
0
>>> l1 = [1, 2, 3, 4, 5, 6]
>>> l2 = [2, 3, 4, 5, 6, 7]
>>> iz = izip(l1, l2)

We still require O(min(len(l1), len(l2))) memory, as we need to load the lists l1 and l2 into memory.

With zip you need storage for the two original lists plus the zipped list. With izip you don't store the zipped list.

Big O notation isn't particularly helpful here if you have to work with a real physical machine instead of some abstract concept of a machine. There's a hidden constant multiplier on your O(n) calculations that could influence the practicality of the code well before n tends to infinity.
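A rough way to see that constant factor (a Python 3 sketch, where `zip` is lazy like `izip` and `list(zip(...))` materializes everything like Python 2's `zip`; exact sizes vary by interpreter):

```python
import sys

l1 = list(range(10000))
l2 = list(range(10000))

lazy = zip(l1, l2)         # like izip: one small iterator object
eager = list(zip(l1, l2))  # like Python 2's zip: 10000 tuples

# The lazy version costs a few dozen bytes; the eager version
# pays for the list plus every tuple stored in it.
print(sys.getsizeof(lazy))
print(sys.getsizeof(eager))
```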

>>> l1 = ( n for n in [1, 2, 3, 4, 5, 6] )
>>> l2 = ( n for n in [2, 3, 4, 5, 6, 7] )
>>> iz = izip(l1, l2)

We need to load the lists before converting them into generators, right? That means we waste memory anyway. So what is the point of generators here?

No point to generators here. Any time you see n for n in <expr> without either a more complex expression before the for or an if <expr> filter after it, that's a code smell as you could just have used the original sequence directly. The generators only become useful when you transform the input values into something else or filter them.
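To illustrate the difference (a Python 3 sketch):

```python
nums = [1, 2, 3, 4, 5, 6]

# Pointless: yields exactly the items of nums; just use nums directly.
same = (n for n in nums)

# Useful: transforms and filters on the fly, without building
# an intermediate list.
doubled_evens = (n * 2 for n in nums if n % 2 == 0)
result = list(doubled_evens)
print(result)   # [4, 8, 12]
```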

Duncan
  • I should note that I'm practising for coding interviews and am trying to get solutions with O(1) memory using iterators. –  Feb 01 '16 at 14:02
  • A generator can also filter items, so `n for n in sequence if predicate(n)` may be useful without modifying any individual input value. – chepner Feb 01 '16 at 14:56