I have code like the following:

for v1, v2 in zip(iter1, iter2):
   print len(v1) # prints 0

But when I change zip to itertools.izip, it prints 1:

for v1, v2 in izip(iter1, iter2):
   print len(v1) # prints 1

The rest of the code is unchanged. I only replaced zip with izip, and it worked. The izip output is the correct one.

Edit: Adding entire code:

#!/bin/python

"""
How to use:
>>> from color_assign import Bag, assign_colors
>>> from pprint import pprint
>>> old_topics = set([
... Bag(name='T1', group=0, color=1, count=16000),
... Bag(name='T2', group=0, color=1, count=16000),
... Bag(name='T3', group=1, color=2, count=16000),
... Bag(name='T4', group=2, color=3, count=16000),
... ])
>>> new_topics = set([
... Bag(name='T1', group=0, color=None, count=16000),
... Bag(name='T2', group=4, color=None, count=16000),
... Bag(name='T3', group=1, color=None, count=16000),
... Bag(name='T4', group=1, color=None, count=16000),
... ])
>>> color_ranges = [ [1,10] ]
>>> assign_colors(old_topics, new_topics, color_ranges)
>>> pprint(sorted(new_topics, key=attrgetter('name')))
[Bag(name=T1, group=0, color=1, count=16000),
 Bag(name=T2, group=4, color=3, count=16000),
 Bag(name=T3, group=1, color=2, count=16000),
 Bag(name=T4, group=1, color=2, count=16000)]
>>> 
"""

from itertools import groupby, izip
from operator import attrgetter

class Bag:
  def __init__(self, name, group, color=None, count=None):
    self.name  = name 
    self.group = group
    self.color    = color   
    self.count  = count 
  def __repr__(self):
    return "Bag(name={self.name}, group={self.group}, color={self.color}, count={self.count})".format(self=self)
  def __key(self):
    return self.name
  def __hash__(self):
    return hash(self.__key())
  def __eq__(self, other):
    return type(self) is type(other) and self.__key() == other.__key()

def color_range_gen(color_ranges, used_colors):
  color_ranges = sorted(color_ranges)
  color_iter = iter(sorted(used_colors))
  next_used = next(color_iter, None)
  for start_color, end_color in color_ranges:
    cur_color = start_color
    end_color = end_color
    while cur_color <= end_color:
      if cur_color == next_used:
        next_used = next(color_iter, None)
      else:
        yield cur_color
      cur_color = cur_color + 1


def assign_colors(old_topics, new_topics, color_ranges):
  old_topics -= (old_topics-new_topics) #Remove topics from old_topics which are no longer present in new_topics
  used_colors = set()

  def group_topics(topics):
    by_group = attrgetter('group')
    for _, tgrp in groupby(sorted(topics, key=by_group), by_group):
      yield tgrp

  for topic_group in group_topics(old_topics):
    oldtset = frozenset(topic_group)
    peek = next(iter(oldtset))
    try:
      new_group = next(topic.group for topic in new_topics if topic.name == peek.name and not topic.color)
    except StopIteration:
      continue
    newtset = frozenset(topic for topic in new_topics if topic.group == new_group)
    if oldtset <= newtset:
      for topic in newtset:
        topic.color = peek.color
      used_colors.add(peek.color)

  free_colors = color_range_gen(color_ranges, used_colors)
  unassigned_topics = (t for t in new_topics if not t.color)
  for tset, color in zip(group_topics(unassigned_topics), free_colors):
    for topic in tset:
      topic.color = color

if __name__ == '__main__':
  import doctest
  doctest.testmod()

Usage:

my_host:my_dir$ /tmp/color_assign.py
**********************************************************************
File "/tmp/color_assign.py", line 21, in __main__
Failed example:
    pprint(sorted(new_topics, key=attrgetter('name')))
Expected:
    [Bag(name=T1, group=0, color=1, count=16000),
     Bag(name=T2, group=4, color=3, count=16000),
     Bag(name=T3, group=1, color=2, count=16000),
     Bag(name=T4, group=1, color=2, count=16000)]
Got:
    [Bag(name=T1, group=0, color=None, count=16000),
     Bag(name=T2, group=4, color=3, count=16000),
     Bag(name=T3, group=1, color=2, count=16000),
     Bag(name=T4, group=1, color=2, count=16000)]
**********************************************************************
1 items had failures:
   1 of   7 in __main__
***Test Failed*** 1 failures.
my_host:my_dir$ sed -i 's/zip(/izip(/g' /tmp/color_assign.py
my_host:my_dir$ /tmp/color_assign.py
my_host:my_dir$

Update: the issue is that groupby invalidates the earlier group iterators when zip consumes everything up front.

balki
  • Your bug is somewhere else. Show us [runnable code that demonstrates the error when you run it](http://stackoverflow.com/help/mcve). In particular, include the creation of `iter1` and `iter2`, and don't call `zip` in the code that's supposed to be using `izip`. – user2357112 Jan 31 '14 at 22:21
  • @user2357112 I pasted runnable code with a test case. The test case passes if you change zip to izip. – balki Jan 31 '14 at 22:39
  • related: http://stackoverflow.com/q/8994319/198633 – inspectorG4dget Jan 31 '14 at 22:43
  • Can anyone spot the bug? Hope nothing is wrong with my computer – balki Jan 31 '14 at 22:44

2 Answers

Yes, zip and izip produce the same pairs. The only difference is that in Python 2 zip builds the whole list in memory, while izip returns an iterator that yields pairs on demand.

>>> from itertools import izip

>>> zip(range(5), 'abcde')
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]

>>> it = izip(range(5), 'abcde')
>>> it
<itertools.izip object at 0xa660fcc>
>>> next(it)
(0, 'a')
>>> next(it)
(1, 'b')

Note that izip has been removed in Python 3, where the built-in zip returns an iterator instead.
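
If the same script has to run on both versions, one common pattern is to fall back to the built-in when the import fails. This is only a sketch; the names `first` and `rest` are illustrative and not part of the question's code.

```python
try:
    from itertools import izip  # Python 2: lazy variant of zip
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

it = izip(range(5), 'abcde')
first = next(it)   # (0, 'a')
rest = list(it)    # the remaining four pairs, produced on demand
```

Either way, `izip` here yields pairs one at a time instead of building the full list first.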

Ashwini Chaudhary

The problem you're experiencing is due to a combination of two factors. First, izip only advances the underlying iterators as needed, while zip needs to fetch all items immediately. Second, when a groupby object is advanced, the previous iterators are no longer valid:

"The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list."
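
A minimal sketch of that behavior, using a plain string instead of the question's Bag objects (the variable names are made up for illustration):

```python
from itertools import groupby

gb = groupby("aabbb")

key_a, group_a = next(gb)   # first group, key 'a'
key_b, group_b = next(gb)   # advancing groupby invalidates group_a

leftover_a = list(group_a)  # [] because group_a is no longer visible
items_b = list(group_b)     # ['b', 'b', 'b'], the current group is still readable
```

An eager zip over a generator of groups does the same thing on a larger scale: it advances groupby past every group before the loop body reads any of them.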

As a simple fix, you can change group_topics to call list on its groups before yielding them.
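
That fix might look roughly like the sketch below. It assumes `(name, group)` tuples as stand-ins for the question's Bag objects and a made-up `free_colors` list; `list(zip(...))` is used so that everything is consumed up front on Python 3 as well, the way Python 2's zip does.

```python
from itertools import groupby
from operator import itemgetter

def group_topics(topics):
    """Yield each group as a list so it survives later groupby advances."""
    by_group = itemgetter(1)  # hypothetical (name, group) tuples, not Bag objects
    for _, tgrp in groupby(sorted(topics, key=by_group), by_group):
        yield list(tgrp)  # materialize before groupby moves on

topics = [("T1", 0), ("T2", 4), ("T3", 1), ("T4", 1)]
free_colors = [1, 3, 5]  # illustrative values only

# Consume every group eagerly, then read them afterwards.
pairs = list(zip(group_topics(topics), free_colors))
# pairs[0] is ([("T1", 0)], 1); no group comes back empty
```

Because each group is copied into a list as it is produced, it no longer matters whether the pairing is done lazily or all at once.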

user2357112
  • I couldn't figure out how the `for` loop affects the generator. I am only changing the current item in the for loop, not referring to the loop's iteration at all. – balki Jan 31 '14 at 22:52
  • @balki: Analyzed your code further. See expanded answer. – user2357112 Jan 31 '14 at 22:59