
are there techniques for introspecting generator objects (e.g. for assertions in unit tests)?

more specifically, i have a data processing pipeline comprised of a sequence of small functions applied to values often inside list comprehensions or generator expressions, like so:

generate some random data:

>>> import random as RND
>>> raw_data = ["${}".format(RND.randint(10, 100)) for c in range(10)]

>>> # a function that does some sort of transform
>>> fnx = lambda q: float(q.replace('$', ''))

>>> d1 = [fnx(itm) for itm in raw_data]

in a next step, another transform function will be applied over the items of d1, and so on.

in the case just above, assertions on, for instance, the length of d1, or on the min/max of its values, are the heart of my unit test suite:

>>> assert len(d1) == 10

given that i am just going to iterate through these intermediate results, i don't actually need a list; a generator object will do, and given its much lower memory profile, that's what i use:

>>> d1 = (fnx(itm) for itm in raw_data)

of course the assertions i rely on when using list comprehensions are not available for generator objects:

>>> d1
  <generator object <genexpr> at 0x106da9230>

>>> assert len(d1) == 10
  Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    assert len(d1) == 10
  TypeError: object of type 'generator' has no len()

if i have to call list() on the generator object just for an assert then my test suite runs very slowly (with the unfortunate practical result that devs often don't run it at all).

i have looked at the attributes of generator objects for any that i can usefully introspect, but i didn't see how i can use them in the way i have described here.

doug
  • Generator objects are actually functions and don't know in advance how many results they're going to return. There's no way to get the "length" of a generator other than actually consuming it. – georg Oct 05 '13 at 09:39
  • If you want to examine the sequence generated then simply do `the_sequence = list(the_generator)` and then do all the asserts on `the_sequence`. This avoids calling `list` for every *single* assert (since you can assert both length and contents in a single run). – Bakuriu Oct 05 '13 at 09:47

2 Answers


For reference, type checking of a generator object:

import types
self.assertIsInstance(_result_generator, types.GeneratorType)
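
Beyond the type check, the standard library's `inspect` module exposes a generator's lifecycle state, which can be asserted on without consuming more items than you choose to. A minimal sketch, assuming Python 3.2+ (where `inspect.getgeneratorstate` is available); `raw_data` and `d1` here are small stand-ins for the names in the question:

```python
import inspect
import types

# Small fixed stand-in for the question's random price strings.
raw_data = ["$10", "$20", "$30"]
d1 = (float(q.replace("$", "")) for q in raw_data)

# Type checks: both succeed without advancing the generator.
assert isinstance(d1, types.GeneratorType)
assert inspect.isgenerator(d1)

# A freshly created generator has not started running yet.
assert inspect.getgeneratorstate(d1) == inspect.GEN_CREATED

# After pulling one item, it is paused between items.
first = next(d1)
assert inspect.getgeneratorstate(d1) == inspect.GEN_SUSPENDED

# Once exhausted, it reports as closed.
remaining = list(d1)
assert inspect.getgeneratorstate(d1) == inspect.GEN_CLOSED
```

This says whether a generator is untouched, partly consumed, or exhausted, but it still cannot report how many items remain.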
Abhijeet

As @thg435 commented, without consuming it, you don't know the length of the generator.

Usually I do one of the following:

In case the generator produces a small number of elements:

assert len(list(d1)) == 10

or

assert sum(1 for x in d1) == 10
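
Both forms exhaust `d1`, so nothing is left for the next pipeline step or for further asserts. If several properties need checking, one option is a single pass that tracks them all in constant memory. A minimal sketch; `summarize` is a hypothetical helper, not a standard-library function, and the data is a fixed stand-in for the question's random prices:

```python
def summarize(iterable):
    """Consume an iterable once, returning (count, minimum, maximum)."""
    count = 0
    lo = hi = None
    for value in iterable:
        count += 1
        if lo is None or value < lo:
            lo = value
        if hi is None or value > hi:
            hi = value
    return count, lo, hi


# Fixed stand-in data: "$10" through "$19".
raw_data = ["${}".format(n) for n in range(10, 20)]
d1 = (float(q.replace("$", "")) for q in raw_data)

# One pass over the generator serves all three assertions.
count, lo, hi = summarize(d1)
assert count == 10
assert lo == 10.0
assert hi == 19.0
```

The generator is still consumed, so this fits a test that checks one stage at a time rather than one that passes `d1` on to the next step.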
falsetru