
I'm trying to set up a "processing pipeline" for data that I read from a data source, applying a sequence of operators (using generators) to each item as it is read.

Here is some sample code that demonstrates the issue:

def reader():
    yield 1
    yield 2
    yield 3

def add_1(val):
    return val + 1

def add_5(val):
    return val + 5

def add_10(val):
    return val + 10

operators = [add_1, add_5, add_10]

def main():
    vals = reader()

    for op in operators:
        vals = (op(val) for val in vals)

    return vals

print(list(main()))

Desired: [17, 18, 19]
Actual: [31, 32, 33]

Python does not seem to save the value of op on each pass through the for loop, so the third function ends up being applied at every stage. Is there a way to "bind" the actual operator function to the generator expression each time through the for loop?

I could get around this trivially by changing the generator expression in the for loop to a list comprehension, but since the actual data is much larger, I don't want to be storing it all in memory at any one point.

gtback
  • Thanks, everyone! The `map` solution worked best for me, since there are other things I want to do in the for loop as well (related to logging, additional checks, etc.). In my real program, each `operator` is actually a class with `__call__`, and has some other functions and attributes I need to deal with. The `reduce` solution would also work well, but loses the ability to do that without wrapping each operator in a function to perform those extra actions. – gtback Jan 25 '16 at 20:57

4 Answers


You can define a little helper which composes the functions, but in the reverse of the usual mathematical order:

import functools

def compose(*fns):
    return functools.reduce(lambda f, g: lambda x: g(f(x)), fns)

That is, you can use compose(f, g, h) to generate a lambda expression equivalent to lambda x: h(g(f(x))). This order is uncommon (mathematical composition applies the rightmost function first), but it ensures that your functions are applied left to right, which is probably what you expect.

Using this, your main becomes just:

def main():
    vals = reader()
    f = compose(add_1, add_5, add_10)
    return (f(v) for v in vals)
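
To make the ordering concrete, here is a small sketch using two helpers, double and increment, which are hypothetical and not part of the question. It only illustrates that this compose applies its arguments left to right:

```python
import functools

def compose(*fns):
    # Left-to-right composition: compose(f, g, h)(x) == h(g(f(x)))
    return functools.reduce(lambda f, g: lambda x: g(f(x)), fns)

def double(x):      # hypothetical helper, not from the question
    return x * 2

def increment(x):   # hypothetical helper, not from the question
    return x + 1

double_then_increment = compose(double, increment)
increment_then_double = compose(increment, double)

result_a = double_then_increment(3)  # increment(double(3)) == 7
result_b = increment_then_double(3)  # double(increment(3)) == 8
print(result_a, result_b)
```

Swapping the arguments changes the result, so the order in which the operators are listed matters.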
Frerich Raabe
  • Good point on the order, I didn't think about that. I've fixed my answer. – texasflood Jan 25 '16 at 14:55
  • Note that this changes the order of operations: I.e., OP's code first calculates `op_1` for all the values, then `op_5`, etc., while yours first applies all the operations to `val_1`, then to `val_2`, etc. Depending on the application, this might be perfectly okay or a problem. (Just wanted to point out) – tobias_k Jan 25 '16 at 15:04

You can force a variable to be bound by creating the generator in a new function. For example:

def map_operator(operator, iterable):
    # closure value of operator is now separate for each generator created
    return (operator(item) for item in iterable)

def main():
    vals = reader()
    for op in operators:
        vals = map_operator(op, vals)
    return vals

However, map_operator is pretty much identical to the map builtin in Python 3.x, which is also lazy, so you can just use that instead.
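
As a minimal sketch of that suggestion, assuming Python 3 (where map returns a lazy iterator) and reusing reader and the add_* functions from the question, the loop could look like this:

```python
def reader():
    yield 1
    yield 2
    yield 3

def add_1(val):
    return val + 1

def add_5(val):
    return val + 5

def add_10(val):
    return val + 10

operators = [add_1, add_5, add_10]

def main():
    vals = reader()
    for op in operators:
        # op is passed to map() as an argument, so the current function
        # is bound immediately instead of being looked up later.
        vals = map(op, vals)
    return vals

result = list(main())
print(result)  # [17, 18, 19]
```

Nothing is stored in a list until the final list() call, so the pipeline stays lazy.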

Dunes
  • In Python 2, make sure to use [`itertools.imap`](https://docs.python.org/2.7/library/itertools.html#itertools.imap). I learned this the hard way. – gtback Jan 26 '16 at 17:23

This may be what you want: create a composite function.

import functools

def compose(functions):
    return functools.reduce(lambda f, g: lambda x: g(f(x)), functions, lambda x: x)

def reader():
    yield 1
    yield 2
    yield 3

def add_1(val):
    return val + 1

def add_5(val):
    return val + 5

def add_10(val):
    return val + 10

operators = [add_1, add_5, add_10]

def main():
    vals = map(compose(operators), reader())
    return vals

print(list(main()))
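
One detail of this compose that may be worth spelling out is the third argument to functools.reduce, the identity function lambda x: x. The sketch below suggests what it buys you: with it, an empty list of operators yields a function that returns its input unchanged, whereas reduce without an initial value raises TypeError on an empty sequence.

```python
import functools

def compose(functions):
    # The identity function is the starting value, so an empty list is allowed.
    return functools.reduce(lambda f, g: lambda x: g(f(x)), functions, lambda x: x)

identity = compose([])
unchanged = identity(42)  # no operators, so the value passes through as 42

try:
    # Without an initial value, reduce() cannot handle an empty sequence.
    functools.reduce(lambda f, g: lambda x: g(f(x)), [])
    raised = False
except TypeError:
    raised = True

print(unchanged, raised)
```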
texasflood

The problem is that you are creating a deeply nested generator of generators and only evaluating the whole thing after the loop, when op has been bound to the last element in the list. This is similar to the quite common "lambda in a loop" problem.

In a sense, your code is roughly equivalent to this:

for op in operators:
    pass

print(list((op(val) for val in (op(val) for val in (op(val) for val in (x for x in [1, 2, 3])))))
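
The late lookup can be seen in isolation with a smaller sketch. The function demo below is hypothetical and only illustrates the behaviour: the body of a generator expression reads op when the generator is consumed, not when it is created.

```python
def add_1(val):
    return val + 1

def add_10(val):
    return val + 10

def demo():
    op = add_1
    lazy = (op(val) for val in [1, 2, 3])  # nothing is computed yet
    op = add_10        # rebind op before the generator is consumed
    return list(lazy)  # the body looks up op only now, so add_10 is used

late = demo()
print(late)  # [11, 12, 13]
```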

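One (not very pretty) way to fix this is to zip the values with itertools.repeat(op), so that each value is paired with the operator it should be applied to. This works because the outermost iterable of a generator expression is evaluated immediately, when the expression is created, whereas the rest is evaluated lazily: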

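import itertools

def add(n):
    def add_n(val):
        return val + n
    return add_n

operators = [add(n) for n in [1, 5, 10]]

def main():
    vals = (x for x in [1, 2, 3])

    for op in operators:
        # zip(...) is evaluated immediately, so repeat(op) captures the current op;
        # the op unpacked inside the expression shadows the loop variable.
        vals = (op(val) for (val, op) in zip(vals, itertools.repeat(op)))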

    return vals

print(list(main()))
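
A small sketch of why the zip version behaves differently: the first iterable of a generator expression is evaluated when the expression is created, so later rebinding has no effect on it. The names build, fn and f are hypothetical and only serve the illustration.

```python
import itertools

def add(n):
    def add_n(val):
        return val + n
    return add_n

def build():
    fn = add(1)
    # The outermost iterable, zip(..., itertools.repeat(fn)), is evaluated
    # right here, so repeat() holds the function fn refers to at this moment.
    gen = (f(val) for val, f in zip([1, 2, 3], itertools.repeat(fn)))
    fn = add(100)  # rebinding fn afterwards no longer affects gen
    return list(gen)

result = build()
print(result)  # [2, 3, 4]
```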
tobias_k