I'm a bit new to functional programming in Python...
scenario
Here's the general problem. Suppose I have an input that I want to read through only once. Let's say it is really big. Maybe I have a lot of filters, transformations, reductions, or whatever to apply to this stream. At the end, I want to produce a list of modest size and hand it off to something else as the result of my analysis.
If I want to create a single list, I'm in good shape. I encode the above logic as operations on an iterable and chain them into a pipeline of tools like filter(). I then feed that pipeline to a list comprehension, which efficiently builds the resulting list.
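To make the single-list case concrete, here is a minimal sketch of the kind of pipeline I mean. The names (numbers, evens, squares) are invented purely for illustration:

```python
# Toy single-list pipeline; every name here is made up for illustration
def numbers():
    # Stand-in for a large, read-once input
    yield from range(20)

evens = filter(lambda x: x % 2 == 0, numbers())  # lazy filter, nothing consumed yet
squares = map(lambda x: x * x, evens)            # lazy transformation
result = [x for x in squares if x > 10]          # one pass over the input builds the list
```

Nothing is read from the source until the comprehension on the last line runs, and the source is read exactly once.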
But, what if my modest requirement is that I want two lists as output? For instance, I want all the 'falses' (of some question) in one list and all the 'trues' in another list. Or maybe I want three lists...
inadequate solutions
In this case, I take it I have two options:
- iterate my input to generate the first list, save that output and then iterate again to produce the second output list
- create two empty lists, manually iterate my pipeline's output, append to my lists according to my needs at each step
Both of those options stink in comparison with the list comprehension. The first makes multiple passes; the second calls append() repeatedly, which I assume is slow, and it lives in arbitrary Python code instead of a clean, optimizable single statement.
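As a rough sketch of the two options above, on a tiny stand-in input (pipeline() and the "divisible by 3" question are hypothetical placeholders for my real source and predicate):

```python
def pipeline():
    # Stand-in for an expensive, read-once source; made up for illustration
    return (x for x in range(10))

# Option 1: two passes, which only works if the source can be re-created
trues = [x for x in pipeline() if x % 3 == 0]
falses = [x for x in pipeline() if x % 3 != 0]

# Option 2: one pass, appending manually to whichever list applies
trues2, falses2 = [], []
for x in pipeline():
    (trues2 if x % 3 == 0 else falses2).append(x)
```

Both give the same lists; the difference is only in how many times the input is read and how the lists get built.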
existing modules?
I've looked through the itertools and collections modules and peeked a bit at numpy. I see some things that might do the above, but their documentation explains that they are convenience functions and will result in buffering, etc., so they don't meet my requirements.
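For example, itertools.tee looks like it fits that description. A small sketch of what I understand the problem to be (variable names are just for illustration):

```python
import itertools

source = iter(range(10))
a, b = itertools.tee(source)

# Draining `a` first reads the whole source; tee has to hold
# every item in an internal buffer until `b` catches up.
fours = [x for x in a if x == 4]
# `b` is then served from that buffer, not from the source.
others = [x for x in b if x != 4]
```

The source is read only once, but with a really big input the buffer grows to the size of the whole stream, which is what I'm trying to avoid.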
I love Python functional style, iterators and generators. I feel I've got a decent understanding of the benefits of iterators, even as they relate to inputs that are not files. I appreciate the subtle difficulties (e.g. buffering) that might arise from reading from multiple iterators simultaneously, when some may be 'slow inputs' and others 'fast inputs'.
In my case, I just want to consume one iterator. This sort of situation has arisen for me multiple times in the last few years.
Concluding with an example
# python 3
# Toy example. Just for reading, not worth running
import random
import itertools
num_samples = 1000000
least_favorite_number = 98

def source(count):
    for _ in range(count):
        yield random.randint(1, 100)

def my_functional_process(stream):
    """Do silly things to an input iterable of ints, return an iterable of int pairs."""
    # Remove the hated number
    stream = itertools.filterfalse(lambda x: x == least_favorite_number, stream)

    # For each number, take note of which number preceded it in the stream
    def note_ancestor(l):
        prec = None
        for x in l:
            yield x, prec
            prec = x

    stream = note_ancestor(stream)

    # I don't like it even when you and your ancestor add up to our
    # least favorite number or if you have no ancestor
    stream = itertools.filterfalse(
        lambda x: x[1] is None or x[0] + x[1] == least_favorite_number,
        stream
    )

    # Good job
    return stream

def single_pass_the_slow_way():
    """
    Read through the iterator in a single pass, but build result in a way that I think is slow
    """
    the_fours = []
    not_fours = []
    stream = source(num_samples)
    processed = my_functional_process(stream)
    for x in processed:
        if x[0] == 4:
            the_fours.append(x)
        else:
            not_fours.append(x)
    return the_fours, not_fours

def single_pass_and_fast():
    """
    In this function, we make a single pass but create multiple lists using
    imaginary syntax.
    """
    stream = source(num_samples)
    processed = my_functional_process(stream)
    # In my dream, Python figures out to run these comprehensions in parallel
    # In reality, is there even a syntax to represent this?? Obviously, the
    # below does not do it
    not_real_code = [
        # just making up syntax here
        # [x for x in ~x~ if x == 4],
        # [x for x in ~x~ if x != 4]
        x for x in processed
    ]
    # These should be a list of fours, and all others respectively
    return not_real_code[0], not_real_code[1]

i_want_it = 'slow'
if i_want_it == 'slow':
    fours, others = single_pass_the_slow_way()
    print("We're done. ready to use those lists")
else:
    fours, others = single_pass_and_fast()
    print("We're done a bit faster. ready to use those lists")