## Naive solution

A straightforward algorithm would be:

- perform an elementwise copy of the original nested iterable,
- make `n` copies of the elementwise copy,
- obtain the coordinates related to each independent copy,

which may be implemented like
```python
from itertools import tee
from operator import itemgetter
from typing import (Iterable,
                    Tuple,
                    TypeVar)

Domain = TypeVar('Domain')


def copy_nested_iterable(nested_iterable: Iterable[Iterable[Domain]],
                         *,
                         count: int = 2
                         ) -> Tuple[Iterable[Iterable[Domain]], ...]:
    def shallow_copy(iterable: Iterable[Domain]
                     ) -> Tuple[Iterable[Domain], ...]:
        return tee(iterable, count)

    copies = shallow_copy(map(shallow_copy, nested_iterable))
    return tuple(map(itemgetter(index), iterables)
                 for index, iterables in enumerate(copies))
```
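For a quick sanity check, here is a minimal sketch of the same function (type annotations stripped for brevity) showing that each returned copy replays the one-shot inner iterators independently:

```python
from itertools import tee
from operator import itemgetter


def copy_nested_iterable(nested_iterable, *, count=2):
    def shallow_copy(iterable):
        # `tee` caches elements so each branch can be consumed independently
        return tee(iterable, count)

    copies = shallow_copy(map(shallow_copy, nested_iterable))
    return tuple(map(itemgetter(index), iterables)
                 for index, iterables in enumerate(copies))


# both copies replay the same one-shot iterators in full
first, second = copy_nested_iterable([iter([0, 1]), iter([2, 3])])
assert [list(inner) for inner in first] == [[0, 1], [2, 3]]
assert [list(inner) for inner in second] == [[0, 1], [2, 3]]
```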
Pros:

- quite easy to read & explain.

Cons:

- if we wanted to extend our approach to iterables with a greater nesting level (like an iterable of nested iterables and so on), this approach doesn't look helpful.

We can do better.
## Improved solution

If we look at the `itertools.tee` function documentation, it contains a Python recipe which, with the help of the `functools.singledispatch` decorator, can be rewritten like
```python
from collections import (abc,
                         deque)
from functools import singledispatch
from itertools import repeat
from typing import (Iterable,
                    Tuple,
                    TypeVar)

Domain = TypeVar('Domain')


@singledispatch
def copy(object_: Domain,
         *,
         count: int) -> Iterable[Domain]:
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))


# handle the general case
@copy.register(object)
# immutable strings represent a special kind of iterables
# that can be copied by simply repeating
@copy.register(bytes)
@copy.register(str)
# mappings cannot be copied like other iterables
# since they are iterable only by keys
@copy.register(abc.Mapping)
def copy_object(object_: Domain,
                *,
                count: int) -> Iterable[Domain]:
    return repeat(object_, count)


@copy.register(abc.Iterable)
def copy_iterable(object_: Iterable[Domain],
                  *,
                  count: int = 2) -> Tuple[Iterable[Domain], ...]:
    iterator = iter(object_)
    # we are using `itertools.repeat` instead of `range` here
    # due to the efficiency of the former, more info at
    # https://stackoverflow.com/questions/9059173/what-is-the-purpose-in-pythons-itertools-repeat/9098860#9098860
    queues = [deque() for _ in repeat(None, count)]

    def replica(queue: deque) -> Iterable[Domain]:
        while True:
            if not queue:
                try:
                    element = next(iterator)
                except StopIteration:
                    return
                element_copies = copy(element,
                                      count=count)
                for sub_queue, element_copy in zip(queues, element_copies):
                    sub_queue.append(element_copy)
            yield queue.popleft()

    return tuple(replica(queue) for queue in queues)
```
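To see the recipe handle mixed elements, here is a self-contained sketch repeating the definitions above (annotations stripped for brevity) applied to an iterable containing an `int`, a string and a nested one-shot iterator:

```python
from collections import abc, deque
from functools import singledispatch
from itertools import repeat


@singledispatch
def copy(object_, *, count):
    raise TypeError('Unsupported object type: {}.'.format(type(object_)))


@copy.register(object)
@copy.register(bytes)
@copy.register(str)
@copy.register(abc.Mapping)
def copy_object(object_, *, count):
    # non-iterables, strings & mappings are simply repeated
    return repeat(object_, count)


@copy.register(abc.Iterable)
def copy_iterable(object_, *, count=2):
    iterator = iter(object_)
    queues = [deque() for _ in repeat(None, count)]

    def replica(queue):
        while True:
            if not queue:
                try:
                    element = next(iterator)
                except StopIteration:
                    return
                # recurse: nested iterables are themselves replicated
                for sub_queue, element_copy in zip(queues,
                                                   copy(element, count=count)):
                    sub_queue.append(element_copy)
            yield queue.popleft()

    return tuple(replica(queue) for queue in queues)


# mixed nesting: an int, a string and a nested one-shot iterator
first, second = copy_iterable([1, 'abc', iter([2, 3])])
head, text, nested = list(first)
assert (head, text, list(nested)) == (1, 'abc', [2, 3])
head, text, nested = list(second)
assert (head, text, list(nested)) == (1, 'abc', [2, 3])
```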
Pros:

- handles nesting at deeper levels, and even mixed elements like both iterables and non-iterables on the same level,
- may be extended for user-defined structures (e.g. for making independent deep copies of them).

Cons:

- less readable (but, as we know, "practicality beats purity"),
- introduces some overhead related to dispatching (but that's ok, since it is based on a dictionary lookup, which has `O(1)` complexity).
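The extensibility point can be illustrated with a toy example; the `Point` class and handler below are hypothetical, not part of the post's code, and the dispatcher is reduced to its fallback for brevity:

```python
from functools import singledispatch
from itertools import repeat


@singledispatch
def copy(object_, *, count):
    # fallback: repeat the very same object
    return repeat(object_, count)


class Point:
    # hypothetical user-defined structure
    def __init__(self, x, y):
        self.x = x
        self.y = y


@copy.register(Point)
def copy_point(object_, *, count):
    # produce truly independent instances instead of aliases
    return tuple(Point(object_.x, object_.y) for _ in range(count))


original = Point(1, 2)
first, second = copy(original, count=2)
assert first is not original and second is not original
assert (first.x, first.y) == (second.x, second.y) == (1, 2)
```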
## Test

### Preparation

Let's define our nested iterable as follows

```python
nested_iterable = [range(10 ** index) for index in range(1, 7)]
```

Since iterator creation says nothing about the performance of the underlying copies, let's define a function for exhausting iterators (described here)

```python
exhaust_iterable = deque(maxlen=0).extend
```
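A quick check of what this bound method does: since the deque's maximum length is zero, it consumes the iterator while storing nothing.

```python
from collections import deque

exhaust_iterable = deque(maxlen=0).extend

iterator = iter(range(5))
exhaust_iterable(iterator)
# the iterator is fully consumed, but no elements were kept in memory
assert list(iterator) == []
```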
### Time

Using the `timeit` module

```python
import timeit

def naive(): exhaust_iterable(copy_nested_iterable(nested_iterable))
def improved(): exhaust_iterable(copy_iterable(nested_iterable))

print('naive approach:', min(timeit.repeat(naive)))
print('improved approach:', min(timeit.repeat(improved)))
```

I have on my laptop with Windows 10 x64 in Python 3.5.4

```
naive approach: 5.1863865
improved approach: 3.5602296000000013
```
### Memory

Using the `memory_profiler` package

```
Line #    Mem usage    Increment   Line Contents
================================================
    78     17.2 MiB     17.2 MiB   @profile
    79                             def profile_memory(nested_iterable: Iterable[Iterable[Any]]) -> None:
    80     68.6 MiB     51.4 MiB       result = list(flatten(flatten(copy_nested_iterable(nested_iterable))))
```

for the "naive" approach and

```
Line #    Mem usage    Increment   Line Contents
================================================
    78     17.2 MiB     17.2 MiB   @profile
    79                             def profile_memory(nested_iterable: Iterable[Iterable[Any]]) -> None:
    80     68.7 MiB     51.4 MiB       result = list(flatten(flatten(copy_iterable(nested_iterable))))
```

for the "improved" one.
Note: I've made separate runs of the script, because running both at once wouldn't be representative: the second statement would reuse the previously created under-the-hood `int` objects.
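The `flatten` helper used in the profiled function isn't defined in this post; a minimal version (my assumption of what it looks like) could be:

```python
from itertools import chain


def flatten(nested_iterable):
    # concatenates the inner iterables into one flat iterator
    return chain.from_iterable(nested_iterable)


assert list(flatten([[0, 1], [2], []])) == [0, 1, 2]
```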
## Conclusion

As we can see, both functions have similar performance, but the latter supports deeper levels of nesting and looks pretty extensible.
## Advertisement

I've added the "improved" solution to the `lz` package starting from version 0.4.0; it can be used like

```python
>>> from lz.replication import replicate
>>> iterable = iter(range(5))
>>> list(map(list, replicate(iterable,
...                          count=3)))
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
```

It is property-based tested using the `hypothesis` framework, so we may be sure that it works as expected.