10

I would like to use itertools.chain for efficient concatenation of lists (memoization), but I need to be able to read (or map, etc.) the result multiple times. This example illustrates the problem:

import itertools
a = itertools.chain([1, 2], [3, 4])
print(list(a))  # => [1, 2, 3, 4]
print(list(a))  # => []

What is the best way to avoid this problem?

Karl Knechtel
jtbandes

2 Answers

14

As with all generators, you'll need to convert it to a list and store that result instead:

a = list(a)

This is a fundamental principle of generators and other iterators: they are expected to produce their sequence only once.

Moreover, you cannot simply store a generator for memoization purposes, as the underlying lists could change. In almost all memoization use-cases, you should store the list instead; a generator is usually only a means of efficiently transforming or filtering the underlying sequences, and does not itself represent the data you want to memoize. It's as if you were storing a function, not its output. In your specific case, if all you are doing is using chain() to concatenate existing lists, store those lists directly instead.
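To make the point about changing lists concrete, here is a small sketch (the names `first`, `second`, `lazy` and `stored` are just illustrative). It shows that a chain object reads its inputs only when consumed, so it picks up later mutations, while a concatenated list is a stable snapshot that can be read repeatedly:

```python
import itertools

first = [1, 2]
second = [3, 4]

# chain() is lazy: it holds references to the lists and reads them on demand.
lazy = itertools.chain(first, second)

# Concatenating builds a new list, which is safe to store and reread.
stored = first + second

# Mutate an input after both objects were created.
first.append(99)

from_chain = list(lazy)   # the chain sees the mutation
print(from_chain)         # [1, 2, 99, 3, 4]
print(stored)             # [1, 2, 3, 4]
print(stored)             # still [1, 2, 3, 4] on a second read
print(list(lazy))         # [] because the chain is now exhausted
```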

Note that because generators produce their values lazily, they can represent endless sequences, so be careful with what you convert to a list.
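As a rough illustration of that caveat, the sketch below uses a made-up endless generator (`naturals` is hypothetical, not something from the question) and bounds it with `itertools.islice` before converting it to a list:

```python
import itertools

def naturals():
    """Hypothetical endless generator: yields 0, 1, 2, ... forever."""
    n = 0
    while True:
        yield n
        n += 1

# list(naturals()) would never finish; islice caps the number of items taken.
first_five = list(itertools.islice(naturals(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```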

Martijn Pieters
  • Thanks for the info. Sounds like I should just concatenate the lists. Is there a more efficient way to do so than `list1 + list2 + ...`? – jtbandes Oct 31 '12 at 11:20
  • @jtbandes: You could use `timeit` to compare `list(chain())` with `list1 + ..`, but I *suspect* the latter is going to be the most efficient. – Martijn Pieters Oct 31 '12 at 11:21
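For anyone who wants to run the comparison suggested in the comment above, a minimal `timeit` sketch might look like this. The list sizes, the repeat count, and the names `with_plus` and `with_chain` are arbitrary assumptions, and the numbers will vary by machine and Python version, so no expected timings are shown:

```python
import itertools
import timeit

# Hypothetical sample data; sizes chosen only for illustration.
list1 = list(range(1000))
list2 = list(range(1000, 2000))
list3 = list(range(2000, 3000))

def with_plus():
    # Plain concatenation with the + operator.
    return list1 + list2 + list3

def with_chain():
    # Lazy chaining, then materializing into a list.
    return list(itertools.chain(list1, list2, list3))

# Both approaches must build the same list before timing means anything.
assert with_plus() == with_chain()

plus_time = timeit.timeit(with_plus, number=1000)
chain_time = timeit.timeit(with_chain, number=1000)
print("list1 + list2 + list3: %.4f s" % plus_time)
print("list(chain(...)):      %.4f s" % chain_time)
```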
4

Try itertools.tee:

import itertools
a = itertools.chain([1, 2], [3, 4])
a, b = itertools.tee(a)
print(list(b))  # => [1, 2, 3, 4]
a, b = itertools.tee(a)
print(list(b))  # => [1, 2, 3, 4]
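A variation on the same idea, as a sketch: if you know up front how many passes you need, you can split once and give each pass its own iterator. The names `source`, `first_pass` and `second_pass` are illustrative only. Keep in mind that `tee` buffers items internally, so when one copy is fully consumed before the other starts, the whole sequence ends up held in memory anyway:

```python
import itertools

source = itertools.chain([1, 2], [3, 4])

# Split once into two independent iterators; stop using `source` afterwards,
# since advancing it directly would skip items for the tee'd copies.
first_pass, second_pass = itertools.tee(source, 2)

total = sum(first_pass)        # consumes first_pass completely
items = list(second_pass)      # second_pass still yields everything

print(total)  # 10
print(items)  # [1, 2, 3, 4]
```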
georg
  • That doesn't change the fact that memoizing a generator is useless, really. – Martijn Pieters Oct 31 '12 at 11:17
  • 4
    From the docs: In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use `list()` instead of `tee()`. – Paolo Moretti Oct 31 '12 at 11:18
  • 3
    @PaoloMoretti: note that the poster didn't ask what is faster or better, their question is how to reuse a generator, and itertools.tee provides exactly that. – georg Oct 31 '12 at 11:21
  • 2
    @thg435 My comment is more of a note for others who might come across the question – Paolo Moretti Oct 31 '12 at 11:24