What is the difference between chain and chain.from_iterable in itertools?

Question

I could not find any valid example on the internet where I can see the difference between them and why to choose one over the other.

Martijn Pieters · Accepted Answer · 2019-06-24T01:12:17.747

98

The first takes 0 or more arguments, each an iterable; the second takes a single argument that is expected to produce the iterables:

from itertools import chain

chain(list1, list2, list3)

iterables = [list1, list2, list3]
chain.from_iterable(iterables)

but iterables can be any iterator that yields the iterables:

def gen_iterables():
    for i in range(10):
        yield range(i)

chain.from_iterable(gen_iterables())
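To see that both forms yield the same items, here is a minimal runnable sketch. The contents of `list1`, `list2`, and `list3` are placeholder values assumed for illustration:

```python
from itertools import chain

# Placeholder inputs (assumed for illustration); any iterables would do.
list1 = [1, 2]
list2 = [3, 4]
list3 = [5]

# Form 1: each iterable is passed as a separate argument.
from_args = list(chain(list1, list2, list3))

# Form 2: a single argument that yields the iterables.
iterables = [list1, list2, list3]
from_one = list(chain.from_iterable(iterables))

print(from_args)  # [1, 2, 3, 4, 5]
print(from_one)   # [1, 2, 3, 4, 5]
```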

Using the second form is usually a case of convenience, but because it loops over the input iterables lazily, it is also the only way you can chain an infinite number of finite iterators:

def gen_iterables():
    while True:
        for i in range(5, 10):
            yield range(i)

chain.from_iterable(gen_iterables())

The above example will give you an iterable that yields a cyclic pattern of numbers that never stops, yet never consumes more memory than a single range() call requires.
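Since the chained iterator never ends, one way to inspect it is to take a bounded slice. This is a sketch that uses `itertools.islice`; the slice length of 12 is an arbitrary choice:

```python
from itertools import chain, islice

def gen_iterables():
    # Endlessly yields range(5), range(6), ..., range(9), then starts over.
    while True:
        for i in range(5, 10):
            yield range(i)

flat = chain.from_iterable(gen_iterables())

# Take only the first 12 items; the rest is never computed.
first_items = list(islice(flat, 12))
print(first_items)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, 0]
```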

edited Jun 24 '19 at 01:12

answered Feb 21 '13 at 14:30

Martijn Pieters


4

I still can't get it. Can you give me the output difference and a practical use case showing when to use which? – user1994660 Feb 21 '13 at 23:18
16

@user1994660: there is no output difference. It's an *input* difference. It makes it easier to use certain inputs. – Martijn Pieters Feb 21 '13 at 23:51
@user1994660: I use the second form in [this answer](http://stackoverflow.com/questions/12900444/trying-to-add-to-dictionary-values-by-counting-occurrences-in-a-list-of-lists-p/12900577#12900577). – Martijn Pieters Feb 21 '13 at 23:52
@user1994660: good usecase for the second form: [Python idiom to chain (flatten) an infinite iterable of finite iterables?](http://stackoverflow.com/a/120886) – Martijn Pieters Feb 22 '13 at 00:58
1

@user1994660: Run this code: `# Return an iterator of iterators` `def it_it(): return iter( [iter( [11, 22] ), iter( [33, 44] )] )` `print( list(itertools.chain.from_iterable(it_it())) )` `print( list(itertools.chain(it_it())) )` `print( list(itertools.chain(*it_it())) )` The first one is best; the second one doesn't get at the nested iterators, it returns iterators, instead of the desired numbers; the third one produces the correct output BUT it isn't fully lazy: the "*" forced all the iterators to be created. For this dumb input that doesn't matter. – ToolmakerSteve Dec 20 '13 at 19:21
2

Note that if `iterables` is not too big, you can also do `itertools.chain(*iterables)` – balki Feb 14 '14 at 21:14
@MartijnPieters would you mind if I quoted you on *"the only way you can chain an infinite number of finite iterators"*? – Ryan Haining Jun 09 '14 at 00:55
I recently added chain.from_iterable to [a c++ project](https://github.com/ryanhaining/cppitertools) – Ryan Haining Jun 09 '14 at 01:08
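ToolmakerSteve's snippet from the comment above, laid out as a runnable sketch. The variable names are added here for illustration only:

```python
import itertools

def it_it():
    # Return an iterator of iterators.
    return iter([iter([11, 22]), iter([33, 44])])

# Flattens lazily: the inner iterators are pulled one at a time.
flattened = list(itertools.chain.from_iterable(it_it()))

# Chains over a single argument, so the items are the inner iterators themselves.
not_flattened = list(itertools.chain(it_it()))

# Flattens too, but `*` consumes the outer iterator before chain() is called.
unpacked = list(itertools.chain(*it_it()))

print(flattened)           # [11, 22, 33, 44]
print(len(not_flattened))  # 2
print(unpacked)            # [11, 22, 33, 44]
```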

pylang · Answer 2 · 2018-07-17T01:04:17.467

I could not find any valid example ... where I can see the difference between them [chain and chain.from_iterable] and why to choose one over the other

The accepted answer is thorough. For those seeking a quick application, consider flattening several lists:

import itertools

list(itertools.chain(["a", "b", "c"], ["d", "e"], ["f"]))
# ['a', 'b', 'c', 'd', 'e', 'f']

You may wish to reuse these lists later, so you make an iterable of lists:

iterable = (["a", "b", "c"], ["d", "e"], ["f"])

Attempt

However, passing in an iterable to chain gives an unflattened result:

list(itertools.chain(iterable))
# [['a', 'b', 'c'], ['d', 'e'], ['f']]

Why? You passed in one item (a tuple). chain needs each list separately.

Solutions

When possible, you can unpack an iterable:

list(itertools.chain(*iterable))
# ['a', 'b', 'c', 'd', 'e', 'f']

list(itertools.chain(*iter(iterable)))
# ['a', 'b', 'c', 'd', 'e', 'f']

More generally, use .from_iterable (as it also works with infinite iterators):

list(itertools.chain.from_iterable(iterable))
# ['a', 'b', 'c', 'd', 'e', 'f']

g = itertools.chain.from_iterable(itertools.cycle(iterable))
next(g)
# "a"
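As a practical sketch of the `.from_iterable` form, the following flattens the words of several lines. Here `lines` is a hypothetical input standing in for, say, a file object:

```python
import itertools

# Hypothetical input; a file object iterated line by line would work the same way.
lines = ["the quick brown", "fox jumps", "over the lazy dog"]

# The generator expression yields one list of words per line, on demand.
words = itertools.chain.from_iterable(line.split() for line in lines)

result = list(words)
print(result)
# ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

The generator expression is the single argument here, so there is no need to build a list of lists first and unpack it with `*`.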

score 9 · Answer 3 · answered Jul 10 '17 at 22:25

They do very similar things. For a small number of iterables, itertools.chain(*iterables) and itertools.chain.from_iterable(iterables) perform similarly.

The key advantage of from_iterable lies in its ability to handle a large (potentially infinite) number of iterables, since they need not all be available at the time of the call.
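The difference in when the iterables are requested can be observed directly. This is a small sketch; the `produced` log and the `make_iterables` generator are names invented for illustration:

```python
from itertools import chain

produced = []  # records which inner iterables have been generated so far

def make_iterables():
    for i in range(3):
        produced.append(i)
        yield [i, i]

# from_iterable: nothing is generated until items are requested.
lazy = chain.from_iterable(make_iterables())
before_lazy = list(produced)
first = next(lazy)
after_first = list(produced)

# Star-unpacking: the generator is exhausted before chain() is even called.
produced.clear()
eager = chain(*make_iterables())
before_eager = list(produced)

print(before_lazy)   # []
print(after_first)   # [0]
print(before_eager)  # [0, 1, 2]
```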

answered Jul 10 '17 at 22:25

BiGYaN


1

Does anyone know if the `*` operator unpacks `iterables` lazily? – Rotareti Mar 25 '18 at 15:06
1

@Rotareti, yes it does unpack lazily (one at a time) but in this case `itertools.chain(*iterables)` is a function call. All arguments must be present at the time of call. – BiGYaN Mar 27 '18 at 07:30
Is this true? From the CPython code, it seems to be the same https://stackoverflow.com/a/62513808/610569 – alvas Jun 22 '20 at 12:49
@alvas Try changing the number of elements to something very large, in the range of 10_000 to 1_000_000, and you will see `from_iterable` becoming faster. – BiGYaN Jun 29 '20 at 22:38

alvas · Answer 4 · 2020-06-22T14:47:58.537

Extending @martijn-pieters' answer

Access to the inner items of the iterable is the same either way, and implementation-wise, in CPython,

itertools_chain_from_iterable (i.e. chain.from_iterable in Python) and
chain_new (i.e. chain in Python)

both delegate to chain_new_internal.

So are there any optimization benefits to using chain.from_iterable(x), where x is an iterable of iterables and the main purpose is ultimately to consume the flattened items?

We can try benchmarking it with:

import random
from functools import wraps
from itertools import chain
from time import time

def timing(f):
    @wraps(f)
    def wrap(*args, **kw):
        ts = time()
        result = f(*args, **kw)
        te = time()
        print('func:%r args:[%r, %r] took: %2.4f sec' % (f.__name__, args, kw, te-ts))
        return result
    return wrap

def generate_nm(m, n):
    # Yields a single iterator over m lists; each list is a random permutation of range(n).
    yield iter(random.sample(range(n), n) for _ in range(m))


def chain_star(x):
    # Stores an iterable that will unpack and flatten the list of list.
    chain_x = chain(*x)
    # Consumes the items in the flatten iterable.
    for i in chain_x:
        pass

def chain_from_iterable(x):
    # Stores an iterable that will unpack and flatten the list of list.
    chain_x = chain.from_iterable(x)
    # Consumes the items in the flatten iterable.
    for i in chain_x:
        pass


@timing
def versus(f, m, n):
    # Times f on freshly generated data; argument order matches generate_nm(m, n).
    f(generate_nm(m, n))


Results

chain_star, m=1000, n=1000

for _ in range(10):
    versus(chain_star, 1000, 1000)

[out]:

func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6494 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6603 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6367 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6350 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6296 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6399 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6341 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6381 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6343 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 1000, 1000), {}] took: 0.6309 sec

chain_from_iterable, m=1000, n=1000

for _ in range(10):
    versus(chain_from_iterable, 1000, 1000)

[out]:

func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6416 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6315 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6535 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6334 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6327 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6471 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6426 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6287 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6353 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 1000, 1000), {}] took: 0.6297 sec

chain_star, m=10000, n=1000

func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2659 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2966 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2953 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.3141 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2802 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2799 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2848 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.3299 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.2730 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 10000, 1000), {}] took: 6.3052 sec

chain_from_iterable, m=10000, n=1000

func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.3129 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.3064 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.3071 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2660 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2837 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2877 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2756 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2939 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2715 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 10000, 1000), {}] took: 6.2877 sec

chain_star, m=100000, n=1000

func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.7874 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.3744 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.5584 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.3745 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.7982 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.4054 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.6769 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.6476 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 63.7397 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 100000, 1000), {}] took: 62.8980 sec

chain_from_iterable, m=100000, n=1000

for _ in range(10):
    versus(chain_from_iterable, 100000, 1000)

[out]:

func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7227 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7717 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7159 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7569 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7906 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.6211 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.7294 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.8260 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.8356 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 100000, 1000), {}] took: 62.9738 sec

chain_star, m=500000, n=1000

for _ in range(3):
    versus(chain_star, 500000, 1000)

[out]:

func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 500000, 1000), {}] took: 314.5671 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 500000, 1000), {}] took: 313.9270 sec
func:'versus' args:[(<function chain_star at 0x7f5c7188ef28>, 500000, 1000), {}] took: 313.8992 sec

chain_from_iterable, m=500000, n=1000

for _ in range(3):
    versus(chain_from_iterable, 500000, 1000)

[out]:

func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 500000, 1000), {}] took: 313.8301 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 500000, 1000), {}] took: 313.8104 sec
func:'versus' args:[(<function chain_from_iterable at 0x7f5c7188eb70>, 500000, 1000), {}] took: 313.9440 sec
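Because the timings above include generating the random data, a variant that pre-builds the lists and times only the chaining may be easier to interpret. The following is a sketch using `timeit`; the helper names (`make_lists`, `consume`) and the sizes are assumptions chosen to run quickly, and no particular timing is implied:

```python
import random
import timeit
from itertools import chain

def make_lists(m, n):
    # Build m lists up front, each a random permutation of range(n).
    return [random.sample(range(n), n) for _ in range(m)]

def consume(iterator):
    # Exhaust the iterator and report how many items it produced.
    count = 0
    for _ in iterator:
        count += 1
    return count

data = make_lists(200, 100)

# Sanity check: both forms flatten to the same number of items.
count_star = consume(chain(*data))
count_from_iterable = consume(chain.from_iterable(data))

# Time only the chaining and consumption, not the data generation.
t_star = timeit.timeit(lambda: consume(chain(*data)), number=5)
t_from_iterable = timeit.timeit(lambda: consume(chain.from_iterable(data)), number=5)

print(f"chain(*data):              {t_star:.4f} sec")
print(f"chain.from_iterable(data): {t_from_iterable:.4f} sec")
```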

score 5 · Answer 5 · answered Aug 20 '18 at 10:34

Another way to see it:

chain(iterable1, iterable2, iterable3, ...) is for when you already know what iterables you have, so you can write them as these comma-separated arguments.

chain.from_iterable(iterable) is for when your iterables (like iterable1, iterable2, iterable3) are obtained from another iterable.
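A short sketch of the two situations; `letters`, `digits`, and `groups` are made-up example data:

```python
from itertools import chain

# Case 1: the iterables are known up front, so they are written as arguments.
letters = "abc"
digits = range(3)
known = list(chain(letters, digits))

# Case 2: the iterables come out of another iterable (here, the values of a dict).
groups = {"even": [0, 2, 4], "odd": [1, 3, 5]}
derived = list(chain.from_iterable(groups.values()))

print(known)    # ['a', 'b', 'c', 0, 1, 2]
print(derived)  # [0, 2, 4, 1, 3, 5]
```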

score 0 · Answer 6 · edited Jan 13 '23 at 12:05

Another way to look at it: use chain.from_iterable when you have an iterable of iterables, such as a nested (or compound) iterable, and use chain for simple iterables.

edited Jan 13 '23 at 12:05

NelsonGon


answered Feb 20 '20 at 14:42

Chyanit Singh

