
I want to change the following code

for directory, dirs, files in os.walk(directory_1):
    do_something()

for directory, dirs, files in os.walk(directory_2):
    do_something()

to this code:

for directory, dirs, files in os.walk(directory_1) + os.walk(directory_2):
    do_something()

I get the error:

unsupported operand type(s) for +: 'generator' and 'generator'

How to join two generators in Python?

Karl Knechtel
Homer Xing

15 Answers

349

itertools.chain() should do it. It takes multiple iterables and yields from each one by one, roughly equivalent to:

def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element

Usage example:

from itertools import chain

g = (c for c in 'ABC')  # Dummy generator, just for example
c = chain(g, 'DEF')  # Chain the generator and a string
for item in c:
    print(item)

Output:

A
B
C
D
E
F
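Applied to the question's code, the loop becomes `for directory, dirs, files in chain(os.walk(directory_1), os.walk(directory_2)):`. A self-contained sketch, using throwaway temp directories in place of the OP's `directory_1` and `directory_2`:

```python
import os
import tempfile
from itertools import chain

# Throwaway directories standing in for the OP's directory_1 / directory_2
directory_1 = tempfile.mkdtemp()
directory_2 = tempfile.mkdtemp()
open(os.path.join(directory_1, 'a.txt'), 'w').close()
open(os.path.join(directory_2, 'b.txt'), 'w').close()

seen = []
for directory, dirs, files in chain(os.walk(directory_1), os.walk(directory_2)):
    seen.extend(files)  # the OP's do_something() would go here

print(sorted(seen))  # ['a.txt', 'b.txt']
```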
wjandrea
Philipp

  • One should keep in mind that the return value of `itertools.chain()` is not a `types.GeneratorType` instance. Just in case the exact type is crucial. – Riga Sep 16 '19 at 13:45
  • See @andrew-pate's answer for an [itertools.chain.from_iterable()](https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable) reference to return a types.GeneratorType instance. – gkedge Sep 09 '20 at 13:13
  • itertools.chain() would give all the elements in one directory and then shift to the other directory. Now, how do we pick the first elements of both directories, perform some operations, then shift to the next pair, and so on? Any idea would be appreciated. – yash Feb 09 '21 at 11:07
  • @yash Iterate over those directories manually using the built-in function [next](https://docs.python.org/3/library/functions.html#next). – Jeyekomon Feb 16 '21 at 08:57
  • @yash you might like [zip](https://docs.python.org/3/library/functions.html#zip). It does precisely that: pick out the first, second etc. values and put them in tuples. – Randelung May 18 '21 at 21:00
114

An example:

from itertools import chain

def generator1():
    for item in 'abcdef':
        yield item

def generator2():
    for item in '123456':
        yield item

generator3 = chain(generator1(), generator2())
for item in generator3:
    print(item)
Cesio
88

In Python 3.3 or greater you can use `yield from`:

def concat(a, b):
    yield from a
    yield from b
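For example (repeating the definition so the snippet runs on its own; the argument values are arbitrary):

```python
def concat(a, b):
    yield from a    # exhaust a first
    yield from b    # then b

print(list(concat('AB', (1, 2))))  # ['A', 'B', 1, 2]
```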
Benoît P
Uduse

  • So much pythonic. – ramazan polat Oct 28 '18 at 18:18
  • More general: `def chain(*iterables): for iterable in iterables: yield from iterable` (Put the `def` and `for` on separate lines when you run it.) – wjandrea Apr 12 '19 at 15:29
  • Is everything from *a* yielded before anything from *b* is yielded, or are they alternated? – problemofficer - n.f. Monica Dec 10 '19 at 16:22
  • @problemofficer Yup. Only `a` is checked until everything is yielded from it, even if `b` isn't an iterator. The `TypeError` for `b` not being an iterator will come up later. – GeeTransit Jan 04 '20 at 16:10
  • @wjandrea just a curiosity about efficiency: if you have an iterable of iterables and use `*` to unpack it into this function, it is slower than passing it as one argument and iterating over it, e.g. `def f(): return tuple(x for x in chain(*(str(s).split() for s in range(10000))))` versus changing your func to `def chain(iterables): ...` and passing one arg instead of `*args`; the difference using `%timeit f()` is 5.89 ms ± 122 µs vs 4.5 ms ± 67.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each). – Karolius Jul 15 '22 at 11:33
  • @Karolius Sorry, I'm not sure what you're talking about. I never said anything about unpacking, and you can't substitute `*iterables` with `iterable` in the function I wrote. – wjandrea Jul 15 '22 at 15:05
  • @wjandrea instead of passing it like `chain(a, b)`, you can pass it as `chain((a, b))`, so when using it with an iterable of things to chain you don't have to unpack it like `chain(*(a, b))`, am I wrong? It's just a note, not so important, and depends on where you want to use it. – Karolius Jul 16 '22 at 23:38
  • @Karolius Oh OK, I see what you're saying. It looks like you made a typo, which confused me: `def chain(iterable)` should be `def chain(iterables)`. (Also, `x for x in` is redundant.) Anyway, there's already a tool in the stdlib that does that: [`itertools.chain.from_iterable`](https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable). And beyond performance, if you had an *infinite* iterable of iterables, it *wouldn't be possible* to use unpacking. – wjandrea Jul 17 '22 at 01:22
  • This should be the top answer! – N. Jonas Figge Jun 28 '23 at 12:28
41

Simple example:

from itertools import chain
x = iter([1, 2, 3])    # a list_iterator (any iterable works, not just generators)
y = iter([3, 4, 5])    # another one
result = chain(x, y)   # chained: yields 1, 2, 3, 3, 4, 5
user1767754
14

With itertools.chain.from_iterable you can do things like:

import itertools

def genny(start):
    for x in range(start, start + 3):
        yield x

y = [1, 2]
ab = [o for o in itertools.chain.from_iterable(genny(x) for x in y)]
print(ab)  # [1, 2, 3, 2, 3, 4]
djvg
andrew pate
  • You're using an unnecessary list comprehension. You're also using an unnecessary generator expression on `genny` when it already returns a generator. `list(itertools.chain.from_iterable(genny(x)))` is much more concise. – Corman May 25 '20 at 20:31
  • The list comprehension was an easy way to create the two generators, as per the question. Maybe my answer is a little convoluted in that respect. – andrew pate May 26 '20 at 22:29
  • I guess the reason I added this answer to the existing ones was to help those who happen to have lots of generators to deal with. – andrew pate May 26 '20 at 22:41
  • It isn't an easy way; there are many easier ways. Using generator expressions on an existing generator will lower performance, and the `list` constructor is much more readable than the list comprehension. Your method is much less readable in those regards. – Corman May 27 '20 at 00:48
  • Corman, I agree your list constructor is indeed more readable. It would be good to see your 'many easier ways' though... I think wjandrea's comment above does the same as itertools.chain.from_iterable; it would be good to race them and see whose is fastest. – andrew pate May 27 '20 at 09:25
  • The two easier ways, as mentioned before, are using `list` and `genny(x)` over a list comprehension and a generator. The speed race would almost certainly favor the list comprehension because you're doing fewer computations. – Corman May 27 '20 at 14:13
13

Here it is using a generator expression with nested fors:

range_a = range(3)
range_b = range(5)
result = (item
    for one_range in (range_a, range_b)
    for item in one_range)
assert list(result) == [0, 1, 2, 0, 1, 2, 3, 4]

The for ... in ... clauses are evaluated left to right, and the identifier after each for introduces a new variable: one_range is used in the second for ... in ... clause, while item from that second clause feeds the output expression at the very beginning.

Related question: How do I make a flat list out of a list of lists?

Alexey
8

2020 update: Works in both Python 3 and Python 2

import itertools

iterA = range(10,15)
iterB = range(15,20)
iterC = range(20,25)

First option:

for i in itertools.chain(iterA, iterB, iterC):
    print(i)

# 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

Alternative option, introduced in Python 2.6:

for i in itertools.chain.from_iterable( [iterA, iterB, iterC] ):
    print(i)

# 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24

itertools.chain() is the basic tool.

itertools.chain.from_iterable() is handy if you have an iterable of iterables, for example a list of files per subdirectory like [ ["src/server.py", "src/readme.txt"], ["test/test.py"] ].
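For instance, flattening that nested list (the file names here are just illustrative):

```python
import itertools

# A list of file lists, one per subdirectory
files_per_dir = [["src/server.py", "src/readme.txt"], ["test/test.py"]]

flat = list(itertools.chain.from_iterable(files_per_dir))
print(flat)  # ['src/server.py', 'src/readme.txt', 'test/test.py']
```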

wjandrea
user5994461
3

One can also use the unpacking operator *:

concat = (*gen1(), *gen2())

NOTE: This evaluates both generators eagerly into a tuple, so it only suits 'non-lazy' use. It can also be used with different kinds of comprehensions. The preferred way for a lazy generator concat is the `yield from` approach in the answer from @Uduse.
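A small sketch of the eager behavior (gen1 and gen2 are made up for illustration):

```python
def gen1():
    yield from (1, 2)

def gen2():
    yield from (3, 4)

concat = (*gen1(), *gen2())  # both generators are fully consumed here
print(concat)  # (1, 2, 3, 4)
```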

sol25
  • It's sad that there is no lazy evaluation of *generator, because it would have made this a marvelous solution... – Camion Oct 13 '20 at 01:43
  • –1 this will immediately consume both generators into a tuple! – wim Oct 13 '20 at 02:01
2

If you want to keep the generators separate but still iterate over them at the same time you can use zip():

NOTE: Iteration stops at the shorter of the two generators

For example:

for (root1, dir1, files1), (root2, dir2, files2) in zip(os.walk(path1), os.walk(path2)):

    for file in files1:
        #do something with first list of files

    for file in files2:
        #do something with second list of files
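If the two walks can have different lengths and you don't want to silently drop the tail of the longer one, `itertools.zip_longest` pads the shorter one with `None` (a sketch with plain lists standing in for the `os.walk()` results):

```python
from itertools import zip_longest

a = [1, 2, 3]  # stand-in for os.walk(path1)
b = ['x']      # stand-in for os.walk(path2)

print(list(zip(a, b)))          # [(1, 'x')] -- stops at the shorter input
print(list(zip_longest(a, b)))  # [(1, 'x'), (2, None), (3, None)]
```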
DivideByZero
2

(Disclaimer: Python 3 only!)

Something with syntax similar to what you want is to use the splat operator to expand the two generators:

for directory, dirs, files in (*os.walk(directory_1), *os.walk(directory_2)):
    do_something()

Explanation:

This effectively performs a single-level flattening of the two generators into an N-tuple of 3-tuples (from os.walk) that looks like:

((directory1, dirs1, files1), (directory2, dirs2, files2), ...)

Your for-loop then iterates over this N-tuple.

Of course, by simply replacing the outer parentheses with brackets, you can get a list of 3-tuples instead of an N-tuple of 3-tuples:

for directory, dirs, files in [*os.walk(directory_1), *os.walk(directory_2)]:
    do_something()

This yields something like:

[(directory1, dirs1, files1), (directory2, dirs2, files2), ...]

Pro:

The upside to this approach is that you don't have to import anything and it's not a lot of code.

Con:

The downside is that you dump two generators into a collection and then iterate over that collection, effectively doing two passes and potentially using a lot of memory.

Milosz
  • This is not flattening at all. Rather, it is a [zip](https://docs.python.org/3.8/library/functions.html#zip). – jpaugh Apr 11 '21 at 11:37
  • A bit puzzled by your comment @jpaugh. This concatenates two iterables. It doesn't create pairs from them. Maybe the confusion is from the fact that os.walk already yields 3-tuples? – Milosz Apr 12 '21 at 04:36
1

I would say that, as suggested in the comments by user wjandrea, the best solution is

def concat_generators(*gens):
    for gen in gens:
        yield from gen

It does not change the returned type and is really Pythonic.

wjandrea
Luca Di Liello
  • Which is what [itertools.chain.from_iterable()](https://docs.python.org/3/library/itertools.html#itertools.chain.from_iterable) will do for you. See @andrew-pate's [answer](https://stackoverflow.com/a/34809895/2973538). – gkedge Sep 09 '20 at 13:16
  • Don't reinvent the wheel, use `itertools.chain`. My comment wasn't meant to suggest "the best solution", it was just to improve a mediocre solution. Anyway, you also changed the names and made them confusing: `concat_generators` can work on any [iterable](https://docs.python.org/3/glossary.html#term-iterable), not just [generator](https://docs.python.org/3/glossary.html#term-generator)s, so it should be renamed along with `gen`; and `args` is vague, so I'd use `iterables` instead (or `gens`, following your incorrect naming scheme). – wjandrea Jul 17 '22 at 01:42
  • Oops, actually, I take most of that back. **If you're using generator-specific features**, like `.send()`, `.throw()`, and `.close()`, then this is the better solution because it actually lets you use them, which `itertools.chain` doesn't. But in OP's case, they're not using any of those features, so it's simpler to use `chain`. (Also, I should have linked [generator iterator](https://docs.python.org/3/glossary.html#term-generator-iterator) instead of *generator*. The glossary is arguably wrong for this term.) – wjandrea Jul 17 '22 at 02:23
0

Let's say that we have two generators (gen1 and gen2) and we want to perform some extra calculation that requires the outcome of both. We can return the outcome of such a function/calculation through map(), which in turn returns a generator that we can loop over.

In this scenario, the function/calculation can be implemented via a lambda function. The tricky part is what we aim to do inside map and its lambda function.

General form of the proposed solution:

def function(gen1, gen2):
    for item in map(lambda x, y: do_something(x, y), gen1, gen2):
        yield item
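A runnable sketch of this pattern; the two generators and the pairwise operation (here, addition) are made up for illustration:

```python
def gen1():
    yield from (1, 2, 3)

def gen2():
    yield from (10, 20, 30)

def pairwise(g1, g2):
    # map() pulls one item from each generator per step
    for item in map(lambda x, y: x + y, g1, g2):
        yield item

print(list(pairwise(gen1(), gen2())))  # [11, 22, 33]
```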
Mahdi Ghelichi
0

If you would like to get lists of file paths from known 'before' and 'after' directories, you can do this:

after_flist = []
before_flist = []

for r, d, f in os.walk(current_dir):
    for dir in d:
        if dir == 'after':
            after_dir = os.path.abspath(os.path.join(current_dir, dir))
            for r2, d2, f2 in os.walk(after_dir):
                after_flist.append([os.path.join(r2, file) for file in f2 if file.endswith('json')])
        elif dir == 'before':
            before_dir = os.path.abspath(os.path.join(current_dir, dir))
            for r2, d2, f2 in os.walk(before_dir):
                before_flist.append([os.path.join(r2, file) for file in f2 if file.endswith('json')])

I know there are better answers; this is just simple code, I felt.

-1

You can put any generator into a list. And while you can't add generators together with +, you can add lists. The con is that this actually creates three lists in memory, but the pros are that it's very readable, requires no imports, and is a one-line idiom.

Solution for the OP:

for directory, dirs, files in list(os.walk(directory_1)) + list(os.walk(directory_2)):
    do_something()

A generic example:

a = range(20)
b = range(10, 99, 3)
for v in list(a) + list(b):
    print(v)
Tatarize
-2

If you just need to do it once and do not wish to import one more module, there is a simple solution...

just do:

for dir in directory_1, directory_2:
    for directory, dirs, files in os.walk(dir):
        do_something()

If you really want to "join" both generators, then do:

for directory, dirs, files in (
        x for osw in [os.walk(directory_1), os.walk(directory_2)]
        for x in osw):
    do_something()
Camion
  • The second snippet of code gives an indentation error. It can be fixed with surrounding the list comprehension with parentheses: the opening parenthesis should be on the same line as `in` and the closing after the list comp ends. Regardless of this error, I think this is a bad example to follow. It reduces readability by mixing up indentation. The `itertools.chain` answers are massively more readable and easier to use. – shynjax287 Oct 08 '20 at 18:41
  • You don't need to add parentheses. I just moved the opening bracket to the previous line to solve this. By the way, you may not like my example, but I still think it's a good idea to know how to do things by yourself, because it makes you able to write the library yourself instead of resorting to someone else's work when you need it. – Camion Oct 13 '20 at 01:39
  • sure, it is a good idea to learn how to do things by yourself. I never debated that. Sorry if I was unclear. The use of a list comprehension here reduces readability and is not really needed. List comprehensions are cool, long list comprehensions become hard to read & fix. The code could be improved by creating the list before and then iterating over it. Sorry about my parenthesis comment if it was incorrect. – shynjax287 Oct 13 '20 at 01:52