Concatenating two lists of Strings element wise in Python without Nested for loops

Question

I have two lists of strings: ls1 = ['a','b','c','d'] and ls2 = ['k','j','l','m']

I want to create a 3rd list: ls3 = ['a-k','a-j','a-l','a-m','b-k','b-j','b-l','b-m'...'d-m'] which has 16 elements.

I can achieve this quite easily with the following nested for loops

ls3 = [] 
for elem in ls1:
    for item in ls2:
        ls3.append(elem+'-'+item)

However, this isn't very Pythonic and reveals my C-code background.

I attempted a more Pythonic solution with map and lambda:

[ map(lambda x,y: x+'-'+y, a,b) for a,b in zip(ls1,ls2) ]

But I don't really know what I'm doing yet.

What is a Pythonic way to achieve what I've done with my nested for loops?

Your nested for-loops *are* Pythonic. — juanpa.arrivillaga, Apr 13 '17 at 07:48

Paul Panzer · Answer 1 · 2017-04-13T19:09:37.557

You can use itertools.product together with map:

import itertools

list(map('-'.join, itertools.product('abcd', 'kjlm')))
# ['a-k', 'a-j', 'a-l', 'a-m', 'b-k', 'b-j', 'b-l', 'b-m', 'c-k', 'c-j', 'c-l', 'c-m', 'd-k', 'd-j', 'd-l', 'd-m']
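A minimal sketch of what happens in that one-liner, using the ls1 and ls2 lists from the question. The name ls_extra is a hypothetical third list, introduced here only to illustrate that the same pattern extends to more inputs.

```python
import itertools

ls1 = ['a', 'b', 'c', 'd']
ls2 = ['k', 'j', 'l', 'm']

# product yields one tuple per combination; the last input varies fastest,
# which matches the order produced by the nested for loops in the question.
pairs = list(itertools.product(ls1, ls2))
print(pairs[:5])

# '-'.join is applied to each tuple, turning ('a', 'k') into 'a-k'.
ls3 = list(map('-'.join, pairs))
print(ls3)

# Hypothetical third list (not in the question): the call shape is unchanged.
ls_extra = ['x', 'y']
triples = list(map('-'.join, itertools.product(ls1, ls2, ls_extra)))
print(len(triples))
```

Because '-'.join accepts a tuple of any length, nothing other than the arguments to product has to change when another list is added.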

Test for correctness and timings:

The usual disclaimers for benchmarks apply.

Under the test conditions, the solution above ("product map") is faster than the nested list comprehension ("naive"), although the margin is small for small inputs.

Much of the speed-up appears to come from avoiding a list comprehension. Indeed, if map is replaced by a list comprehension ("product compr"), product still scales better than the naive approach but falls behind it at small problem sizes:

small (4x4)
results equal: True True
naive             0.002420 ms
product compr     0.003211 ms
product map       0.002146 ms
large (4x4x4x4x4x4)
results equal: True True
naive             0.836124 ms
product compr     0.681193 ms
product map       0.385240 ms

Benchmark script for reference

import itertools
import timeit

lists = [[chr(97 + 4*i + j) for j in range(4)] for i in range(6)]

print('small (4x4)')
print('results equal:', [x+'-'+y for x in lists[0] for y in lists[1]]
      ==
      list(map('-'.join, itertools.product(lists[0], lists[1]))), end=' ')
print(['-'.join(t) for t in  itertools.product(lists[0], lists[1])]
      ==
      list(map('-'.join, itertools.product(lists[0], lists[1]))))

print('{:16s} {:9.6f} ms'.format('naive', timeit.timeit(lambda: [x+'-'+y for x in lists[0] for y in lists[1]], number=1000)))
print('{:16s} {:9.6f} ms'.format('product compr', timeit.timeit(lambda: ['-'.join(t) for t in itertools.product(lists[0], lists[1])], number=1000)))
print('{:16s} {:9.6f} ms'.format('product map', timeit.timeit(lambda: list(map('-'.join, itertools.product(lists[0], lists[1]))), number=1000)))

print('large (4x4x4x4x4x4)')
print('results equal:', ['-'.join((u, v, w, x, y, z)) for u in lists[0] for v in lists[1] for w in lists[2] for x in lists[3] for y in lists[4] for z in lists[5]]
      ==
      list(map('-'.join, itertools.product(*lists))), end=' ')
print(['-'.join(t) for t in  itertools.product(*lists)]
      ==
      list(map('-'.join, itertools.product(*lists))))

print('{:16s} {:9.6f} ms'.format('naive', timeit.timeit(lambda: ['-'.join((u, v, w, x, y, z)) for u in lists[0] for v in lists[1] for w in lists[2] for x in lists[3] for y in lists[4] for z in lists[5]], number=1000)))
print('{:16s} {:9.6f} ms'.format('product compr', timeit.timeit(lambda: ['-'.join(t) for t in  itertools.product(*lists)], number=1000)))
print('{:16s} {:9.6f} ms'.format('product map', timeit.timeit(lambda: list(map('-'.join, itertools.product(*lists))), number=1000)))

This quite unnecessarily uses a library module to provide a solution that will work more quickly using only built-in language constructs (though it does have the merit of answering the OP's question of how to avoid nested loops, it does so at significant cost). — holdenweb, Apr 13 '17 at 09:40
I think @holdenweb is being unnecessarily critical. This solution is faster than the list comprehension approach, makes it explicitly clear that it's generating a product of the inputs, and can be easily generalized to any number of inputs. I think both answers teach something useful. — Matthias Fripp, Apr 13 '17 at 19:19
Ah, a voice of moderation. Thank you @mfripp, your comment is most welcome. — Paul Panzer, Apr 13 '17 at 19:44
Criticism does not necessarily imply disapproval. @PaulPanzer has clearly demonstrated that I was wrong to assert the use of a library module was "quite unnecessary." More credit to him for the extra work he put in. I have withdrawn my down vote. — holdenweb, Apr 14 '17 at 10:34
@holdenweb *"Criticism does not necessarily imply disapproval."* It usually does when it is reinforced by a downvote. Anyway, thanks for coming back on this, it is appreciated. — Paul Panzer, Apr 14 '17 at 11:22

holdenweb · Accepted Answer · 2017-04-18T05:20:10.177

The technique you have used is perfectly Pythonic and, until list comprehensions were introduced into the language, would have been canonical. The zip-based approach you suggest won't work, however, because you want all pairs of elements from ls1 and ls2, whereas zip pairs only corresponding elements (the first with the first, the second with the second, and so on) rather than all combinations.
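To make the difference concrete, here is a small sketch contrasting zip with all-pairs iteration on the lists from the question. The variable names zipped and all_pairs are illustrative only.

```python
ls1 = ['a', 'b', 'c', 'd']
ls2 = ['k', 'j', 'l', 'm']

# zip walks both lists in lockstep, so it yields only len(ls1) pairs.
zipped = [a + '-' + b for a, b in zip(ls1, ls2)]
print(zipped)          # ['a-k', 'b-j', 'c-l', 'd-m']

# Iterating ls2 in full for every element of ls1 yields every combination.
all_pairs = [a + '-' + b for a in ls1 for b in ls2]
print(len(all_pairs))  # 16
```

In Python 3 the attempt from the question has a second problem: each map call returns a lazy map object, so the result is a list of four map objects rather than a list of strings.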

If you'd like to use more compact code then the appropriate list comprehension would be

ls3 = [x+'-'+y for x in ls1 for y in ls2]
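As a quick check, the sketch below compares the comprehension with the nested loops from the question. The names loop_result and comp_result are introduced here for illustration.

```python
ls1 = ['a', 'b', 'c', 'd']
ls2 = ['k', 'j', 'l', 'm']

# The nested loops from the question.
loop_result = []
for elem in ls1:
    for item in ls2:
        loop_result.append(elem + '-' + item)

# The for clauses read left to right in the same order as the nested loops:
# the leftmost clause is the outer loop, the rightmost the inner loop.
comp_result = [x + '-' + y for x in ls1 for y in ls2]

print(comp_result == loop_result)  # True
```

Swapping the two for clauses would still give 16 elements, but in a different order ('a-k', 'b-k', 'c-k', ...).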

For large lists, or where you need every ounce of performance (which should never be your first consideration), see the answer from @PaulPanzer, which explains a more efficient though slightly more complex technique.
