Only using items from one list once in nested list comprehension

Question

I'm trying to use list comprehension to generate a new list that consists of a letter taken from a list1 directly followed (after a colon) by the words from list2 that start with that particular letter. I managed to code this using nested for loops as following:

list1=["A","B"]
list2=["Apple","Banana","Balloon","Boxer","Crayons","Elephant"]

newlist=[]
for i in list1:
    newlist.append(i+":")
    for j in list2:
        if j[0]==i:
            newlist[-1]+=j+","

resulting in the intended result: ['A:Apple,', 'B:Banana,Balloon,Boxer,']

Trying the same using list comprehension, I came up with the following:

list1=["A","B"]
list2=["Apple","Banana","Balloon","Boxer","Crayons","Elephant"]

newlist=[i+":"+j+"," for i in list1 for j in list2 if i==j[0]]

resulting in: ['A:Apple,', 'B:Banana,', 'B:Balloon,', 'B:Boxer,']

In which each time a word with that starting letter is found, a new item is created in newlist, while my intention is to have one item per letter.

Is there a way to edit the list comprehension code in order to obtain the same result as using the nested for loops?

Martijn Pieters · Accepted Answer · 2015-11-29T22:39:06.340

All you need to do is to remove the second for loop and replace it with a ','.join(matching_words) call where you use j now in the string concatenation now:

newlist = ['{}:{}'.format(l, ','.join([w for w in list2 if w[0] == l])) for l in list1]

This isn't very efficient; you loop over all the words in list2 for each letter. To do this efficiently, you would be better of to preprocess the lists into a dictionary:

list2_map = {}
for word in list2:
    list2_map.setdefault(word[0], []).append(word)

newlist = ['{}:{}'.format(l, ','.join(list2_map.get(l, []))) for l in list1]

The first loop builds a dictionary mapping initial letter to a list of words, so that you can directly use those lists instead of using a nested list comprehension.

Demo:

>>> list1 = ['A', 'B']
>>> list2 = ['Apple', 'Banana', 'Balloon', 'Boxer', 'Crayons', 'Elephant']
>>> list2_map = {}
>>> for word in list2:
...     list2_map.setdefault(word[0], []).append(word)
...
>>> ['{}:{}'.format(l, ','.join(list2_map.get(l, []))) for l in list1]
['A:Apple', 'B:Banana,Balloon,Boxer']

The above algorithm loops twice through all of list2, and once through list1, making this a O(N) linear algorithm (adding a single word to list2 or a single letter to list1 increases the amount of time with a constant amount). Your version loops over list2 once for every letter in list1, making it a O(NM) algorithm, creating increasing the amount of time it takes exponentially whenever you add a letter or word.

To put that into numbers, if you expanded list1 to cover all 26 ASCII uppercase letters and expanded list2 to contain 1000 words, your approach (scanning all of list2 for words with a given letter) would make 26000 steps. My version, including pre-building the map, takes only 2026 steps. With list2 containing 1 million words, your version has to make 26 million steps, mine 2 million and 26.

Mikhail Gerasimov · Answer 2 · 2015-11-29T23:00:08.283

2

list1=["A","B"]
list2=["Apple","Banana","Balloon","Boxer","Crayons","Elephant"]

res = [l1 + ':' + ','.join(l2 for l2 in list2 if l2.startswith(l1)) for l1 in list1]
print(res)

# ['A:Apple', 'B:Banana,Balloon,Boxer']

But it seems to be complicated to read, so I would advice to use nested loops. You can create generator for more readability (if you think this version is more readable):

def f(list1, list2):
    for l1 in list1:
        val = ','.join(l2 for l2 in list2 if l2.startswith(l1))
        yield l1 + ':' + val

print(list(f(list1, list2)))

# ['A:Apple', 'B:Banana,Balloon,Boxer']

edited Nov 29 '15 at 23:00

answered Nov 29 '15 at 22:18

Mikhail Gerasimov

36,989
16
116
159

1

The generator doesn't make this any more efficient; it'll be slightly *slower* in terms of execution, and because you are calling `list()` on the generator you undo any memory efficiency there may have been. – Martijn Pieters Nov 29 '15 at 22:31
Moreover, your version still suffers from O(NM) exponential complexity. With 26 letters and 1000 words your version takes 26000 steps, mine takes 2026. – Martijn Pieters Nov 29 '15 at 22:36
"The generator doesn't make this any more efficient", in case he need ready list - yes. In case he will iterate through values - no. But efficiency is not main point: I think this solution is more readable then others. – Mikhail Gerasimov Nov 29 '15 at 22:54
@germn, using join on a generator means a list will be created regardless so there is no advantage – Padraic Cunningham Nov 29 '15 at 22:55
@PadraicCunningham: `f` is the generator in question here (it uses `yield`). I haven't even started about the generator expression used in the `str.join()` :-) – Martijn Pieters Nov 29 '15 at 22:58
@MartijnPieters I updated answer to be more accurate. – Mikhail Gerasimov Nov 29 '15 at 23:00
@MartijnPieters, ah ok, the complexity is the bigger issue but it still amazes how many people don't realise that str.join with a generator is not actually memory efficient. – Padraic Cunningham Nov 29 '15 at 23:03
@MartijnPieters "I haven't even started about the generator expression used in the str.join()" - I understand it's not more efficient then list. But there's also no reason to use list instead of generator inside join, am I right? – Mikhail Gerasimov Nov 29 '15 at 23:10
1

@germn: `str.join()` is an exception here, using a list comprehension is faster than a generator expression: [list comprehension without \[ \], Python](http://stackoverflow.com/a/9061024) – Martijn Pieters Nov 29 '15 at 23:12

Only using items from one list once in nested list comprehension

2 Answers2