List comprehension with nested loops and if condition, and membership to the new list

Question

Here is my regular nested loop with if condition, and membership to the new list:

wordlist = ["micro", "macro", "stats"]
letterlist = []

for aword in wordlist:
    for aletter in aword:
        if aletter not in letterlist:  
            letterlist.append(aletter)
print(letterlist)

Which prints out the letters without duplicates: ['m', 'i', 'c', 'r', 'o', 'a', 's', 't']

When I try to do the same using list comprehension, I can only get through the nested loops:

wordlist = ["micro", "macro", "stats"]
letterlist = [aletter for aword in wordlist for aletter in aword]
print(letterlist)

This prints all letters with duplicates: ['m', 'i', 'c', 'r', 'o', 'm', 'a', 'c', 'r', 'o', 's', 't', 'a', 't', 's']

This does not work unfortunately:

wordlist = ["micro", "macro", "stats"]
letterlist = [[if aletter not in letterlist] for aword in wordlist for aletter in aword]

Question: How do I perform the perform the nestloop with if statement using list comprehension based on my above example?

Thanks in advance

score 4 · Answer 1 · answered Oct 03 '19 at 06:44

4

You can use functions dict.fromkeys() and chain.from_iterable():

from itertools import chain

list(dict.fromkeys(chain.from_iterable(wordlist)))
# ['m', 'i', 'c', 'r', 'o', 'a', 's', 't']

In Python 3.6 and below you need to replace dict with OrderedDict.

answered Oct 03 '19 at 06:44

Mykola Zotko

15,583
3
71
73

This is a nice addition to the solutions because `''.join(wordlist)` requires instantiating the entire list in memory, whereas `chain.from_iterables(wordlist)` merely visits each letter and is thus much more efficient in terms of memory. Given that improvement, I now believe this is the best solution so far. – Alexander Oct 03 '19 at 07:02
@Alexander this is best one so far :) – sahasrara62 Oct 03 '19 at 07:38

Alexander · Answer 2 · 2019-10-03T07:49:37.050

No. You cannot do this using a list comprehension because you need to create a list of letters that have been seen. I believe your best course of action is to use a for loop. If you need to keep order of the letters, use both a list and a set (the list to keep order, the set to have O(1) membership test for each letter). If order doesn't matter, then just use the set comprehension, i.e. {letter for word in word_list for letter in word}

Note that it is not pythonic to use a list comprehension for its side effects (i.e. creating a secondary list of letters that have been seen). Is it Pythonic to use list comprehensions for just side effects?

word_list = ["micro", "macro", "stats"]
letter_list = []
letters_seen = set()

for word in word_list:
    for letter in word:
        if letter in letters_seen:
            continue
        letters_seen.add(letter)
        letter_list.append(letter)

>>> letter_list
['m', 'i', 'c', 'r', 'o', 'a', 's', 't']

Timings

wordlist = ["micro", "macro", "stats"] * 100_000

%%timeit
res=[]
[res.append(aletter) for aword in wordlist for aletter in aword if aletter not in res]
# 174 ms ± 8.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
letter_list = []
letters_seen = set()

for word in wordlist:
    for letter in word:
        if letter in letters_seen:
            continue
        letters_seen.add(letter)
        letter_list.append(letter)
# 71.1 ms ± 1.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(dict.fromkeys(''.join(wordlist)))
# 37.1 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit list(dict.fromkeys(chain.from_iterable(wordlist)))
# 46.8 ms ± 2.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Slightly slower, but requires less memory to run.

# Baseline comparison if order is not important (i.e. use sets).
%timeit {letter for word in wordlist for letter in word}
# 88.8 ms ± 6.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

sahasrara62 · Answer 3 · 2019-10-03T07:25:10.153

2

you can do this in following way

from collections import OrderedDict

wordlist = ["micro", "macro", "stats"]    
sol = list(OrderedDict.fromkeys(''.join(wordlist)).keys())    
print(sol)

output

['m', 'i', 'c', 'r', 'o', 'a', 's', 't']

you can also use

sol =  [*OrderedDict.fromkeys(''.join(wordlist)).keys()]

using dict it can be done as

  sol = list(dict((i,1) for i in ''.join(wordlist)).keys())

Adding @alexander solution here

sol = list(dict.fromkeys(''.join(wordlist)))

edited Oct 03 '19 at 07:25

answered Oct 03 '19 at 05:59

sahasrara62

10,069
3
29
44

As the OP stated in a comment, order is important so sets won't work. – Alexander Oct 03 '19 at 05:59
@Alexander thanks for that, i hadn't gone through comment, solution updated. – sahasrara62 Oct 03 '19 at 06:08
1

And of course the same solution is possible using simply a dict vs. OrderedDict for python 3.7+ (cython 3.6+). – Alexander Oct 03 '19 at 06:11
with dict it can be done as `sol = list(dict((i,1) for i in ''.join(wordlist)).keys())` – sahasrara62 Oct 03 '19 at 06:20
cannot say if it best or not, but it is one of the good solution among other good solution present here :) – sahasrara62 Oct 03 '19 at 06:27
1

It is both efficient and short, hence why I believe it is the best. – Alexander Oct 03 '19 at 06:29
Very elegant solution. – san Oct 03 '19 at 06:44
1

Even simpler, `list(dict.fromkeys(''.join(wordlist)))` – Alexander Oct 03 '19 at 06:56

score 1 · Accepted Answer · answered Oct 03 '19 at 05:27

1

You can save the output in a separate list like:

wordlist = ["micro", "macro", "stats"]
res=[]
[res.append(aletter) for aword in wordlist for aletter in aword if aletter not in res]
print(res)

OR

list(set([aletter for aword in wordlist for aletter in aword]))

Hope this helps!

answered Oct 03 '19 at 05:27

Bharat Gera

800
1
4
13

Thanks! The second solution was something that I already solved, however I was hoping if there is a way to proceed with the first solution in one continuous list comprehension code. – Mick Oct 03 '19 at 05:32
With first solution, I think you have to create a separate list for storing res. – Bharat Gera Oct 03 '19 at 05:33
@Mick - The `list(set([...]))` option above is good - **if order in your list does not matter**. Is retaining order in your output important to you? – S3DEV Oct 03 '19 at 05:37
@S3DEV Thanks for the question. Order matters as I want to replicate the same nested loop with If statement above using list comprehension. – Mick Oct 03 '19 at 05:43
1

Don't use a list comprehension purely for its side effects; you're creating an unnecessary list that may be long and expensive. A `for` loop accomplishes the same thing and is more clear. List comprehensions should only be used to create lists. – iz_ Oct 03 '19 at 05:54
In addition, testing for membership in a list will be O(n) where n is the number of letters seen vs. O(1) for set membership. The list `res` will increase in size to accommodate each letter seen (including duplicates!). – Alexander Oct 03 '19 at 05:57

adnanmuttaleb · Answer 5 · 2019-10-03T05:43:43.763

1

You may use Set comprehension as follow:

letterlist = { aletter for aword in wordlist for aletter in aword}

Set by default does not append duplicate values. Also it a lot more compact.

I worth mentioning that the in operator has a linear time complexity when used on Lists, while for Sets it has nearly constant time complexity.

edited Oct 03 '19 at 05:43

answered Oct 03 '19 at 05:31

adnanmuttaleb

3,388
1
29
46

Is there a way to solve this using only list comprehension? – Mick Oct 03 '19 at 05:32
Sets are meant for unique values and with Set Comp it is super easy to do that. In term of performance the in operator on the Lists has linear complexity while on Sets it uses hashes so it is a lot faster, also in term of readability you do not have to write the if stat. Answering your question I do not think it is possible. – adnanmuttaleb Oct 03 '19 at 05:40
1

@adnamuttaleb thanks for the feedback. Makes more sense to use Set now, since in membership has linear complexity which is slower than set. Did not know that part. – Mick Oct 03 '19 at 05:45

score 0 · Answer 6 · answered Oct 03 '19 at 05:43

Another solution, just adding to 2 lines of code to your own code. You convert your list into a dictionary, by definition it takes unique values and into a list again (if you need it as a list)

for aword in wordlist:
   for aletter in aword:
       if aletter not in letterlist:
           letterlist.append(aletter)
       letterdict = list(dict.fromkeys(letterlist)) #list to dictionary
       letterlist = list(letterdict)

List comprehension with nested loops and if condition, and membership to the new list

6 Answers6