Flatten a list of strings to characters and then de-dupe the new list

Question

I have the following and have flattened the list via this documentation

>>> wordlist = ['cat','dog','rabbit']
>>> letterlist = [lt for wd in wordlist for lt in wd]
>>> print(letterlist)
['c', 'a', 't', 'd', 'o', 'g', 'r', 'a', 'b', 'b', 'i', 't']

Can the list comprehension be extended to remove duplicate characters? The desired result is the following (in any order):

['a', 'c', 'b', 'd', 'g', 'i', 'o', 'r', 't']

I can convert to a set and then back to a list but I'd prefer to keep it as a list.

Do you have to convert it back to a list? What functionality do you lose from having it as a set? — rlms, Oct 28 '13 at 20:47
This isn't hard to find: [eliminate dupes](http://stackoverflow.com/a/7961390), [sorting](http://stackoverflow.com/a/14032557) — ThinkChaos, Oct 28 '13 at 20:49
@sweeneyrod thanks - is it possible without changing collection type? i.e. can the comprehension be amended to answer the question — whytheq, Oct 28 '13 at 20:50
@whytheq I don't understand what you mean. Like the answers below say, you could change the square brackets to curly ones and make it a set comprehension (and therefore a set), but I presumed (from "I'd prefer to keep it as a list.") that you had some reason not to use a set. — rlms, Oct 28 '13 at 20:52
@plg fair comment - although my question is not in connection with sorting. — whytheq, Oct 28 '13 at 21:01

score 4 · Accepted Answer · answered Oct 28 '13 at 20:49

Easiest is to use a set comprehension instead of a list comp:

letterlist = {lt for wd in wordlist for lt in wd}

All I did was replace the square brackets with curly braces. This works in Python 2.7 and up.

For Python 2.6 and earlier, you'd use the set() callable with a generator expression instead:

letterlist = set(lt for wd in wordlist for lt in wd)

Last but not least, you can replace the comprehension syntax altogether with itertools.chain.from_iterable(), which chains the strings together and treats them as one long sequence of letters. You give it an iterable of iterables and it gives you back a single iterator over all their elements:

from itertools import chain
letterlist = set(chain.from_iterable(wordlist))
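
Since the question asks for a list, here is a minimal sketch (not part of the original answer) that checks that the three spellings above produce the same set and then converts the result back to a list. Using sorted() is an assumption made only to get a reproducible order; plain list() would also work if order does not matter.

```python
from itertools import chain

wordlist = ['cat', 'dog', 'rabbit']

# Three equivalent ways to collect the unique letters
by_set_comp = {lt for wd in wordlist for lt in wd}        # Python 2.7+
by_set_call = set(lt for wd in wordlist for lt in wd)     # also works on older versions
by_chain = set(chain.from_iterable(wordlist))             # no comprehension at all

assert by_set_comp == by_set_call == by_chain

# sorted() returns a new list; it is used here only for a deterministic order
letterlist = sorted(by_set_comp)
print(letterlist)  # ['a', 'b', 'c', 'd', 'g', 'i', 'o', 'r', 't']
```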

Martijn Pieters

cheers Martijn - see the last line of my OP - looks like I was already onto the best approach! – whytheq Oct 28 '13 at 20:51
what is the difference between your and my answer? – oleg Oct 28 '13 at 20:51
@oleg: Mine didn't start with just the sentence 'I think you need to use a set comprehension'. That was **not** an answer until you edited it. – Martijn Pieters Oct 28 '13 at 20:52
@oleg: I also explain what the syntax difference is, and give more options. – Martijn Pieters Oct 28 '13 at 20:53
just wanted to understand the reason for this "Gosh darn, really?". sorry. – oleg Oct 28 '13 at 20:54
@oleg That was in response to you giving an answer which was only one short sentence, which by itself was not suitable as an answer, and would have been better as a comment. Now that you have edited it, it is fine. – rlms Oct 28 '13 at 21:10
...apparently I can use a dict, within the list comprehension, as a way to only save the unique items. – whytheq Oct 28 '13 at 21:29
@whytheq: you could, but that's just a set with values associated. – Martijn Pieters Oct 28 '13 at 21:30
@whytheq: or do you mean to maintain the order of the unique elements? In that case I'd use a separate set to track what has been seen and use a list comprehension. See [How do you remove duplicates from a list in Python whilst preserving order?](http://stackoverflow.com/q/480214) – Martijn Pieters Oct 28 '13 at 21:31
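
The order-preserving approach mentioned in the last comment could be sketched roughly as follows. This is an illustration, not code from the thread: the names `seen`, `unique_in_order` and `via_dict` are hypothetical, and the `dict.fromkeys` variant assumes Python 3.7 or later, where dicts are guaranteed to keep insertion order.

```python
from itertools import chain

wordlist = ['cat', 'dog', 'rabbit']

# Separate set that tracks letters already emitted
seen = set()

# set.add() returns None, so `not seen.add(lt)` is always True;
# it is only there to record the letter as a side effect.
unique_in_order = [lt for wd in wordlist for lt in wd
                   if lt not in seen and not seen.add(lt)]
print(unique_in_order)  # ['c', 'a', 't', 'd', 'o', 'g', 'r', 'b', 'i']

# Alternative (assumes Python 3.7+): dict keys are unique and keep insertion order
via_dict = list(dict.fromkeys(chain.from_iterable(wordlist)))
assert via_dict == unique_in_order
```

The side-effect trick inside the comprehension is compact but easy to misread; a plain for loop with an explicit `seen.add(lt)` does the same job more readably.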

score 3 · Answer 2 · answered Oct 28 '13 at 20:48

Sets are an easy way to get the unique elements of an iterable, and itertools.chain provides a handy way to flatten a list of lists.

>>> from itertools import chain
>>> set(chain.from_iterable(['cat','dog','rabbit']))
{'a', 'b', 'c', 'd', 'g', 'i', 'o', 'r', 't'}

Jakub Roztocil

+1 thanks for the interesting option...I'm new to Python so suspect that itertools is well worth me checking out – whytheq Oct 28 '13 at 20:58

score 2 · Answer 3 · answered Oct 28 '13 at 20:47

I think a set comprehension should be used:

wordlist = ['cat','dog','rabbit']
letterlist = {lt for wd in wordlist for lt in wd}
print(letterlist)

This will work only in Python 2.7 and higher. For earlier versions, use set() instead of {}:

wordlist = ['cat','dog','rabbit']
letterlist = set(lt for wd in wordlist for lt in wd)
print(letterlist)

oleg

@whytheq: This answer started with **just** *I think set comprehension should be used*. That was.. obvious to most visiting here, but without the later edit it wasn't an answer. – Martijn Pieters Oct 28 '13 at 20:54
@MartijnPieters ... I saw the original one-liner but oleg has filled in the gaps. +1 oleg for filling in the gaps – whytheq Oct 28 '13 at 20:56
Thank you. Sorry for the one-line answer. I will try to give expanded answers next time. – oleg Oct 28 '13 at 20:57
