34

I am trying to write a list comprehension statement that will only add an item if it's not currently contained in the list. Is there a way to check the current items in the list that is currently being constructed? Here is a brief example:

Input

{
    "Stefan" : ["running", "engineering", "dancing"],
    "Bob" : ["dancing", "art", "theatre"],
    "Julia" : ["running", "music", "art"]
}

Output

["running", "engineering", "dancing", "art", "theatre", "music"]

Code without using a list comprehension

output = []
for name, hobbies in input.items():
    for hobby in hobbies:
        if hobby not in output:
            output.append(hobby)

My Attempt

[hobby for name, hobbies in input.items() for hobby in hobbies if hobby not in ???]
Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Stefan Bossbaly
  • 6,682
  • 9
  • 53
  • 82
  • 9
    Consider using a set instead, as it eliminates duplicates for you automatically. – Kevin May 19 '15 at 17:08
  • I could put the output of the list comprehension in the set() constructor but I am wondering if there is a way to do this only using list comprehension – Stefan Bossbaly May 19 '15 at 17:11
  • You could abuse `yield from` in python3.3+ to avoid the double loop: `set([(yield from x) for x in d.values()])`. Note that doing `s=set((yield from x) for x in d.values())` yields *almost* the desired output... it contains an extra `None` that you have to remove so `s.discard(None)`. (I believe there's already a question about this difference in behaviour between gen exps and list-comprehensions). – Bakuriu May 20 '15 at 07:00

8 Answers8

45

You can use set and set comprehension:

{hobby for name, hobbies in input.items() for hobby in hobbies}

As m.wasowski mentioned, we don't use the name here, so we can use item.values() instead:

{hobby for hobbies in input.values() for hobby in hobbies}

If you really need a list as the result, you can do this (but notice that usually you can work with sets without any problem):

list({hobby for hobbies in input.values() for hobby in hobbies})
Community
  • 1
  • 1
geckon
  • 8,316
  • 4
  • 35
  • 59
  • 2
    I suppose there isn't even any reason to convert it back to a list. Most probably the list would be a set in concept anyway. – Thijs van Dien May 19 '15 at 17:19
  • 2
    @ThijsvanDien I only added that because the OP had his output in a list and he might have some special reason why he needs a list. In most cases you're right of course. – geckon May 19 '15 at 17:21
  • 3
    you can use `input.values()` if you don't care about name – m.wasowski May 19 '15 at 18:16
  • @m.wasowski Thanks, you're right. I'll add it to the answer. – geckon May 19 '15 at 18:20
  • Can someone walk through that middle comprehension so I can see how 'hobby' and 'hobbies' get defined? I'm not entirely clear on the order of execution here. Thanks. – RobertoCuba Feb 04 '16 at 06:32
  • @RobertoCuba Well.. The first `hobbies` are defined as the lists of hobbies as present in `input.values()` (i.e. a list of lists of hobbies). Then these lists are iterated by the `for hobby in hobbies` - here `hobbies` is a usage of the `hobbies` defined by `hobbies in input.values()`. The second `hobby` is therefore defined as each hobby in each `hobbies` list (one by one). The first `hobby` is then a usage of such defined hobbies and each such hobby is added to the resulting set. Thus every hobby is in the result once (because it's a set). Is it more clear now? I hope I didn't make it worse. – geckon Feb 05 '16 at 00:41
  • @geckon, can the dual 'for' syntax in comprehension be extended to the case of some function `fun` of `hobby`? `{fun(hobby) for hobbies in input.values() for fun(hobby) in hobbies}`. Running into `SyntaxError` with this approach. And if not, is there an alternative? – alancalvitti Mar 04 '19 at 22:00
  • @alancalvitti `{fun(hobby) for hobbies in input.values() for hobby in hobbies}` should work. – geckon Mar 06 '19 at 10:28
18

As this answer suggests: you can use a uniqueness filter:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

and call with:

>>> f7(hobby for name, hobbies in input.items() for hobby in hobbies)
['running', 'engineering', 'dancing', 'art', 'theatre', 'music']

I would implement the uniqueness filter separately since a design rule says "different things should be handled by different classes/methods/components/whatever". Furthermore you can simply reuse this method if necessary.

Another advantage is - as is written at the linked answer - that the order of the items is preserved. For some applications, this might be necessary.

Community
  • 1
  • 1
Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • even better if you drop the square brackets – shx2 May 19 '15 at 17:15
  • @shx2: modified, better? – Willem Van Onsem May 19 '15 at 17:16
  • 2
    Points for maintaining order. – Peter DeGlopper May 19 '15 at 17:16
  • Really the only reason to go for an approach like this would be if you care about the order in which the original items appeared. But even then I wouldn't be very pleased with a list comprehension having side effects. – Thijs van Dien May 19 '15 at 17:17
  • @ThijsvanDien: the side-effects on list comprehensions are locally since the only involved variables are defined in the method itself. – Willem Van Onsem May 19 '15 at 17:17
  • 1
    IMO this is more readable if the predicate is written as `if x not in seen and seen.add(x) is None` – Shashank May 19 '15 at 18:47
  • @Shashank: I agree, but if you follow the link to the answer, you will see there is a specific reason: by shortcutting the reference to that method, the access to the method is faster resulting in one or two milliseconds additional performance. If you call `seen.add`, it is possible (in general, not in this case), that the `seen.add` is modified between two calls. Python thus has to perform an additional lookup. – Willem Van Onsem May 19 '15 at 18:50
  • Oh I don't care about the localization of the add method, that wasn't what my comment was about, and I do localize things all the time when I'm racing for speed, but I would keep in mind that premature optimization is the root of all evil. – Shashank May 19 '15 at 18:52
  • True, but (a) python shortcuts boolean circuits and implicitly it says `not (seen.add(x) is None)` for the second part anyway and (b) one can hardly say this is *premature* optimization as this is a utility function. ;) – Willem Van Onsem May 19 '15 at 18:53
  • 2
    Fair enough sir, I upvoted your answer anyways because I think it's better than the accepted one due to maintaining order. @ThijsvanDien What's wrong with a list comprehension having side effects? It's pretty much the same as a for-loop having side effects. – Shashank May 19 '15 at 18:54
10

If you really really want a listcomp and only a list-comp, you can do

>>> s = []
>>> [s.append(j)  for i in d.values() for j in i if j not in s]
[None, None, None, None, None, None]
>>> s
['dancing', 'art', 'theatre', 'running', 'engineering', 'music']

Here, s is a result of a side effect and d is your original dictionary. The unique advantage here is that you can preserve the order unlike most other answers here.

Note: This a bad way as it exploits the list-comp and the result is a side effect. Don't do it as a practice, This answer is just to show you that you can achieve it using a list comp alone

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
  • @BhargavRao: I've made it syntactically more emphasized such that most persons will catch the warning. – Willem Van Onsem May 19 '15 at 17:27
  • even worse are those unecessary nested comprehensions – m.wasowski May 19 '15 at 17:28
  • @m.wasowski They are all required. Do provide an alternate solution. I'll be more than happy to add that. TIA for your answer – Bhargav Rao May 19 '15 at 17:30
  • `itertools.chain.from_iterable` - as I used in my answer – m.wasowski May 19 '15 at 17:31
  • @m.wasowski O c'mon bro! 1. I wanted to answer in a way which has not been answered before 2. I waited for someone to add the `itertools` answer. 3. This is pure python without any imports. Your answer is good no doubt – Bhargav Rao May 19 '15 at 17:34
  • standard library is not the 'pure python'? C'mon... using standard modules should be encouraged. And it is prefered solution to nested comprehension, let alone those single letter variable names... – m.wasowski May 19 '15 at 17:38
  • 2
    @m.wasowski True that your's is the better solution. This is an alternative only. Any future user will certainly see your answer before mine. So they'll learn an alternate methodology too. I agree that this is not a good way and I have been saying it from the second I posted it as an answer. As for the single letter variable names. I know that they are not in accordance with PEP. But it does no harm here as it is the 6th answer down the list. However if you still feel that this answer should not exist here, do mention it. I will delete it with a note to future viewers above 10k. Thanks again! – Bhargav Rao May 19 '15 at 17:44
  • 3
    @m.wasowski: I think the model of SO is to provide *one-to-many* relations from questions to answer each with (dis)advantages. As long as there are enough warnings even a worst case exponentially time solution is acceptable. The *voting* system is used to (a) filter out incorrect answers and (b) rank answers for practical usage. – Willem Van Onsem May 19 '15 at 17:44
  • 3
    Oh, my... sorry for expressing my opinion ;-) I was not going to drop banhammer on you guys anyway, really! @Bhargav Rao - of course keep your answer - and keep doing a good job answering people here. Sorry if my comments were too harsh, I didn't mean to. – m.wasowski May 19 '15 at 17:55
  • 2
    @m.wasowski You really need not be sorry here! The only point is that if this answer was the first or second or an FGITW then it certainly does not belong here. The timing of the answer is important too. A bad answer indicating as to why it is bad also helps the future users not to utilize such tactics in their problem solving. I hope with this note all stands cleared! Have a nice day. :) – Bhargav Rao May 19 '15 at 17:59
7

sets and dictionaries are your friends here:

from collections import OrderedDict
from itertools import chain # 'flattens' collection of iterables

data = {
    "Stefan" : ["running", "engineering", "dancing"],
    "Bob" : ["dancing", "art", "theatre"],
    "Julia" : ["running", "music", "art"]
}

# using set is the easiest way, but sets are unordered:
print {hobby for hobby in chain.from_iterable(data.values())}
# output:
# set(['art', 'theatre', 'dancing', 'engineering', 'running', 'music'])


# or use OrderedDict if you care about ordering:
print OrderedDict(
        (hobby, None) for hobby in chain.from_iterable(data.values())
    ).keys()
# output:
# ['dancing', 'art', 'theatre', 'running', 'engineering', 'music']
m.wasowski
  • 6,329
  • 1
  • 23
  • 30
  • The OrderedDict method is the one I would choose here. – SethMMorton May 19 '15 at 20:51
  • 1
    I think your answer should clarify what you mean with "ordering": As `data` is a normal dictionary the resulting order of the call to `data.values()` is arbitrary. Because of this the order of the `keys()` in the `OrderedDict` has to be considered partially arbitrary as well. The lists are randomly ordered (in your case the order of lists were the ones of: Bob, Stefan, Julia) but their elements' orders are preserved. Because of overlaps this can lead to quite some different orders. – Jörn Hees May 29 '15 at 16:42
7

There's another way of writing this that is a bit more descriptive of what you're actually doing, and doesn't require a nested (double for) comprehension:

output = set.union(*[set(hobbies) for hobbies in input_.values()])

This becomes even nicer when you'd represent the input to be more conceptually sound, i.e. use a set for the hobbies of each person (since there shouldn't be repetitions there either):

input_ = {
    "Stefan" : {"running", "engineering", "dancing"},
    "Bob" : {"dancing", "art", "theatre"}, 
    "Julia" : {"running", "music", "art"}
}

output = set.union(*input_.values())
Thijs van Dien
  • 6,516
  • 1
  • 29
  • 48
5

A list comprehension is not well-suited for this problem. I think a set comprehension would be better, but since that was already shown in another answer, I'll show a way of solving this problem with a compact one-liner:

list(set(sum(hobbies_dict.values(), [])))

Another interesting solution using bitwise or operator which serves as a union operator for sets:

from operator import or_
from functools import reduce # Allowed, but unnecessary in Python 2.x
list(reduce(or_, map(set, hobbies_dict.values())))

Or (unintentional pun, I swear), instead of using bitwise or operator, just use set.union and pass it the unpacked set-mapping of your values. No need to import or_ and reduce! This idea is inspired by Thijs van Dien's answer.

list(set.union(*map(set, hobbies_dict.values())))
Community
  • 1
  • 1
Shashank
  • 13,713
  • 5
  • 37
  • 63
4

Use a set:

dict = {
    "Stefan" : ["running", "engineering", "dancing"],
    "Bob" : ["dancing", "art", "theatre"],
    "Julia" : ["running", "music", "art"]
}

myset = set()
for _, value in dict.items():
    for item in value:
        myset.add(item)

print(myset)
nullptr
  • 650
  • 3
  • 14
4

How about this:

set(dict['Bob']+dict['Stefan']+dict['Julia'])
>>> set(['art', 'theatre', 'dancing', 'engineering', 'running', 'music'])

Or more nicely:

dict = {
    "Stefan" : ["running", "engineering", "dancing"],
    "Bob" : ["dancing", "art", "theatre"],
    "Julia" : ["running", "music", "art"]
}

list_ = []
for y in dict.keys():
    list_ = list_ + dict[y]
list_ = set(list_)
>>> list_
set(['art', 'theatre', 'dancing', 'engineering', 'running', 'music'])

you can apply the list function to list_ like list(list_) to return a list rather than a set.

Plug4
  • 3,838
  • 9
  • 51
  • 79