
I'm trying to build a set from the values of a dictionary. Each dictionary value is a list of strings.

{'a': ['a','b','c'],'b':['a','b','d'],...}

I am trying to use .update(x) to concatenate a set containing values from the dictionary. I already have success with a standard for-loop:

ingredientSet = set()
for values in recipes.values():
    ingredientSet.update(values)

What I would like to do, if possible, is to do this in a set comprehension. So far I have this:

ingredientSet = { ingredientSet.update(x) for x in recipes.values() }

but my IDE is giving me an error that "ingredientSet" is referenced before its assignment.

Is it possible to use .update(x) in the comprehension, or is there another way to concatenate the items into the set in a comprehension?

Moinuddin Quadri
socrlax24

2 Answers


Here's a functional way to achieve this using itertools.chain.from_iterable(...):

>>> from itertools import chain
>>> my_dict = {'a': ['a','b','c'],'b':['a','b','d']}

>>> set(chain.from_iterable(my_dict.values()))
{'a', 'b', 'c', 'd'}

Also adding jonsharpe's answer from the comments, which uses set().union(...):

>>> set().union(*my_dict.values())
{'a', 'b', 'd', 'c'}
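As a quick sanity check, here is a minimal sketch comparing both one-liners with the for-loop from the question. The names `ingredient_set`, `via_chain` and `via_union` are made up for illustration; `recipes` is assumed to hold the sample dictionary shown above.

```python
from itertools import chain

recipes = {'a': ['a', 'b', 'c'], 'b': ['a', 'b', 'd']}

# Baseline: the explicit loop from the question
ingredient_set = set()
for values in recipes.values():
    ingredient_set.update(values)

# chain.from_iterable lazily flattens the lists into one stream of items
via_chain = set(chain.from_iterable(recipes.values()))

# set().union(...) receives every list as a separate positional argument
via_union = set().union(*recipes.values())

print(via_chain == ingredient_set)  # True
print(via_union == ingredient_set)  # True
```

Sets are unordered, so the printed order of the elements can differ between runs even though the sets are equal.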

Performance Comparison

Below is a timeit comparison of all the answers on Python 3:

  • Using itertools.chain.from_iterable - 0.558 usec per loop

    mquadri$ python3 -m timeit -s "from itertools import chain; my_dict = {'a': ['a','b','c'],'b':['a','b','d']}" "set(chain.from_iterable(my_dict.values()))"
    1000000 loops, best of 3: 0.558 usec per loop
    
  • Using set comprehension - 0.585 usec per loop

    mquadri$ python3 -m timeit -s "from itertools import chain; my_dict = {'a': ['a','b','c'],'b':['a','b','d']}" "{item for items in my_dict.values() for item in items}"
    1000000 loops, best of 3: 0.585 usec per loop
    
  • Using set().union(...) - 0.614 usec per loop

    mquadri$ python3 -m timeit -s "from itertools import chain; my_dict = {'a': ['a','b','c'],'b':['a','b','d']}" "set().union(*my_dict.values())"
    1000000 loops, best of 3: 0.614 usec per loop
    
Moinuddin Quadri
  • So these are all very close from a performance point of view. itertools can often be faster if the data set is larger. What happens with a lot more data? – Stephen Rauch Jun 04 '18 at 01:48
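One way to explore that question is to repeat the measurement on a bigger dictionary. The sketch below is only a starting point: `big_dict`, its size (1,000 keys with 20 strings each) and the loop counts are arbitrary assumptions, and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import timeit
from itertools import chain

# Hypothetical larger input: 1,000 keys, each mapped to 20 strings
# drawn from a pool of 500 distinct names
big_dict = {
    f"key{i}": [f"item{(i * 7 + j) % 500}" for j in range(20)]
    for i in range(1000)
}

candidates = {
    "chain.from_iterable": lambda: set(chain.from_iterable(big_dict.values())),
    "set comprehension": lambda: {item for items in big_dict.values() for item in items},
    "set().union": lambda: set().union(*big_dict.values()),
}

# All three approaches should build the same set before we time them
results = {name: func() for name, func in candidates.items()}

loops = 50
timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, each running the callable `loops` times
    timings[name] = min(timeit.repeat(func, number=loops, repeat=3))
    print(f"{name}: {timings[name] / loops * 1e6:.1f} usec per loop")
```

Increasing the number of keys or the length of each list shows how the three approaches scale on your own setup.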

If you want a comprehension, you can do it with two for clauses:

Code:

values_set = {item for items in data.values() for item in items}

Test Code:

data = {'a': ['a','b','c'],'b':['a','b','d']}

values_set = {item for items in data.values() for item in items}
print(values_set)

Result:

{'d', 'b', 'c', 'a'}
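For context on why the attempt in the question cannot work even when the name already exists: set.update(...) mutates the set in place and returns None, and a comprehension builds a new set out of the values its expression returns. A small sketch, using a hypothetical helper set called `scratch`:

```python
recipes = {'a': ['a', 'b', 'c'], 'b': ['a', 'b', 'd']}

# `scratch` exists before the comprehension runs, so there is no name error
scratch = set()

# update() returns None each time, so the new set only collects None
result = {scratch.update(x) for x in recipes.values()}

print(result)           # {None}
print(sorted(scratch))  # ['a', 'b', 'c', 'd'] (filled only as a side effect)
```

In the question's version the comprehension is evaluated before `ingredientSet` is assigned, which is why the IDE reports the name as referenced before assignment.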
Stephen Rauch