3

Given

userplays = { "Alice"   : { "AC/DC" : 2,
                            "The Raconteurs" : 3,
                            "Mogwai" : 1
                          },
              "Bob"     : { "The XX" : 4,
                            "Lady Gaga" : 3,
                            "Mogwai" : 1,
                            "The Raconteurs" : 1
                          },
              "Charlie" : { "AC/DC" : 7,
                            "Lady Gaga" : 7
                          }
            }

get a list of all bands:

['Lady Gaga', 'Mogwai', 'AC/DC', 'The Raconteurs', 'The XX']

I can do

list(set(flatten([ [ band 
                     for band 
                     in playcounts.keys() ] 
                   for playcounts 
                   in userplays.values() ] ) ) )

where flatten is from Flatten (an irregular) list of lists, but is it possible without flatten, using only list/dict comprehensions?

Community
  • 1
  • 1
Felix Dombek
  • 13,664
  • 17
  • 79
  • 131

2 Answers2

8

This will do it:

set(b for v in userplays.values() for b in v.keys())

produces:

set(['Lady Gaga', 'Mogwai', 'AC/DC', 'The Raconteurs', 'The XX'])
Ned Batchelder
  • 364,293
  • 75
  • 561
  • 662
5

Another way is to use a dict comprehension (Python 2.7+):

{k:v for v in userplays.values() for k in v.keys()}.keys()

Produces:

['Lady Gaga', 'Mogwai', 'AC/DC', 'The Raconteurs', 'The XX']

At least in Python 3.3, this is also faster:

import timeit

userplays = { "Alice"   : { "AC/DC" : 2,
                            "The Raconteurs" : 3,
                            "Mogwai" : 1
                          },
              "Bob"     : { "The XX" : 4,
                            "Lady Gaga" : 3,
                            "Mogwai" : 1,
                            "The Raconteurs" : 1
                          },
              "Charlie" : { "AC/DC" : 7,
                            "Lady Gaga" : 7
                          }
            }


def f1():
    set(b for v in userplays.values() for b in v.keys())

def f2():
    {k:v for v in userplays.values() for k in v.keys()}.keys()    

t1=timeit.Timer(f1).timeit(10000)
t2=timeit.Timer(f2).timeit(10000)
faster=abs(t1-t2) / max(t1,t2)
print("""
set:                {:.4} seconds
dict:               {:.4} seconds
faster of those is  {:.4%} faster

""".format(t1,t2,faster))

Output:

set:                0.02448 seconds
dict:               0.01988 seconds
faster of those is  18.7907% faster

Edit

Just for pure curiosity, I compared the various way that this can be done in a one-liner.

Here are the results:

f1: set from a generator expression
f2: keys from a dict comprehension
f3: set comprehension
f4: set from a list comprehension

       rate/s      f4       f1       f2       f3 
f4    358,650    0.0%   -13.4%   -31.7%   -41.3% 
f1    414,246   15.5%     0.0%   -21.1%   -32.2% 
f2    525,230   46.4%    26.8%     0.0%   -14.1% 
f3    611,158   70.4%    47.5%    16.4%     0.0% 

You can see that the set comprehension is fastest, followed by a dict comprehension.

Here is the code that generated that Perl style benchmark:

import timeit
import locale
locale.setlocale(locale.LC_NUMERIC, "")

userplays = { "Alice"   : { "AC/DC" : 2,
                            "The Raconteurs" : 3,
                            "Mogwai" : 1
                          },
              "Bob"     : { "The XX" : 4,
                            "Lady Gaga" : 3,
                            "Mogwai" : 1,
                            "The Raconteurs" : 1
                          },
              "Charlie" : { "AC/DC" : 7,
                            "Lady Gaga" : 7
                          }
            }

def f1():
    """set from a generator expression"""
    set(b for v in userplays.values() for b in v.keys())

def f2():
    """keys from a dict comprehension"""
    {k:v for v in userplays.values() for k in v.keys()}.keys()    

def f3():
    """set comprehension"""
    {b for v in userplays.values() for b in v.keys()}

def f4():
    """set from a list comprehension"""
    set([b for v in userplays.values() for b in v.keys()])

def test_table(funcs, c):
    results={k.__name__:timeit.Timer(k).timeit(c) for k in funcs}
    fastest=sorted(results,key=results.get, reverse=True)
    table=[]
    table.append([' ','rate/s']+fastest)
    for e in fastest:
        temp=[]
        temp.append(e)
        temp.append(int(round(float(c)/results[e])))
        t2=['{:.1%}'.format((results[x]-results[e])/results[e]) for x in fastest]
        table.append(temp+t2)
    print()    
    for e in funcs:
        print('{}: {}'.format(e.__name__, e.__doc__))
    print()            
    pprint_table(table)    

def format_num(num):
    """Format a number according to given places.
    Adds commas, etc. Will truncate floats into ints!"""

    try:
        inum = int(num)
        return locale.format("%.*f", (0, inum), True)

    except (ValueError, TypeError):
        return str(num)

def get_max_width(table, index):
    """Get the maximum width of the given column index"""
    return max([len(format_num(row[index])) for row in table])        

def pprint_table(table):
    col_paddings = []
    for i in range(len(table[0])):
        col_paddings.append(get_max_width(table, i))

    for row in table:
        # left col
        print(row[0].ljust(col_paddings[0] + 1),end=' ')
        # rest of the cols
        for i in range(1, len(row)):
            col = format_num(row[i]).rjust(col_paddings[i] + 2)
            print (col,end=' ')
        print()

test_table([f1,f2,f3,f4],100000) 
the wolf
  • 34,510
  • 13
  • 53
  • 71
  • But why generate a key-value mapping if you're just going to ignore the values and treat it as a set of keys? – Ben Jun 23 '12 at 01:36
  • The idea is the same as Ned's, but I'm not sure if it necessarily has the same runtime behaviour: Ned's might create the whole list, then filter it with `set()`, while this solution adds the items to the dict one by one, so they might get filtered immediately without a large intermediate list containing duplicates. – Felix Dombek Jun 23 '12 at 02:04
  • @Ben: Because to some (like me) dict comprehensions are beautiful and in this case, also faster. – the wolf Jun 23 '12 at 02:36
  • 1
    @FelixDombek No, Ned's was a generator expression (not a list comprehension) which is then passed to `set`, so it won't build the whole list. It's probably the generator expression machinery that is slower than a dict comprehension. – Ben Jun 23 '12 at 03:02
  • 1
    @carrot-top Fair enough. To my mind, dict comprehsions are a fantastic addition to the language, and a beautiful way to build dictionaries. But to my mind building a dictionary and ignoring its values is an ugly way to build a set. If it happens to be faster, then there is a reason to do it though (if this is a performance bottleneck and the program is not sufficiently fast). – Ben Jun 23 '12 at 03:04
  • 1
    @FelixDombek: You could also use a set comprehension: `{b for v in userplays.values() for b in v.keys()}` this may even be faster if, as Ben (and I) suspect that the generator machinery is slowing Ned's down. – the wolf Jun 23 '12 at 03:10
  • That is probably the best one so far. Thanks. – Felix Dombek Jun 23 '12 at 13:37