uniqify a list of dictionaries

Question

I receive data like this in a string:

foo = """Port  Mac Address       group-addr      vlan    ver
         s2p2  0100.5e00.0004    239.0.0.4       1       1
         s2p0  0100.5e00.0005    239.0.0.8       1       1
         s2p1  0100.5e00.0004    239.0.0.4       1       1"""

I want to format it as a table. Each entry should get its own line unless the last four fields (mac, group, vlan, ver) are identical. In that case I want a single line that lists both ports beside each other:

Vlan      Group       Type   Version     Port List
-----------------------------------------------------------------------
1         239.0.0.4   igmp   v1          s2p1, s2p2
1         239.0.0.8   igmp   v1          s2p0

I parse the data into a list of dictionaries:

def parse_lines(lines):
  headers = lines[0].split()
  entries = []
  for r in lines[1:]:
    if not len(r): continue    # skip blank lines
    vals = r.split()
    e = dict(zip(headers,vals))
    entries.append(e)
  return entries

def print_table():
    print "%s %10s %10s %14s %15s" % ("Vlan", "Group", "Type", "Version", "Port List")
    print "---------------------------------------------------------"
    if foo is not None:
        entries = foo.replace("Mac Address", "Mac-Address")    
        entries = parse_lines(entries.split("\n"))

This leaves me with a list of dictionaries, an example of the format:

[{'group-addr': '239.0.0.4', 'vlan': '1', 'ver': '1', 'Port': 's2p1', 'Mac-Address': '0100.5e00.0004'}, {'group-addr': '239.0.0.5', 'vlan': '1', 'ver': '1', 'Port': 's2p1', 'Mac-Address': '0100.5e00.0005'}]

How should I process these to compare and store them before printing? Should I create a new dictionary, compare the non-port values of each dict against the previous ones, and, if they all match, add the port to the existing entry in the new dictionary?

if the mac is the same are they considered the same or what is the criteria? — Padraic Cunningham, Jan 15 '15 at 11:37
@PadraicCunningham If everything is the same they are considered the same, all 5 values. To show this, I just change the port value to be a list of the ports with these same values — Paul, Jan 15 '15 at 11:39
but the ports are different, do you mean the last four and append the port? — Padraic Cunningham, Jan 15 '15 at 11:40
@PadraicCunningham Sorry I mean what you said yeah, the ports are obviously different! — Paul, Jan 15 '15 at 12:00

Alberto Coletta · Answer 1 · 2015-01-15T12:39:59.647

If I understand correctly, you can do the following.

For each line you receive, take all values except the port, and add them as a tuple key in a dict.

('239.0.0.4','0100.5e00.0004', '1', '1') = (group-addr, Mac-Address, vlan, ver)

Of course you can choose the order you like most and preserve it.

The value associated with the tuple key is a set of ports.

At the end you will have many key-value pairs. Put all of them in a dictionary.

Therefore the dictionary will look like this:

{(group-addr, Mac-Address, vlan, ver): set([port1, port2]), ...}

To add new elements, you can do:

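# groups is the dictionary described above (avoid naming it dict, which shadows the built-in)
try:
    groups[(group_addr, mac_address, vlan, ver)].add(port)
except KeyError:
    # set([port]) stores the whole string; set(port) would split it into characters
    groups[(group_addr, mac_address, vlan, ver)] = set([port])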

I can't test it right now, but I hope you get the logic.
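The idea above can be sketched as a small runnable function. This is only an illustration under stated assumptions: `rows` and `group_ports` are hypothetical names, the rows are the sample data from the question already split into fields, and `dict.setdefault` is swapped in for the try/except.

```python
# Sample rows from the question, already split: (port, mac, group, vlan, ver)
rows = [
    ("s2p2", "0100.5e00.0004", "239.0.0.4", "1", "1"),
    ("s2p0", "0100.5e00.0005", "239.0.0.8", "1", "1"),
    ("s2p1", "0100.5e00.0004", "239.0.0.4", "1", "1"),
]


def group_ports(rows):
    """Map (group, mac, vlan, ver) -> set of ports (hypothetical helper)."""
    groups = {}
    for port, mac, group, vlan, ver in rows:
        # setdefault stores an empty set the first time a key is seen,
        # then returns the stored set so the port can be added to it
        groups.setdefault((group, mac, vlan, ver), set()).add(port)
    return groups


groups = group_ports(rows)
print(groups)
```

The order of the fields in the key is arbitrary, as noted above, as long as it is the same for every row.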

Padraic Cunningham · Accepted Answer · 2015-01-15T12:25:25.317

Use Mac Address, group-addr, vlan and ver as the key to group common elements. Ideally you would do this when you first create the dict, but here is an example using the data from your question:

foo = """Port  Mac Address       group-addr      vlan    ver
         s2p2  0100.5e00.0004    239.0.0.4       1       1
         s2p0  0100.5e00.0005    239.0.0.8       1       1
         s2p1  0100.5e00.0004    239.0.0.4       1       1"""

from collections import defaultdict
d = defaultdict(set)
lines = foo.splitlines()

for line in lines[1:]:
    prt,mc,gp,vl,vr = line.split()
    d[(mc,gp,vl,vr)].add(prt)
print(d)
defaultdict(<type 'set'>, {('0100.5e00.0004', '239.0.0.4', '1', '1'): set(['s2p2', 's2p1']), ('0100.5e00.0005', '239.0.0.8', '1', '1'): set(['s2p0'])})


print "%s %10s  %14s %15s" % ("Vlan", "Group", "Version", "Port List")
print "---------------------------------------------------------"
for mc, gp, vl, vr in d:
    print "{:<10} {:<14} {:<15}".format(vl, gp, vr) + ",".join(d[mc, gp, vl, vr])

Vlan      Group         Version       Port List
---------------------------------------------------------
1          239.0.0.4      1              s2p2,s2p1
1          239.0.0.8      1              s2p0
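For a table closer to the one requested in the question (sorted ports, a "v" prefix on the version, and a Type column), something like the following might work. It is a sketch, not a definitive implementation: `format_table` and `type_label` are hypothetical names, the column widths are guesses, and the "igmp" value is an assumption because the input has no Type column. It is written so that it runs on Python 3.

```python
from collections import defaultdict

foo = """Port  Mac Address       group-addr      vlan    ver
         s2p2  0100.5e00.0004    239.0.0.4       1       1
         s2p0  0100.5e00.0005    239.0.0.8       1       1
         s2p1  0100.5e00.0004    239.0.0.4       1       1"""


def format_table(text, type_label="igmp"):
    """Return the table as a list of lines (hypothetical helper)."""
    # type_label is an assumption: the input data has no "Type" column
    grouped = defaultdict(set)
    for line in text.splitlines()[1:]:      # skip the header line
        if not line.strip():
            continue                        # skip blank lines
        port, mac, group, vlan, ver = line.split()
        grouped[(mac, group, vlan, ver)].add(port)

    row_fmt = "{:<9} {:<11} {:<6} {:<11} {}"
    out = [row_fmt.format("Vlan", "Group", "Type", "Version", "Port List"),
           "-" * 57]
    for key in sorted(grouped):             # sorted keys give a stable row order
        mac, group, vlan, ver = key
        ports = ", ".join(sorted(grouped[key]))   # plain string sort of the ports
        out.append(row_fmt.format(vlan, group, type_label, "v" + ver, ports))
    return out


table = format_table(foo)
print("\n".join(table))
```

Returning the lines instead of printing inside the function keeps the grouping easy to check separately from the output.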

score 1 · Answer 3 · answered Jan 15 '15 at 12:10

A very naive solution:

from collections import defaultdict
def group_entries(entries):
    grouped = defaultdict(list)
    for entry in entries:
        port = entry.pop("Port")
        key = tuple(sorted(entry.items()))  # sorted so the key does not depend on dict ordering
        grouped[key].append(port)
    results = []
    for entry, ports in grouped.items():
        entry = dict(entry)
        entry["ports"] = ", ".join(ports)
        results.append(entry)
    return results


def print_table():
    print "%s %10s %10s %14s %15s" % ("Vlan", "Group", "Type", "Version", "Port List")
    print "---------------------------------------------------------"
    if foo is not None:
        entries = foo.replace("Mac Address", "Mac-Address")    
        entries = parse_lines(entries.split("\n"))
        entries = group_entries(entries)
        # etc

but that might be quite inefficient on a larger dataset.

score 1 · Answer 4 · answered Jan 15 '15 at 12:35

The important thing is to make a tuple key for a dictionary from the fields you want to remain the same (mac, group, vlan and ver), then keep a container per key to hold the ports. I've chosen a list. You could use a set, as others have suggested, so I "uniqify" the ports as an option when printing. I haven't done any particular formatting of the output and just followed your guide. I can't see where "type" in the final table comes from, but I'm sure you can adapt for that.

Also, your final table doesn't include a MAC column. If you don't need a row per MAC address, simply remove it from the dictionary key.

foo = """Port  Mac Address       group-addr      vlan    ver
             s2p2  0100.5e00.0004    239.0.0.4       1       1
             s2p0  0100.5e00.0005    239.0.0.8       1       1
             s2p1  0100.5e00.0004    239.0.0.4       1       1"""

lines = foo.splitlines()
headers = lines[0]

machineDict={}
for line in lines[1:]:
    prt,mac,grp,vl,vr = line.split()
    try:
        #try to add a new port to the entry with this key
        machineDict[(mac,grp,vl,vr)].append(prt)
    except KeyError:
        #key error signals the dictionary doesn't contain that key
        # so create an entry
        machineDict[(mac,grp,vl,vr)] = [prt]

print "%s %10s %10s %14s %15s" % ("Vlan", "Group", "Type", "Version", "Port List")
print "---------------------------------------------------------"
for (mac,grp,vl,vr),portList in machineDict.items():
    print "%s %10s %10s %14s %15s" % (vl,grp,"typeVar",vr,list(set(portList)))

Note that the list(set(portList)) construction simply "uniqifies" the port list for each key. As the ports are likely unique anyway in your input data, you can replace it with portList if that suits you.
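The suggestion above about dropping the MAC from the key could look roughly like this. It is a sketch only: `group_ports` and its `include_mac` flag are hypothetical names, and the rows are the question's sample data already split into fields.

```python
from collections import defaultdict

# Sample rows from the question, already split: (port, mac, group, vlan, ver)
rows = [
    ("s2p2", "0100.5e00.0004", "239.0.0.4", "1", "1"),
    ("s2p0", "0100.5e00.0005", "239.0.0.8", "1", "1"),
    ("s2p1", "0100.5e00.0004", "239.0.0.4", "1", "1"),
]


def group_ports(rows, include_mac=True):
    """Group ports by key, with or without the MAC (hypothetical helper)."""
    grouped = defaultdict(list)
    for port, mac, grp, vl, vr in rows:
        # leaving the MAC out of the key means rows are merged
        # whenever group, vlan and ver match
        key = (mac, grp, vl, vr) if include_mac else (grp, vl, vr)
        grouped[key].append(port)
    return dict(grouped)


by_mac = group_ports(rows)
by_group = group_ports(rows, include_mac=False)
print(by_group)
```

With the sample data both variants produce two groups. They would only differ if rows shared group, vlan and ver but not the MAC.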

answered Jan 15 '15 at 12:35

J Richard Snape

why would you use `list(set(portList))`? – Padraic Cunningham Jan 15 '15 at 12:49
@PadraicCunningham He added that as an option to uniqify the port list - seems reasonable, what would you suggest? Just leaving out the list part? – Paul Jan 15 '15 at 13:18
@Paul, I would use a set in the first place instead of converting from list to set to list, but my main point is that to print the output you need to join the elements inside the container, which can be done directly on a set, so calling `list(set(portList))` is a bit pointless. Also, a defaultdict is the way to go. – Padraic Cunningham Jan 15 '15 at 13:21
@PadraicCunningham Yeah thanks, I did not actually know of defaultdict until now. – Paul Jan 15 '15 at 13:24
@PadraicCunningham fair enough - you could definitely use a set in the first place (for some reason I had an issue with it adding strings like this, but probably because I'm in a rush). I wasn't aware of `defaultdict` either - it looks much better and is probably higher performance than my effort. Have upvoted your answer as better than mine - I can see you've also edited it to print with `','.join(portList)`, which is a better solution than printing a list – J Richard Snape Jan 15 '15 at 13:38
@PadraicCunningham Yes - I bet I was - hence it would iterate through the string adding each char, right? Oops. – J Richard Snape Jan 15 '15 at 15:31
yep, it expects an iterable, if you want to add a string the easiest way is just set.add("foo") – Padraic Cunningham Jan 15 '15 at 15:33
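The behaviour discussed in these comments can be checked with a few lines. This is just an illustration with hypothetical variable names: the `set()` constructor iterates its argument, so a string is split into characters, while `add()` stores the string as a single element.

```python
port = "s2p1"

chars = set(port)        # iterates the string: one element per character
single = set([port])     # wraps the string in a list first: one element
also_single = {port}     # set literal, same result as set([port])

ports = set()
ports.add(port)          # add() takes the element itself, not an iterable

print(sorted(chars))
print(single, also_single, ports)
```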

score 0 · Answer 5 · edited May 23 '17 at 12:11

According to "Fastest way to uniqify a list in Python" by Peter Bengtsson, the two top-performing methods both convert the sequence to a set, either an ordered set or a normal one.

The code there targets a rather old Python version, so I won't paste it here. Instead, "Does Python have an ordered set?" gives an overview of the first option (the second is a built-in type).

To be members of a set, your elements must be hashable, so you need to use e.g. namedtuples instead of dictionaries to store the records.
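As a rough illustration of the hashability point, the records could be stored as namedtuples and de-duplicated with a set while keeping first-seen order. `Record` and `uniqify` are hypothetical names, and the field names simply mirror the question's columns. This is a sketch, not the code from the linked article.

```python
from collections import namedtuple

# Record is a hypothetical name; the fields mirror the question's columns
Record = namedtuple("Record", ["mac", "group", "vlan", "ver"])

records = [
    Record("0100.5e00.0004", "239.0.0.4", "1", "1"),
    Record("0100.5e00.0005", "239.0.0.8", "1", "1"),
    Record("0100.5e00.0004", "239.0.0.4", "1", "1"),
]


def uniqify(seq):
    """Drop duplicates while keeping first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for item in seq:
        if item not in seen:    # works because namedtuples of strings are hashable
            seen.add(item)
            result.append(item)
    return result


unique = uniqify(records)
print(unique)
```

A plain dict in place of `Record` would fail at `seen.add(item)` with a TypeError, because dicts are not hashable.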
