0

How to combine dictionaries from multiple lists if they share a common key-value pair?

For example, here are three lists of dictionaries:

l1 = [{'fruit':'banana','category':'B'},{'fruit':'apple','category':'A'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'2','type':'old'},{'order':'1','type':'new'}]

Desired Result:

l = [{'fruit':'apple','category':'A','order':'1','type':'new'},{'fruit':'banana','category':'B','order':'2','type':'old'}]

The tricky part is that I want this function to only take in the lists as arguments and not the keys, because I want to only have to plug in any number of list of dictionaries and not be concerned by which key-names are the overlapping ones (in this case they key-names that bring all three together are 'category' and 'type').

I should note index shouldn't matter, as it only should be based on common elements.

Here's my attempt:

def combine_lists(*args):
    base_list = args[0]
    L = []
    for sublist in args[1:]:
        L.extend(sublist)
    for D in base_list:
        for Dict in L:
            if any([tup in Dict.items() for tup in D.items()]): 
                D.update(Dict)
    return base_list
Chris
  • 5,444
  • 16
  • 63
  • 119
  • there are no common pairs between `l1` and `l3`. All dictionaries in a single list have the same keys. Is it guaranteed? – jfs Jun 23 '15 at 23:52
  • Yep, that's intentional, since a dictionary in l1 should be able to match with a dictionary in l3 through a dictionary in l2 (e.g. {'fruit':'banana','category':'B'} merges with {'order':'2','type':'old'} as {'type':'old','category':'B'} connects them). Assume that all dictionaries within a single list have the same keys. – Chris Jun 23 '15 at 23:55
  • I'd recommend critically examining your algorithms and data structures so that you don't have to deal with oddities like this. – TigerhawkT3 Jun 23 '15 at 23:56
  • It reminds me of [union-find and connected components in a graph -based algorithms](http://stackoverflow.com/q/15331877/4279). Though I haven't considered whether they are suitable in this case. See also, [Replace list of list with “condensed” list of list while maintaining order](http://stackoverflow.com/q/13714755/4279) – jfs Jun 24 '15 at 00:04
  • I think how you are approaching this is not very good, you can get the output you want using dict.viewitems but there would be a whole lot of other work also – Padraic Cunningham Jun 24 '15 at 00:04
  • Right - I don't like the way I'm approaching it either, so hoping to get a better method – Chris Jun 24 '15 at 00:10
  • 1
    ..and specifically [`condense_sets()` function](http://stackoverflow.com/a/13715626/4279). Note: `dict.view*()` methods return objects that support some set operations. – jfs Jun 24 '15 at 00:12
  • If you do get an accurate answer to this question, I can almost guarantee that it will be ugly. Go back and rethink the way you generated these `list`s. – TigerhawkT3 Jun 24 '15 at 00:18

1 Answers1

1

For this problem it is convenient to regard the dicts as lists of tuples:

In [4]: {'fruit':'apple','category':'A'}.items()
Out[4]: [('category', 'A'), ('fruit', 'apple')]

Since we wish to connect dicts which share a key-value pair, we can regard each tuple as a node in a graph and pairs of tuples as edges. Once you have a graph the problem is reduced to finding the graph's connected components.

Using networkx,

import itertools as IT
import networkx as nx

l1 = [{'fruit':'apple','category':'A'},{'fruit':'banana','category':'B'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'1','type':'new'},{'order':'2','type':'old'}]

data = [l1, l2, l3]
G = nx.Graph()
for dct in IT.chain.from_iterable(data):
    items = list(dct.items())
    node1 = node1[0]
    for node2 in items:
        G.add_edge(node1, node22)

for cc in nx.connected_component_subgraphs(G):
    print(dict(IT.chain.from_iterable(cc.edges())))

yields

{'category': 'A', 'fruit': 'apple', 'type': 'new', 'order': '1'}
{'category': 'B', 'fruit': 'banana', 'type': 'old', 'order': '2'}

If you wish to remove the networkx dependency, you could use, for example, pillmuncher's implementation:

import itertools as IT

def connected_components(neighbors):
    """
    https://stackoverflow.com/a/13837045/190597 (pillmuncher)
    """
    seen = set()
    def component(node):
        nodes = set([node])
        while nodes:
            node = nodes.pop()
            seen.add(node)
            nodes |= neighbors[node] - seen
            yield node
    for node in neighbors:
        if node not in seen:
            yield component(node)

l1 = [{'fruit':'apple','category':'A'},{'fruit':'banana','category':'B'}]
l2 = [{'type':'new','category':'A'},{'type':'old','category':'B'}]
l3 = [{'order':'1','type':'new'},{'order':'2','type':'old'}]

data = [l1, l2, l3]
G = {}
for dct in IT.chain.from_iterable(data):
    items = dct.items()
    node1 = items[0]
    for node2 in items[1:]:
        G.setdefault(node1, set()).add(node2)
        G.setdefault(node2, set()).add(node1)

for cc in connected_components(G):
    print(dict(cc))

which prints the same result as above.

Community
  • 1
  • 1
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677