2

I've searched and found "Append a dictionary to a dictionary", but that approach overwrites the value whenever a key from b already exists in a.

I'd like to essentially recursively append one dictionary to another, where:

  • keys are unique (obviously, it's a dictionary), but each dictionary is fully represented in the result such that a.keys() and b.keys() are both subsets of c.keys()
  • if the same key is in both dictionaries, the resulting value is a list of the values from both, such that a[key] and b[key] are in c[key]
  • the values could be another dictionary (but nothing deeper than 1 level), in which case the same logic should apply (append values) such that a[key1][key2] and b[key1][key2] are in c[key1][key2]

The basic example is where the two dictionaries have keys that don't overlap, and I can accomplish that in multiple ways, c = {**a, **b} for example, so I haven't covered that below.

A trickier case:

a = {
   "key1": "value_a1",
   "key2": "value_a2"
}

b = { 
   "key1": "value_b1",
   "key3": "value_b3"
}

c = combine(a, b)

c >> {
   "key1": ["value_a1", "value_b1"],
   "key2": "value_a2",
   "key3": "value_b3"
}

An even trickier case:

a = {
   "key1": {
      "sub_key_1": ["sub_value_a1", "sub_value_a2"],
      "sub_key_2": "sub_value_a3"
   },
   "key2": "value_a2"
}

b = { 
   "key1": {
      "sub_key_1": ["sub_value_a1", "sub_value_b1"],
      "sub_key_2": "sub_value_b3"
   },
   "key3": "value_b3"  # I'm okay with converting this to a list even if it's not one
}

c = combine(a, b)

c >> {
   "key1": {
      "sub_key_1": ["sub_value_a1", "sub_value_a2", "sub_value_b1"],  #sub_value_a1 is not duplicated
      "sub_key_2": ["sub_value_a3", "sub_value_b3"]
   },
   "key2": "value_a2",
   "key3": "value_b3" # ["value_b3"] this would be okay, following from the code comment above
}

Caveats:

  • Python 3.6
  • The examples show lists being created as_needed, but I'm okay with every non-dict value being a list, as mentioned in the code comments
  • The values within the lists will always be strings

I tried to explain as best I could but can elaborate more if needed. Been working on this for a few days and keep getting stuck on the sub key part

Mitch Wilson

3 Answers

1

There is no simple built-in way of doing this, but you can implement the logic yourself in Python.

def combine_lists(a: list, b: list) -> list:
    return a + [i for i in b if i not in a]

def combine_strs(a: str, b: str):
    # Returns a single str when both values are equal, otherwise a two-item list.
    if a == b:
        return a
    return [a, b]

class EMPTY:
    "A sentinel representing an empty value."

def combine_dicts(a: dict, b: dict) -> dict:
    output = {}
    keys = list(a) + [k for k in b if k not in a]
    for key in keys:
        aval = a.get(key, EMPTY)
        bval = b.get(key, EMPTY)
        if isinstance(aval, list) and isinstance(bval, list):
            output[key] = combine_lists(aval, bval)
        elif isinstance(aval, str) and isinstance(bval, str):
            output[key] = combine_strs(aval, bval)
        elif isinstance(aval, dict) and isinstance(bval, dict):
            output[key] = combine_dicts(aval, bval)
        elif bval is EMPTY:
            output[key] = aval
        elif aval is EMPTY:
            output[key] = bval
        else:
            raise RuntimeError(
                f"Cannot combine types: {type(aval)} and {type(bval)}"
            )
    return output
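
The question also allows every non-dict value to be a list. Under that relaxation, the type dispatch above can probably collapse into a single loop. This is a minimal sketch and not part of the original answer: the names combine_as_lists and _as_list are made up for illustration, and it assumes list items can be compared with `in` (the question says they are always strings).

```python
def _as_list(value):
    """Hypothetical helper: wrap a scalar in a list, copy an existing list."""
    return list(value) if isinstance(value, list) else [value]


def combine_as_lists(a: dict, b: dict) -> dict:
    """Merge b into a copy of a; every non-dict value ends up as a list."""
    output = {}
    for source in (a, b):
        for key, value in source.items():
            if isinstance(value, dict):
                existing = output.get(key, {})
                if not isinstance(existing, dict):
                    raise TypeError(f"Cannot combine dict with non-dict at {key!r}")
                # Recurse so nested values follow the same rules.
                output[key] = combine_as_lists(existing, value)
            else:
                merged = output.setdefault(key, [])
                if isinstance(merged, dict):
                    raise TypeError(f"Cannot combine dict with non-dict at {key!r}")
                # Append only items that are not already present.
                for item in _as_list(value):
                    if item not in merged:
                        merged.append(item)
    return output


a = {
    "key1": {
        "sub_key_1": ["sub_value_a1", "sub_value_a2"],
        "sub_key_2": "sub_value_a3",
    },
    "key2": "value_a2",
}
b = {
    "key1": {
        "sub_key_1": ["sub_value_a1", "sub_value_b1"],
        "sub_key_2": "sub_value_b3",
    },
    "key3": "value_b3",
}

c = combine_as_lists(a, b)
print(c["key1"]["sub_key_2"])  # ['sub_value_a3', 'sub_value_b3']
```

The sketch builds new lists for the result, so a and b are left unchanged.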
damon
0

Sounds like you want a specialised version of dict, so you could subclass it to get the behaviour you want. Being a bit of a Python noob, I started with the answer here: "Subclassing Python dictionary to override __setitem__".

Then I added the behaviour in your couple of examples.

I also added a MultiValue class which is a subclass of list. This makes it easy to tell if a value in the dict already has multiple values. Also it removes duplicates, as it looks like you don't want them.

class MultiValue(list):
    # Class to hold multiple values for a dictionary key. Prevents duplicates.

    def append(self, value):
        if isinstance(value, MultiValue):
            for v in value:
                if not v in self:
                    super(MultiValue, self).append(v)
        else:
            super(MultiValue, self).append(value)


class MultiValueDict(dict):
    # dict which converts a key's value to a MultiValue when the key already exists.
    
    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        # optional processing here
        if key in self:
            existing_value = self[key]
            if isinstance(existing_value, MultiValueDict) and isinstance(value, dict):
                existing_value.update(value)
                return
            if isinstance(existing_value, MultiValue):
                existing_value.append(value)
                value = existing_value
            else:
                value = MultiValue([existing_value, value])
        super(MultiValueDict, self).__setitem__(key, value)

    def update(self, *args, **kwargs):
        if args:
            if len(args) > 1:
                raise TypeError("update expected at most 1 arguments, "
                                "got %d" % len(args))
            other = dict(args[0])
            for key in other:
                self[key] = other[key]
        for key in kwargs:
            self[key] = kwargs[key]

    def setdefault(self, key, value=None):
        if key not in self:
            self[key] = value
        return self[key]

Example 1:

a = {
    "key1": "value_a1",
    "key2": "value_a2"
}

b = {
    "key1": "value_b1",
    "key3": "value_b3"
}

# combine by creating a MultiValueDict then using update to add b to it.
c = MultiValueDict(a)
c.update(b)

print(c)
# gives {'key1': ['value_a1', 'value_b1'], 'key2': 'value_a2', 'key3': 'value_b3'}

Example 2: Here the value for key1 has to be created as a MultiValueDict and the value for sub_key_1 as a MultiValue, so this may not fit what you're trying to do. It depends on how you're building your data set.

a = {
    "key1": MultiValueDict({
        "sub_key_1": MultiValue(["sub_value_a1", "sub_value_a2"]),
        "sub_key_2": "sub_value_a3"
    }),
    "key2": "value_a2"
}

b = {
    "key1": MultiValueDict({
        "sub_key_1": MultiValue(["sub_value_a1", "sub_value_b1"]),
        "sub_key_2": "sub_value_b3"
    }),
    "key3": "value_b3"  # I'm okay with converting this to a list even if it's not one
}

c = MultiValueDict(a)
c.update(b)

print(c)
# gives {'key1': {'sub_key_1': ['sub_value_a1', 'sub_value_a2', 'sub_value_b1'], 'sub_key_2': ['sub_value_a3', 'sub_value_b3']}, 'key2': 'value_a2', 'key3': 'value_b3'}
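
If the data arrives as plain nested dicts and lists, one option might be to convert it first instead of building it from MultiValueDict and MultiValue by hand. The sketch below is an assumption and not part of the original answer: wrap is a hypothetical helper, and the two classes are condensed copies of the ones above, repeated only so the snippet runs on its own.

```python
class MultiValue(list):
    """Condensed copy of the list subclass from this answer."""

    def append(self, value):
        if isinstance(value, MultiValue):
            for v in value:
                if v not in self:
                    super().append(v)
        else:
            super().append(value)


class MultiValueDict(dict):
    """Condensed copy of the dict subclass from this answer."""

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)

    def __setitem__(self, key, value):
        if key in self:
            existing = self[key]
            if isinstance(existing, MultiValueDict) and isinstance(value, dict):
                existing.update(value)
                return
            if isinstance(existing, MultiValue):
                existing.append(value)
                value = existing
            else:
                value = MultiValue([existing, value])
        super().__setitem__(key, value)

    def update(self, *args, **kwargs):
        for source in args + (kwargs,):
            for key, value in dict(source).items():
                self[key] = value


def wrap(value):
    """Hypothetical helper: recursively convert plain dicts and lists."""
    if isinstance(value, dict):
        return MultiValueDict({k: wrap(v) for k, v in value.items()})
    if isinstance(value, list):
        return MultiValue(value)
    return value


a = {
    "key1": {
        "sub_key_1": ["sub_value_a1", "sub_value_a2"],
        "sub_key_2": "sub_value_a3",
    },
    "key2": "value_a2",
}
b = {
    "key1": {
        "sub_key_1": ["sub_value_a1", "sub_value_b1"],
        "sub_key_2": "sub_value_b3",
    },
    "key3": "value_b3",
}

# Convert both inputs, then merge b into a as in the examples above.
c = wrap(a)
c.update(wrap(b))
print(c)
```

Because wrap builds new containers, the plain dictionaries a and b are not modified by the merge.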
pcoates
0
a = {
   "key1": "value_a1",
   "key2": "value_a2"
}
b = { 
   "key1": "value_b1",
   "key3": "value_b3"
}
def appendValues(ax,cx):
    cx=list(cx)#work on a copy so lists inside the input dictionaries are not mutated
    if type(ax)==list:#is key's value in a, a list?
        cx.extend(ax)#if it is a list then extend
    else:#key's value in a is not a list
        cx.append(ax)#so use append
    cx=list(set(cx))#make values unique with set (note: a set does not keep order)
    return cx
    
def combine(a,b):
    c={}
    for x in b:#first copy b keys and values to c
        c[x]=b[x]
    for x in a:#now combine a with c
        if not x in c:#this key is not in c
            c[x]=a[x]#so add it
        else:#key exists in c
            if type(c[x])==list:#is key's value in c ,a list?
                c[x]=appendValues(a[x],c[x])
            elif type(c[x])==dict:#is key's value in c a dictionary?
                c[x]=combine(c[x],a[x])#combine dictionaries
            else:#so key's value is not a list or dict
                c[x]=[c[x]]#make value a list
                c[x]=appendValues(a[x],c[x])
    return c
c = combine(a, b)
print(c)
print("==========================")
a = {
   "key1": {
      "sub_key_1": ["sub_value_a1", "sub_value_a2"],
      "sub_key_2": "sub_value_a3"
   },
   "key2": "value_a2"
}

b = { 
   "key1": {
      "sub_key_1": ["sub_value_a1", "sub_value_b1"],
      "sub_key_2": "sub_value_b3"
   },
   "key3": "value_b3"  # I'm okay with converting this to a list even if it's not one
}

c = combine(a, b)
print(c)
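
One caveat with list(set(cx)) in appendValues: a set does not keep insertion order, so the merged values can come back in any order. If order matters, an order-preserving de-duplication could be swapped in. A small sketch, where unique_in_order is a hypothetical helper name and the items are assumed to be hashable (the question says they are always strings):

```python
def unique_in_order(values):
    """Drop repeated items while keeping the order of first appearance."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


merged = unique_in_order(
    ["sub_value_a1", "sub_value_a2", "sub_value_a1", "sub_value_b1"]
)
print(merged)  # ['sub_value_a1', 'sub_value_a2', 'sub_value_b1']
```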
virxen