Dictionary merge by updating but not overwriting if value exists

Question

If I have 2 dicts as follows:

d1 = {'a': 2, 'b': 4}
d2 = {'a': 2, 'b': ''}

In order to 'merge' them:

dict(d1.items() + d2.items())

results in

{'a': 2, 'b': ''}

But what should I do if I would like to compare each value of the two dictionaries and only update d2 into d1 if values in d1 are empty/None/''?

When the same key exists, I would like to only maintain the numerical value (either from d1 or d2) instead of the empty value. If both values are empty, then no problems maintaining the empty value. If both have values, then d1-value should stay.

i.e.

d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 2, 'b': '', 'c': ''}

should result in

{'a': 2, 'b': 8, 'c': ''}

where 8 is not overwritten by ''.

**See also** (php-based) http://stackoverflow.com/questions/793464/php-array-merge-without-erasing-values — dreftymac, Jan 01 '17 at 21:24
**See also** (ruby-based) http://stackoverflow.com/questions/1980794/how-can-i-merge-two-hashes-without-overwritten-duplicate-keys-in-ruby — dreftymac, Jan 01 '17 at 21:29
**See also:** (itemgetter) https://stackoverflow.com/a/12118794/42223 — dreftymac, Feb 12 '19 at 10:03

score 58 · Accepted Answer · edited Sep 04 '18 at 05:17

Just switch the order:

z = dict(d2.items() + d1.items())

By the way, you may also be interested in the potentially faster update method.

In Python 3, items() returns view objects, which do not support +, so you have to convert them to lists first:

z = dict(list(d2.items()) + list(d1.items()))
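
On Python 3.5 and later, the same precedence can be sketched without the intermediate lists. The snippet below is an illustration rather than part of the original answer; it reuses the question's sample data and checks that dict unpacking and itertools.chain (both suggested in the comments) agree with the list-based form.

```python
from itertools import chain

d1 = {'a': 2, 'b': 8, 'c': ''}
d2 = {'a': 2, 'b': '', 'c': ''}

# Later sources win on duplicate keys, so d1 goes last to keep its values.
via_lists = dict(list(d2.items()) + list(d1.items()))
via_unpacking = {**d2, **d1}                      # Python 3.5+
via_chain = dict(chain(d2.items(), d1.items()))   # no temporary lists

print(via_lists == via_unpacking == via_chain)    # True
print(via_unpacking)                              # {'a': 2, 'b': 8, 'c': ''}
```

None of these forms looks at the values themselves: whatever d1 holds for a shared key wins, even if it is empty. On Python 3.9+, d2 | d1 is another spelling of the same merge.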

If you want to special-case empty strings, you can do the following:

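def mergeDictsOverwriteEmpty(d1, d2):
    # Start from d1 so that its non-empty values are kept.
    res = d1.copy()
    for k, v in d2.items():
        # Take d2's value only where d1 lacks the key or holds an empty string.
        if k not in d1 or d1[k] == '':
            res[k] = v
    return res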
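
As a usage sketch, here is one way to generalise the idea so that None counts as empty too. The name merge_fill_empty and the empty parameter are assumptions made for illustration; they are not part of the answer.

```python
def merge_fill_empty(d1, d2, empty=(None, '')):
    """Return a new dict: d1's values, with missing or empty ones filled from d2."""
    res = dict(d1)
    for k, v in d2.items():
        # Membership in `empty` uses ==, so 0 and False are not treated as empty.
        if k not in res or res[k] in empty:
            res[k] = v
    return res


# Sample data from the question: 8 must not be overwritten by ''.
print(merge_fill_empty({'a': 2, 'b': 8, 'c': ''}, {'a': 2, 'b': '', 'c': ''}))
# {'a': 2, 'b': 8, 'c': ''}

# An empty value in d1 is filled from d2; a non-empty one stays.
print(merge_fill_empty({'a': '', 'b': 4}, {'a': 2, 'b': ''}))
# {'a': 2, 'b': 4}
```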

edited Sep 04 '18 at 05:17

Ciro Santilli OurBigBook.com

answered Jun 15 '11 at 07:33

phihag

i think, in this case.. if `d1` has empty item-value it would overwrite `d2` item-value which has numerical value? – siva Jun 15 '11 at 07:53
@siva Updated with your special case. – phihag Jun 15 '11 at 09:19
I'm thinking that should be `res=d1.copy()`, otherwise there is no information transfer between the dicts. – Richard Nov 19 '14 at 19:37
Python 3.4.3, at least, does not support `+` between dictionary items sets, *but* you can achieve the same results by casting to `list`: `dict(list(d2.items()) + list(d1.items()))` – JellicleCat Jul 13 '17 at 07:56
`itertools.chain()` may also help. – Frozen Flame Nov 02 '18 at 04:27
In Python 3, you can just do `z = {**d2, **d1}`. – Brian McCutchon Apr 24 '20 at 16:34

score 30 · Answer 2 · edited Jan 06 '22 at 19:43

Updates d2 with d1's key/value pairs, but only where the d1 value is truthy (so None, '' and other falsy values such as 0 are skipped):

>>> d1 = dict(a=1, b=None, c=2)
>>> d2 = dict(a=None, b=2, c=1)
>>> d2.update({k: v for k, v in d1.items() if v})
>>> d2
{'a': 1, 'c': 2, 'b': 2}

(Use iteritems() instead of items() in Python 2.)
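
One caveat worth sketching: a plain truthiness test drops every falsy value, so a legitimate 0 or False in d1 is treated as empty. The variant below is an illustration under stated assumptions (the name EMPTY is made up here); it filters only None and '' and builds a new dict instead of changing d2.

```python
EMPTY = (None, '')  # hypothetical definition of "empty" for this sketch

d1 = dict(a=1, b=None, c=0)
d2 = dict(a=None, b=2, c=1)

# Start from a copy of d2, then overlay only the non-empty values of d1.
merged = dict(d2)
merged.update({k: v for k, v in d1.items() if v not in EMPTY})

print(merged)  # {'a': 1, 'b': 2, 'c': 0}
print(d2)      # unchanged: {'a': None, 'b': 2, 'c': 1}
```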

edited Jan 06 '22 at 19:43

mkrieger1

answered Jun 15 '11 at 08:53

Mark Tolonen

... which will change the input `d2`. Why not `dr={}; dr.update(d1); dr.update((k,v) for (k,v) in d2.items() if v)` ? – Pierre GM Aug 31 '12 at 12:04
That worked for me `d2.update({k:v for k,v in d1.iteritems() if v is not None})` – Mauricio Feb 20 '19 at 22:44
I think the more suitable variation would be: `d1.update({k: v for k, v in d2.items() if not k in d1})` – roy650 Jan 26 '22 at 16:39

score 10 · Answer 3 · answered May 16 '18 at 06:56

To add to d2 keys/values from d1 which do not exist in d2 without overwriting any existing keys/values in d2:

temp = d2.copy()
d2.update(d1)
d2.update(temp)
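
The three steps above leave d2 holding the union of both dicts, with d2's own values taking priority. A minimal sketch with made-up sample data, checking that a non-mutating dict unpacking (Python 3.5+) produces the same mapping:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'c': 30}

expected = {**d1, **d2}   # d2 comes last, so d2's values win

temp = d2.copy()
d2.update(d1)     # brings in d1's keys, but clobbers d2['b']
d2.update(temp)   # restores d2's original values

print(d2 == expected)   # True
```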

answered May 16 '18 at 06:56

Ron Kalian

ShmulikA · Answer 4 · 2022-02-01T17:06:32.763

Python 3.5+ Literal Dict

Unless you are on an obsolete version of Python, you are better off using this.

A Pythonic and faster way to merge, using dict unpacking:

d1 = {'a':1, 'b':1}
d2 = {'a':2, 'c':2}
merged = {**d1, **d2}  # on duplicate keys the rightmost dict wins
print(merged)

{'a': 2, 'b': 1, 'c': 2}

It's simpler and also faster than the dict(list(d2.items()) + list(d1.items())) alternative:

d1 = {i: 1 for i in range(1000000)}
d2 = {i: 2 for i in range(2000000)}

%timeit dict(list(d1.items()) + list(d2.items())) 
402 ms ± 33.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit {**d1, **d2}
144 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Merging Only Non-empty Values

To do this, build the result key by key, taking d1's value when it is truthy and falling back to d2's value otherwise:

d1 = {'a':1, 'b':1, 'c': '', 'd': ''}
d2 = {'a':2, 'c':2, 'd': ''}
merged_non_zero = {
    k: (d1.get(k) or d2.get(k))
    for k in set(d1) | set(d2)
}
print(merged_non_zero)

outputs:

{'a': 1, 'b': 1, 'c': 2, 'd': ''}

a -> prefer first value from d1 as 'a' exists on both d1 and d2
b -> only exists on d1
c -> non-zero on d2
d -> empty string on both

Explanation

The code above builds the result with a dict comprehension.

If d1 has the key and its value is truthy (i.e. bool(val) is True), the result uses d1[k]; otherwise it takes d2.get(k).

Note that the comprehension iterates over the union of both key sets, set(d1) | set(d2), because the two dicts may not have exactly the same keys.
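
Two edge cases of the or expression are worth keeping in mind: a key that is empty in one dict and absent from the other comes out as None, and 0 counts as empty. The sketch below is one possible alternative, not the answer's own code; the helper name merge_non_empty and the explicit empty tuple are assumptions.

```python
def merge_non_empty(d1, d2, empty=(None, '')):
    """Prefer d1's value unless it is missing or empty; then fall back to d2."""
    merged = {}
    for k in d1.keys() | d2.keys():
        # Keep d1's value if it is non-empty, or if d2 has nothing to offer.
        if k in d1 and (d1[k] not in empty or k not in d2):
            merged[k] = d1[k]
        else:
            merged[k] = d2[k]
    return merged


d1 = {'a': 1, 'b': 0, 'c': '', 'd': ''}
d2 = {'a': 2, 'b': 5, 'c': 2}

with_or = {k: (d1.get(k) or d2.get(k)) for k in set(d1) | set(d2)}
explicit = merge_non_empty(d1, d2)

print(with_or['b'], with_or['d'])     # 5 None
print(explicit['b'], repr(explicit['d']))  # 0 ''
```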

This answer is flat out wrong unless you do it as `{**d2, **d1}`. You still need to reverse the dictionaries if you don't want `d2` to overwrite values in `d1`. — drhagen, Oct 12 '20 at 10:50
thanks @drhagen i've updated it to include better answer together with my suggestion on how to merge — ShmulikA, Oct 13 '20 at 15:27

score 6 · Answer 5 · answered Mar 15 '17 at 02:49

Here's an in-place solution (it modifies d2):

# assumptions: d2 is a temporary dict that can be discarded
# d1 is a dict that must be modified in place
# the modification is adding keys from d2 into d1 that do not exist in d1.

def update_non_existing_inplace(original_dict, to_add):
    to_add.update(original_dict) # to_add now holds the "final result" (O(n))
    original_dict.clear() # erase original_dict in-place (O(1))
    original_dict.update(to_add) # original_dict now holds the "final result" (O(n))

Here's another in-place solution, which is less elegant but potentially more efficient, as well as leaving d2 unmodified:

# assumptions: d2 cannot be modified
# d1 is a dict that must be modified in place
# the modification is adding keys from d2 into d1 that do not exist in d1.

def update_non_existing_inplace(original_dict, to_add):
    for key in to_add:  # plain iteration works in Python 2 and 3 (iterkeys() is Python 2 only)
        if key not in original_dict:
            original_dict[key] = to_add[key]
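
The second variant can also be sketched with dict.setdefault, which assigns a value only when the key is absent. The sample data below is made up for illustration.

```python
def update_non_existing_inplace(original_dict, to_add):
    """Add keys from to_add that original_dict lacks; to_add is left untouched."""
    for key, value in to_add.items():
        original_dict.setdefault(key, value)


d1 = {'a': 1, 'b': ''}
d2 = {'b': 2, 'c': 3}
update_non_existing_inplace(d1, d2)

print(d1)  # {'a': 1, 'b': '', 'c': 3}
print(d2)  # unchanged: {'b': 2, 'c': 3}
```

Because the test is on key presence only, the empty string under 'b' is kept; this variant does not special-case empty values.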

score 5 · Answer 6 · answered Jun 15 '11 at 08:44

Use d2.update(d1) instead of dict(d2.items() + d1.items()). Note that this modifies d2 in place.

answered Jun 15 '11 at 08:44

warvariuc

... would change the content of `d2` which might not be what the OP want. At least, the `dict(d1.items()+d2.items())` keeps the inputs unchanged. – Pierre GM Aug 31 '12 at 12:02

Artsiom Rudzenka · Answer 7 · 2011-06-15T11:02:38.980

4

If your dictionaries have the same size and keys, you can use the following code (d2's value is used unless it is missing, None or ''):

dict((k, v if k not in d2 or d2[k] in [None, ''] else d2[k]) for k, v in d1.items())

edited Jun 15 '11 at 11:02

answered Jun 15 '11 at 07:35

Artsiom Rudzenka

unfortunately, my dictionaries are not the same size and keys, only some occurrences of the same keys with different values. – siva Jun 15 '11 at 08:42
@siva: i have modified code to check d2 on key from d1 if this is your case. – Artsiom Rudzenka Jun 15 '11 at 12:55

est.tenorio · Answer 8 · 2021-02-05T19:36:34.580

If you want to ignore empty values so that, for example, merging:

a = {"a": 1, "b": 2, "c": ""}
b = {"a": "", "b": 4, "c": 5}
c = {"a": "aaa", "b": ""}
d = {"a": "", "w": ""}

results in: {'a': 'aaa', 'b': 4, 'c': 5, 'w': ''}

You can use these 2 functions:

def merge_two_dicts(a, b, path=None):
    "merges b into a"
    if path is None:
        path = []
    for key in b:
        if key in a:
            if isinstance(a[key], dict) and isinstance(b[key], dict):
                merge_two_dicts(a[key], b[key], path + [str(key)])
            elif a[key] == b[key]:
                pass  # same leaf value
            elif b[key] or not a[key]:
                # b's value wins unless it is empty while a's value is not
                a[key] = b[key]
        else:
            a[key] = b[key]
    return a


def merge_multiple_dicts(*a):
    output = a[0]  # note: the first dict is modified in place
    for other in a[1:]:
        output = merge_two_dicts(output, other)
    return output

So you can just use merge_multiple_dicts(a,b,c,d)
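
Note that merge_two_dicts changes its first argument, so merge_multiple_dicts(a, b, c, d) modifies a. If the inputs must stay intact, a non-mutating fold is one option. The sketch below handles flat dicts only (no nested recursion), and the name merge_prefer_non_empty is an assumption for illustration.

```python
from functools import reduce


def merge_prefer_non_empty(acc, new):
    """Return a new dict; a falsy value in `new` never replaces a truthy one."""
    out = dict(acc)
    for key, value in new.items():
        if key not in out or value or not out[key]:
            out[key] = value
    return out


a = {"a": 1, "b": 2, "c": ""}
b = {"a": "", "b": 4, "c": 5}
c = {"a": "aaa", "b": ""}
d = {"a": "", "w": ""}

# Fold from an empty dict so that none of the inputs is touched.
merged = reduce(merge_prefer_non_empty, [a, b, c, d], {})

print(merged)  # {'a': 'aaa', 'b': 4, 'c': 5, 'w': ''}
print(a)       # unchanged: {'a': 1, 'b': 2, 'c': ''}
```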

joao8tunes · Answer 9 · 2022-02-09T14:24:13.350

I have a solution if you want to have more freedom to choose when a value should be overwritten in the merged dictionary. Maybe it's a verbose script, but it's not hard to understand its logic.

Thanks fabiocaccamo and senderle for sharing the benedict package, and the nested iteration logic in lists, respectively. This knowledge was fundamental to the script development.

Python Requirements

pip install python-benedict==0.24.3

Python Script

Definition of the Dict class.

from __future__ import annotations

from collections.abc import Mapping
from benedict import benedict
from typing import Iterator
from copy import deepcopy


class Dict:
    def __init__(self, data: dict = None):
        """
        Instantiates a dictionary object with nested keys-based indexing.

        Parameters
        ----------
        data: dict
            Dictionary.

        References
        ----------
        [1] 'Dict' class: https://stackoverflow.com/a/70908985/16109419
        [2] 'Benedict' package: https://github.com/fabiocaccamo/python-benedict
        [3] Dictionary nested iteration: https://stackoverflow.com/a/10756615/16109419
        """
        self.data = deepcopy(data) if data is not None else {}

    def get(self, keys: [object], **kwargs) -> (object, bool):
        """
        Get dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to get item value based on.

        Returns
        -------
        value, found: (object, bool)
            Item value, and whether the target item was found.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        value, found = None, False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Getting item value from dictionary:
            if trace == keys:
                value, found = outer_value, True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                value, found = self.get(
                    data=outer_value,
                    keys=keys,
                    path=trace
                )

        return value, found

    def set(self, keys: [object], value: object, **kwargs) -> bool:
        """
        Set dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to set item value based on.
        value: object
            Item value.

        Returns
        -------
        updated: bool
            Whether the target item was updated.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        updated = False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Setting item value on dictionary:
            if trace == keys:
                data[outer_key] = value
                updated = True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                updated = self.set(
                    data=outer_value,
                    keys=keys,
                    value=value,
                    path=trace
                )

        return updated

    def add(self, keys: [object], value: object, **kwargs) -> bool:
        """
        Add dictionary item value based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to add item based on.
        value: object
            Item value.

        Returns
        -------
        added: bool
            Whether the target item was added.
        """
        data = kwargs.get('data', self.data)
        added = False

        # Adding item on dictionary:
        if keys[0] not in data:
            if len(keys) == 1:
                data[keys[0]] = value
                added = True
            else:
                data[keys[0]] = {}

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            if outer_key == keys[0]:  # Recursion cutoff.
                if len(keys) > 1 and isinstance(outer_value, Mapping):
                    added = self.add(
                        data=outer_value,
                        keys=keys[1:],
                        value=value
                    )

        return added

    def remove(self, keys: [object], **kwargs) -> bool:
        """
        Remove dictionary item based on nested keys.

        Parameters
        ----------
        keys: [object]
            Nested keys to remove item based on.

        Returns
        -------
        removed: bool
            Whether the target item was removed.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])
        removed = False

        # Looking for item location on dictionary:
        for outer_key, outer_value in data.items():
            trace = path + [outer_key]

            # Removing item from dictionary:
            if trace == keys:
                del data[outer_key]
                removed = True
                break

            if trace == keys[:len(trace)] and isinstance(outer_value, Mapping):  # Recursion cutoff.
                removed = self.remove(
                    data=outer_value,
                    keys=keys,
                    path=trace
                )

        return removed

    def items(self, **kwargs) -> Iterator[object, object]:
        """
        Get dictionary items based on nested keys.

        Returns
        -------
        keys, value: Iterator[object, object]
            List of nested keys and list of values.
        """
        data = kwargs.get('data', self.data)
        path = kwargs.get('path', [])

        for outer_key, outer_value in data.items():
            if isinstance(outer_value, Mapping):
                for inner_key, inner_value in self.items(data=outer_value, path=path + [outer_key]):
                    yield inner_key, inner_value
            else:
                yield path + [outer_key], outer_value

    @staticmethod
    def merge(dict_list: [dict], overwrite: bool = False, concat: bool = False, default_value: object = None) -> dict:
        """
        Merges dictionaries, with value assignment based on order of occurrence. Overwrites values if and only if:
            - The key does not yet exist on merged dictionary;
            - The current value of the key on merged dictionary is the default value.

        Parameters
        ----------
        dict_list: [dict]
            List of dictionaries.
        overwrite: bool
            Overwrites occurrences of values. If false, keep the first occurrence of each value found.
        concat: bool
            Concatenates occurrences of values for the same key.
        default_value: object
            Default value used as a reference to override dictionary attributes.

        Returns
        -------
        md: dict
            Merged dictionary.
        """
        dict_list = [d for d in dict_list if d is not None and isinstance(d, dict)] if dict_list is not None else []
        assert len(dict_list), "no dictionaries given."

        # Keeping the first occurrence of each value:
        if not overwrite:
            dict_list = [Dict(d) for d in dict_list]

            for i, d in enumerate(dict_list[:-1]):
                for keys, value in d.items():
                    if value != default_value:
                        for next_d in dict_list[i+1:]:
                            next_d.remove(keys=keys)

            dict_list = [d.data for d in dict_list]

        md = benedict()
        md.merge(*dict_list, overwrite=True, concat=concat)

        return md

Definition of the main method to show examples.

import json


def main() -> None:
    dict_list = [
        {1: 'a', 2: None, 3: {4: None, 5: {6: None}}},
        {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}},
        {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}},
    ]

    d = Dict(data=dict_list[-1])

    print("Dictionary operations test:\n")
    print(f"data = {json.dumps(d.data, indent=4)}\n")
    print(f"d = Dict(data=data)")

    keys = [11]
    value = {12: {13: 14}}
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    print(f"d.set(keys={keys}, value={value}) --> {d.set(keys=keys, value=value)}")
    print(f"d.add(keys={keys}, value={value}) --> {d.add(keys=keys, value=value)}")
    keys = [11, 12, 13]
    value = 14
    print(f"d.add(keys={keys}, value={value}) --> {d.add(keys=keys, value=value)}")
    value = 15
    print(f"d.set(keys={keys}, value={value}) --> {d.set(keys=keys, value=value)}")
    keys = [11]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12, 13]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [11, 12, 13, 15]
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")
    keys = [2]
    print(f"d.remove(keys={keys}) --> {d.remove(keys=keys)}")
    print(f"d.remove(keys={keys}) --> {d.remove(keys=keys)}")
    print(f"d.get(keys={keys}) --> {d.get(keys=keys)}")

    print("\n-----------------------------\n")
    print("Dictionary values match test:\n")
    print(f"data = {json.dumps(d.data, indent=4)}\n")
    print(f"d = Dict(data=data)")

    for keys, value in d.items():
        real_value, found = d.get(keys=keys)
        status = "found" if found else "not found"
        print(f"d{keys} = {value} == {real_value} ({status}) --> {value == real_value}")

    print("\n-----------------------------\n")
    print("Dictionaries merge test:\n")

    for i, d in enumerate(dict_list, start=1):
        print(f"d{i} = {d}")

    dict_list_ = [f"d{i}" for i, d in enumerate(dict_list, start=1)]
    print(f"dict_list = [{', '.join(dict_list_)}]")

    md = Dict.merge(dict_list=dict_list)
    print("\nmd = Dict.merge(dict_list=dict_list)")
    print("print(md)")
    print(f"{json.dumps(md, indent=4)}")


if __name__ == '__main__':
    main()

Output

Dictionary operations test:

data = {
    "1": null,
    "2": "b",
    "3": {
        "4": null,
        "5": {
            "6": {
                "8": {
                    "9": {
                        "10": [
                            "g",
                            "h"
                        ]
                    }
                }
            }
        }
    }
}

d = Dict(data=data)
d.get(keys=[11]) --> (None, False)
d.set(keys=[11], value={12: {13: 14}}) --> False
d.add(keys=[11], value={12: {13: 14}}) --> True
d.add(keys=[11, 12, 13], value=14) --> False
d.set(keys=[11, 12, 13], value=15) --> True
d.get(keys=[11]) --> ({12: {13: 15}}, True)
d.get(keys=[11, 12]) --> ({13: 15}, True)
d.get(keys=[11, 12, 13]) --> (15, True)
d.get(keys=[11, 12, 13, 15]) --> (None, False)
d.remove(keys=[2]) --> True
d.remove(keys=[2]) --> False
d.get(keys=[2]) --> (None, False)

-----------------------------

Dictionary values match test:

data = {
    "1": null,
    "3": {
        "4": null,
        "5": {
            "6": {
                "8": {
                    "9": {
                        "10": [
                            "g",
                            "h"
                        ]
                    }
                }
            }
        }
    },
    "11": {
        "12": {
            "13": 15
        }
    }
}

d = Dict(data=data)
d[1] = None == None (found) --> True
d[3, 4] = None == None (found) --> True
d[3, 5, 6, 8, 9, 10] = ['g', 'h'] == ['g', 'h'] (found) --> True
d[11, 12, 13] = 15 == 15 (found) --> True

-----------------------------

Dictionaries merge test:

d1 = {1: 'a', 2: None, 3: {4: None, 5: {6: None}}}
d2 = {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}}
d3 = {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}}
d4 = {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}}
d5 = {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}}
dict_list = [d1, d2, d3, d4, d5]

md = Dict.merge(dict_list=dict_list)
print(md)
{
    "1": "a",
    "2": "b",
    "3": {
        "4": "c",
        "5": {
            "6": {
                "7": "d",
                "8": {
                    "9": {
                        "10": [
                            "e",
                            "f"
                        ]
                    }
                }
            }
        }
    }
}
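
For comparison, the "keep the first non-default value" rule can also be sketched with the standard library alone. This is not the script above and does not use benedict; the function names merge_first_non_default and _merge_into are hypothetical, and list concatenation (the concat option) is not covered.

```python
from copy import deepcopy


def _merge_into(target, source, default):
    """Recursively copy values from source into target without overwriting
    values in target that differ from `default`."""
    for key, value in source.items():
        if key in target and isinstance(target[key], dict) and isinstance(value, dict):
            _merge_into(target[key], value, default)
        elif key not in target or target[key] == default:
            target[key] = deepcopy(value)


def merge_first_non_default(dicts, default=None):
    """Merge dicts in order; the first non-default value for a key path wins."""
    merged = {}
    for d in dicts:
        _merge_into(merged, d, default)
    return merged


dict_list = [
    {1: 'a', 2: None, 3: {4: None, 5: {6: None}}},
    {1: None, 2: None, 3: {4: 'c', 5: {6: {7: None}}}},
    {1: None, 2: 'b', 3: {4: None, 5: {6: {7: 'd'}}}},
    {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['e', 'f']}}}}}},
    {1: None, 2: 'b', 3: {4: None, 5: {6: {8: {9: {10: ['g', 'h']}}}}}},
]

md = merge_first_non_default(dict_list)
print(md)
# {1: 'a', 2: 'b', 3: {4: 'c', 5: {6: {7: 'd', 8: {9: {10: ['e', 'f']}}}}}}
```

Values are deep-copied on insertion, so the input dictionaries are left unchanged.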