16

I have 2 dictionaries, A and B. A has 700000 key-value pairs and B has 560000 key-value pairs. All key-value pairs from B are present in A, but some keys in A are duplicates with different values, and some have duplicated values but unique keys. I would like to subtract B from A so that I get the remaining 140000 key-value pairs. When I subtract key-value pairs based on key identity, I remove, let's say, 150000 key-value pairs because of the repeated keys. I want to subtract key-value pairs based on the identity of BOTH key AND value for each key-value pair, so that I get 140000. Any suggestions would be welcome.

This is an example:

A = {'10':1, '11':1, '12':1, '10':2, '11':2, '11':3}
B = {'11':1, '11':2}

I DO want to get: A-B = {'10':1, '12':1, '10':2, '11':3}

I DO NOT want to get:

a) When based on keys:

{'10':1, '12':1, '10':2}

or

b) When based on values:

{'11':3}
Lucas
  • 1
    Possible duplicate of [How to remove a key from a dictionary?](http://stackoverflow.com/questions/11277432/how-to-remove-a-key-from-a-dictionary) – Code-Apprentice Feb 03 '16 at 20:35
  • No @Code-Apprendice, that post does not answer my question. I don't want to remove keys from a dict, but to subtract key-value pairs. – Lucas Feb 03 '16 at 20:38
  • @Lucas: Isn't that just semantics? Removing the key removes the value. – Steven Rumbalski Feb 03 '16 at 20:38
  • @Lucas try `difference` in set. – xrisk Feb 03 '16 at 20:40
  • @Lucas How is removing a key different than subtracting key-value pairs? What do you mean by "subtract key-value"? Apparently your question is not entirely clear. Please add more details so that we can understand what you want to do. – Code-Apprentice Feb 03 '16 at 20:40
  • Hi @Steven Rumbalski, the problem is that some keys are duplicates but with different values, so when I remove the keys in the way you say, I will remove key-value pairs that have same keys but different values. I don't want that. – Lucas Feb 03 '16 at 20:42
  • @viakondratiuk That isn't quite what is asked for. In your link, what is wanted is to find the difference between the values for each key. Here Lucas wants to remove every duplicate key. – zondo Feb 03 '16 at 20:43
  • @Lucas: Are your values integers? If so, your collections can be `collections.Counter` a subclass of dict. `collections.Counter` has a `subtract` method. – Steven Rumbalski Feb 03 '16 at 20:44
  • 2
    @Lucas: Your question would be well served with a small example of what you are asking for. – Steven Rumbalski Feb 03 '16 at 20:47
  • I just edited the post. I hope it is clearer this time. – Lucas Feb 03 '16 at 20:52
  • @Lucas: If `A = {'x':10, 'y':5, 'z':1}` and `B = {'x':10, 'y':3}` should the result be `{'y':2, 'z':1}` or `{'y':5, 'z':1}`? – Steven Rumbalski Feb 03 '16 at 20:54
  • 9
    @Lucas: how can k:v pairs from B be in A and then A also have duplicated keys with different values? A key can only appear once in a dictionary? – Will Feb 03 '16 at 21:01
  • @Steven Rumbalsky and the others. I just added an example in the edited post. Thank you and the others for your feedback. – Lucas Feb 03 '16 at 21:15
  • 7
    `A = {'10':1, '11':1, '12':1, '10':2, '11':2, '11':3}` is not possible. If you do this at the python prompt, you will get something like `{'11': 3, '10': 2, '12': 1}` for A. – PaulMcG Feb 03 '16 at 21:19
  • @Lucas Why not accepting the answer that gave the solution? – raratiru Dec 13 '16 at 22:20

9 Answers

44

To get items in A that are not in B, based just on key:

C = {k:v for k,v in A.items() if k not in B}

To get items in A that are not in B, based on key and value:

C = {k:v for k,v in A.items() if k not in B or v != B[k]}
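
A minimal sketch of the two comprehensions above on a small sample. The dictionaries are made up for illustration and, like any Python dict, hold each key only once.

```python
# Hypothetical sample data; every key appears once, as a dict requires.
A = {'10': 2, '11': 3, '12': 1, '13': 7}
B = {'11': 3, '12': 5}

# Key only: drop every key of A that also appears in B.
by_key = {k: v for k, v in A.items() if k not in B}

# Key and value: drop a pair only if B holds the same key with the same value.
by_item = {k: v for k, v in A.items() if k not in B or v != B[k]}

print(by_key)   # {'10': 2, '13': 7}
print(by_item)  # {'10': 2, '12': 1, '13': 7}
```

Note that '12' survives the second variant because its value differs between A and B.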

To update A in place (as in A -= B) do:

from collections import deque
consume = deque(maxlen=0).extend
consume(A.pop(key, None) for key in B)

(Unlike using map() with A.pop, calling A.pop with a None default will not break if a key from B is not present in A. Also, unlike using all, this iterator consumer will iterate over all values, regardless of truthiness of the popped values.)
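
As a rough illustration of the in-place version, the sketch below uses made-up data in which one value is falsy and one key of B is missing from A. Both cases are assumptions chosen to exercise the points in the note above.

```python
from collections import deque

# Hypothetical sample data: 'b' maps to a falsy value, 'zzz' is missing from A.
A = {'a': 1, 'b': 0, 'c': 3, 'd': 4}
B = {'b': 0, 'zzz': 9, 'c': 3}

# A zero-length deque discards everything it is fed, so extend() simply
# drives the generator to completion.
consume = deque(maxlen=0).extend
consume(A.pop(key, None) for key in B)

print(A)  # {'a': 1, 'd': 4}
```

This removes by key only; values in B are ignored.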

PaulMcG
  • 2
    This is the most logical/readable, and probably fastest, and it is easily tweakable whether the values have to be equal (or just keys being equal) as well. – PascalVKooten Feb 03 '16 at 20:52
18

An easy, intuitive way to do this is

dict(set(a.items()) - set(b.items()))
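
A small sketch of how this behaves, using made-up sample data. It relies on the values being hashable, which is an assumption worth checking for real data, since building a set from items with unhashable values raises TypeError.

```python
a = {'10': 2, '11': 3, '12': 1}
b = {'11': 3, '12': 5}

# Each item becomes a (key, value) tuple, so a pair is removed only when
# both key and value match.
c = dict(set(a.items()) - set(b.items()))
print(c == {'10': 2, '12': 1})  # True

# Values must be hashable: a list value cannot go into a set.
try:
    set({'k': [1, 2]}.items())
except TypeError as exc:
    print('unhashable value:', exc)
```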
Blair
11
A = {'10':1, '11':1, '12':1, '10':2, '11':2, '11':3}
B = {'11':1, '11':2}

You can't have duplicate keys in Python. If you run the above, it will get reduced to:

A={'11': 3, '10': 2, '12': 1}
B={'11': 2}

But to answer your question, to do A - B (based on dict keys):

all(map( A.pop, B))   # use all() so it works for Python 2 and 3.
print(A) # {'10': 2, '12': 1}
  • 1
    At least in Python 3, `map` does not seem to work as described by Monty. After running `map( A.pop, B )`, A is unchanged. (Perhaps because in Python 3, `map` returns an iterator.) – mpb Jan 11 '18 at 02:01
  • 2
    @mpb, good catch! have to put it inside all() or something so to consume the iterator. Works for Python 2 and 3 – Monty Montemayor Jan 24 '18 at 22:17
  • 1
    This did not work for me since all() returns a bool. Did I miss something? – Joshua Stafford Feb 28 '18 at 17:45
  • what version of python do you have? btw, why downvote, if I can still help you? – Monty Montemayor Feb 28 '18 at 19:29
  • Sorry this is months after the fact, but I downvoted because "it did not work". If I made the mistake, I'm happy to turn that frown upside down. I'm using Python 3. – Joshua Stafford Jun 30 '18 at 14:38
  • 7
    I'd avoid using `map` just for its side effect, and avoid using `all` to force evaluation. If any of your values are falsey `all` will stop popping values from A prematurely! In general there's nothing wrong with an imperative for-loop, and in this case PaulMcG's non-mutative comprehension answer seems like the best solution. – Orez Jul 09 '18 at 20:01
  • If you need to consume an iterator, use [`collections.deque()`](https://stackoverflow.com/a/21210673), not `all()`, since `all` will stop if one of the popped values is falsy. – Chen Levy Jan 04 '21 at 19:50
4

dict-views:

Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that (key, value) pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the operations defined for the abstract base class collections.abc.Set are available (for example, ==, <, or ^).

So you can:

>>> A = {'10':1, '11':1, '12':1, '10':2, '11':2, '11':3}
>>> B = {'11':1, '11':2}
>>> A.items() - B.items()
{('11', 3), ('12', 1), ('10', 2)}
>>> dict(A.items() - B.items())
{'11': 3, '12': 1, '10': 2}

For Python 2, use dict.viewitems.

P.S. You can't have duplicate keys in dict.

>>> A = {'10':1, '11':1, '12':1, '10':2, '11':2, '11':3}
>>> A
{'10': 2, '11': 3, '12': 1}
>>> B = {'11':1, '11':2}
>>> B
{'11': 2}
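
Since a dict cannot hold repeated keys, one way to get the result the question asks for is to store the data as (key, value) tuples instead. This is only a sketch under that assumption: the set-of-tuples layout is not something the question uses.

```python
# The question's data rewritten as (key, value) tuples, since a dict would
# silently keep only the last value for each repeated key.
A = {('10', 1), ('11', 1), ('12', 1), ('10', 2), ('11', 2), ('11', 3)}
B = {('11', 1), ('11', 2)}

remaining = A - B
print(sorted(remaining))
# [('10', 1), ('10', 2), ('11', 3), ('12', 1)]
```

This matches the pairs listed under "I DO want to get" in the question.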
Be3y4uu_K0T
3

Another way of using the efficiency of sets. This might be more multipurpose than the answer by @brien. His answer is very nice and concise, so I upvoted it.

diffKeys = set(a.keys()) - set(b.keys())
c = dict()
for key in diffKeys:
  c[key] = a.get(key)

EDIT: There is the assumption here, based on the OP's question, that dict B is a subset of dict A, that the key/val pairs in B are in A. The above code will have unexpected results if you are not working strictly with a key/val subset. Thanks to Steven for pointing this out in his comment.
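
To make the caveat concrete, here is a short sketch using the dictionaries from Steven Rumbalski's comment on the question, where b is not a strict key/value subset of a. The variable names are illustrative.

```python
a = {'x': 10, 'y': 5, 'z': 1}
b = {'x': 10, 'y': 3}   # 'y' has a different value than in a

# Key-only difference: values in b play no part.
diff_keys = set(a) - set(b)
c = {key: a[key] for key in diff_keys}
print(c)  # {'z': 1}
```

The pair 'y': 5 is dropped even though b never contained it, which is the unexpected result the edit warns about.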

zondo
robert arles
  • This is different than @brien's answer. This considers keys only whereas the other answer considers key-value pairs. They will give different answers. – Steven Rumbalski Feb 03 '16 at 21:04
  • @StevenRumbalski: Yes! True. I should have pointed that out, and will clarify it in my answer. I was working from the OPs stated presumption that all of the existing key/val pairs from b are in a. So B is a subset. – robert arles Feb 03 '16 at 21:31
2

Since I can not (yet) comment: the accepted answer will fail if there are some keys in B not present in A.

Using dict.pop with a default would circumvent it (borrowed from How to remove a key from a Python dictionary?):

all(A.pop(k, None) for k in B)

or

tuple(A.pop(k, None) for k in B)
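
One caution, sketched below with made-up data: all() stops at the first falsy result, which includes both a falsy popped value and the None default returned for a missing key. A plain loop (or the tuple form) visits every key.

```python
A1 = {'a': 1, 'b': 0, 'c': 3}
A2 = dict(A1)
B = {'b': 0, 'c': 3}

# all() stops at the first falsy result: popping 'b' returns 0, so 'c' is
# never popped.
all(A1.pop(k, None) for k in B)
print(A1)  # {'a': 1, 'c': 3}

# A plain loop visits every key regardless of the popped value.
for k in B:
    A2.pop(k, None)
print(A2)  # {'a': 1}
```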
slavos1
1
result = A.copy()
for key in B:
    if key in A and A[key] == B[key]:  # remove only exact key-value matches
        del result[key]
zondo
-1

Based on only keys assuming A is a superset of B or B is a subset of A:

Python 3: c = {k:a[k] for k in a.keys() - b.keys()}

Python 2: c = {k:a[k] for k in list(set(a.keys())-set(b.keys()))}

This is based on keys only and can also be used to update a in place; see @PaulMcG's answer.

-1

For subtracting the dictionaries, if A and B are collections.Counter objects (a plain dict has no subtract method), you could do:

A.subtract(B)

Note: This will give you negative values in a situation where B has keys that A does not.
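
A minimal sketch of what this does, assuming the data is held in collections.Counter objects with integer values (the sample values are made up). Counter.subtract works in place and subtracts the counts; it does not remove key-value pairs.

```python
from collections import Counter

A = Counter({'x': 10, 'y': 5, 'z': 1})
B = Counter({'x': 10, 'y': 3, 'w': 2})

# Subtracts counts key by key, modifying A in place.
A.subtract(B)
print(dict(A))  # {'x': 0, 'y': 2, 'z': 1, 'w': -2}
```

Keys whose count reaches zero stay in A, and 'w' ends up negative because it exists only in B.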