2

I found many examples but not one with duplicate items.

I have 2 list of dict:

d1 = [{"id": 1, "value": 2}, {"id": 1, "value": 2}]
d2 = [{"id": 1, "value": 2}, {"id": 4, "value": 4}]

What I want to achieve is:

d1-d2
[{"id": 1, "value": 2}]

d2-d1
[{"id": 4, "value": 4}]

diff(d1, d2)
[{"id": 1, "value": 2}, {"id": 4, "value": 4}]
OceanFire
  • 504
  • 4
  • 11
  • 1
    Isn't what you want the union of the two lists? It doesn't look like the difference (which is stated to be "elements in the first list but not in the second"), at least for the function "diff" – Checkmate Dec 10 '20 at 01:34
  • 1
    Do your dicts always have the same keys? You can convert them to tuples, then you can put them into sets and use all the set operations on them. – Barmar Dec 10 '20 at 01:43
  • Also see https://stackoverflow.com/questions/1151658/python-hashable-dicts – Barmar Dec 10 '20 at 01:43
  • In d1 I have an old collection of posts, in d2 I have a new one. By subtracting d1-d2 I want to get the items I need to remove (they are in list 1 but not in 2). By subtracting d2-d1 I want to get the elements I need to add (they are in d2 but not in d1). – OceanFire Dec 10 '20 at 01:46

3 Answers3

3

You can do something like this but it's not that pretty:

def sub(d1, d2):
    d2_ = d2.copy()
    res = []
    for elem in d1:
        if elem in d2_:
            d2_.remove(elem)
        else:
            res.append(elem)
    return res

Then you can use sub(d1, d2) + sub(d2, d1) to get diff(d1, d2).

ssp
  • 1,666
  • 11
  • 15
3

Assuming your values, not just the keys, in each dictionary are hashable, you can make frozensets of each dicts .items(), collect them into a collections.Counter, and perform an xor-like operation on the Counters to get the disparate counts. Then you just flatten out the Counter again, and convert the frozensets back to dicts:

from collections import Counter

def diff(l1, l2):
    c1 = Counter(frozenset(d.items()) for d in l1)
    c2 = Counter(frozenset(d.items()) for d in l2)
    c1xc2 = (c1 - c2) + (c2 - c1)  # Equivalent to c1 ^ c2 if they were sets, but we need counts, and Counter doesn't support ^
    # elements() flattens the Counter to iterable of repeated values, map(dict, ...)
    # converts frozensets back to dicts
    return list(map(dict, c1xc2.elements()))

The simpler operation of computing d1 - d2 or d2 - d1 is left as an exercise; the key in all cases is converting your list of dicts to Counters of frozensets of the dict's .items(), performing the operation you care about with -, then converting back to a list of dicts.

This should be fairly efficient (roughly O(n) on combined input sizes); improving on it would require a lot more custom code that's frankly not worth the bother.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • This is a great answer +1. Note that you could easily define a `sub` function with this approach too. – ssp Dec 10 '20 at 02:11
  • 1
    @Moosefeather: Yar. I left that as an exercise (it's just a simplification of the `diff` function after all). May as well let the OP write it to ensure they understand `diff` in the first place; if they don't, they can ask a follow-up in the comments here. – ShadowRanger Dec 10 '20 at 02:13
  • @ShadowRanger It should work also with `tuple` instead of `frozenset`. – Chiheb Nexus Dec 10 '20 at 02:15
  • 1
    @ChihebNexus: In modern Python, with insertion-ordered `dict`s, it would work *if* the `dict`s were all constructed in the same order. But if they aren't, e.g. if `d2` was defined as shown in the OP's question, but `d1` was defined as `d1 = [{"value": 2, "id": 1}, {"value": 2, "id": 1}]` (`value` defined before `id`, the reverse of the OP), `tuple` would break (due to being ordered) while `frozenset` would continue to work. I wouldn't want to *rely* on insertion order being consistent for correctness. It doesn't gain you anything; the values would still need to be hashable for `Counter`. – ShadowRanger Dec 10 '20 at 02:20
  • @ShadowRanger Good point. i forgot that dict ordering can break the counter. – Chiheb Nexus Dec 10 '20 at 02:22
0

Not elegant, but this seems to work:

def diff(d1, d2):
  found = []
  to_return = []
  for item in d1:
    if item in found:
      continue
    found += [item]
    to_return += [item] * (len(list(filter(lambda x : x == item, d1))) - len(list(filter(lambda x : x == item, d2))))
  return to_return
Checkmate
  • 1,074
  • 9
  • 16