2

I have two lists of dictionaries and I'd like to merge them. When a dictionary is present in both lists, I'd like to add a "confidence" key to the dictionary to reflect that the dictionary was present in both lists.

List-1

lst1 = [
    {'key': 'data_collected.service_data'},
    {'key': 'gdpr.gdpr_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]

List-2

lst2 = [
    {'key': 'child_data_protection.parent_guardian_consent'},
    {'key': 'ccpa.ccpa_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]

When I run the code below, I do not get the expected output:

res = []
for x in lst1:
    for y in lst2:
        if x["key"] == y["key"]:
            if x not in res and y not in res:
                res.append({"key": x["key"], "confidence": 1})
        else:
            if x not in res and y not in res:
                res.append(x)
                res.append(y)

print(res)

The output should look like this:

[
    {'key': 'data_collected.service_data'},
    {'key': 'gdpr.gdpr_compliance'},
    {
        'key': 'disclosure_of_information.purpose_of_disclosure',
        'confidence': 1
    },
    {
        'key': 'opt_out.choice_of_opt_out',
        'confidence': 1
    },
    {'key': 'child_data_protection.parent_guardian_consent'},
    {'key': 'ccpa.ccpa_compliance'}
]
ogdenkev

5 Answers

1

lst1.extend(i for i in (i if i not in lst1 else lst1[lst1.index(i)].update({'confidence': 1}) for i in lst2) if i is not None)

lst1 is modified in place and will hold your result.
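For readability, the one-liner above can be unrolled into an explicit loop. This is only a sketch of the same idea, using the lists from the question; it relies on plain dictionary equality, so it assumes the dictionaries in lst1 carry no extra keys before the merge.

```python
lst1 = [
    {'key': 'data_collected.service_data'},
    {'key': 'gdpr.gdpr_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]
lst2 = [
    {'key': 'child_data_protection.parent_guardian_consent'},
    {'key': 'ccpa.ccpa_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]

for item in lst2:
    if item in lst1:
        # An equal dictionary is already in lst1: flag that copy.
        lst1[lst1.index(item)]['confidence'] = 1
    else:
        # Not seen yet: append it to the end of lst1.
        lst1.append(item)

print(lst1)
```

As with the one-liner, lst1 is changed in place and the items appended from lst2 are the same objects, not copies.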

Andy Su
  • Small optimization: use a generator expression instead of a list comprehension, to avoid building a temporary copy in memory. I.e., use `lst1.extend(i for i in lst2 if i not in lst1)`, without the `[` and `]`. – joanis Sep 28 '21 at 14:33
  • Bigger issue: I don't fully understand OP's intent with that `confidence` field, but you're not adding it, so I guess you must not have noticed it in the question. – joanis Sep 28 '21 at 14:35
  • @joanis thanks for your note, re-edit it, but still need a list comprehension, any idea? – Andy Su Sep 29 '21 at 10:05
  • Just turn that `[...]` into `(...)`: `lst1.extend(i for i in (i if i not...`. Here's an SO question on the difference between [generator expressions and list comprehensions](https://stackoverflow.com/q/47789/3216427), and the excellent [tutorial by Trey Hunner](https://pycon2018.trey.io/) that helped me really understand all things comprehension-related in Python. (Note: I am not affiliated with Trey Hunner, but I am a very satisfied subscriber of his Python Morsels.) – joanis Sep 29 '21 at 12:50
  • @joanis Appreciated your comment too – Andy Su Sep 30 '21 at 01:58
  • Wow, this is embarrassing. I just profiled your answer with a list comprehension and with a generator expression, and I found out that a) the time difference is tiny, so I really shouldn't care; and b) the list comprehension comes out ever so slightly faster! I don't fully understand what's going on, but now I realize I have to revise my attitude about list comprehensions vs generator expressions. – joanis Sep 30 '21 at 13:18
  • Other cases I've just found where list comprehensions are faster than generator expressions, besides your code: https://stackoverflow.com/a/62709748/3216427 https://stackoverflow.com/a/9061024/3216427 I really have to adjust my attitude between these, and stop recommending comprehensions everywhere possible! – joanis Sep 30 '21 at 13:20
  • @joanis Well, I think that for massive data you can't get both benefits at once: generators use less space, while list comprehensions iterate faster. – Andy Su Oct 08 '21 at 03:34
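For anyone who wants to repeat the kind of measurement discussed in the comments, here is a rough sketch using the standard timeit module. The data size, the filter condition and the function names are made up for illustration, and the timings will vary by machine and Python version, so no result is claimed here.

```python
import timeit

# Hypothetical input, just large enough to give measurable timings.
data = list(range(1000))

def with_list_comprehension():
    out = []
    # Builds a temporary list first, then extends from it.
    out.extend([x for x in data if x % 2 == 0])
    return out

def with_generator_expression():
    out = []
    # Feeds items to extend() one at a time, with no temporary list.
    out.extend(x for x in data if x % 2 == 0)
    return out

list_time = timeit.timeit(with_list_comprehension, number=2000)
gen_time = timeit.timeit(with_generator_expression, number=2000)

print(f"list comprehension:   {list_time:.4f}s")
print(f"generator expression: {gen_time:.4f}s")
```

Both functions produce the same list; only the way the items reach extend() differs.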
1

You can use a set comprehension to collect the "key" values of the dictionaries in each list. Then you can loop through the union of the keys and check whether each key is in both lists. Because sets are unordered, the output order may differ from the input order.

keys_1 = {d["key"] for d in lst1}
keys_2 = {d["key"] for d in lst2}

output = []
for k in keys_1 | keys_2:
    d = {"key": k}
    if k in keys_1 and k in keys_2:
        d["confidence"] = 1
    output.append(d)
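If the order shown in the question matters, one possible variation (a sketch, not part of the approach above) is to keep the sets only for the membership test and walk the original lists in order, skipping keys that were already emitted. The names `both` and `seen` are introduced here for illustration.

```python
lst1 = [
    {'key': 'data_collected.service_data'},
    {'key': 'gdpr.gdpr_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]
lst2 = [
    {'key': 'child_data_protection.parent_guardian_consent'},
    {'key': 'ccpa.ccpa_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]

keys_1 = {d["key"] for d in lst1}
keys_2 = {d["key"] for d in lst2}
both = keys_1 & keys_2  # keys present in both lists

output = []
seen = set()  # keys already written to output
for d in lst1 + lst2:
    k = d["key"]
    if k in seen:
        continue  # second occurrence of a shared key
    seen.add(k)
    entry = {"key": k}
    if k in both:
        entry["confidence"] = 1
    output.append(entry)

print(output)
```

This builds new dictionaries, so neither input list is modified.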
ogdenkev
1

You can avoid raw loops entirely by using the intersection and symmetric_difference methods of set:

# Shortened key names for brevity
a = [{"key": "a"}, {"key": "b"}, {"key": "c"}]
b = [{"key": "a"}, {"key": "d"}, {"key": "e"}]

# Turn both lists into sets
a_keys = {entry["key"] for entry in a}
b_keys = {entry["key"] for entry in b}

# Add elements that are in both sets with confidence set to 1
result = [{"key": key, "confidence": 1} for key in a_keys.intersection(b_keys)]
# Add elements that are not in both sets
result += [{"key": key} for key in a_keys.symmetric_difference(b_keys)]

This will result in something like:

[{'confidence': 1, 'key': 'a'},
 {'key': 'b'},
 {'key': 'd'},
 {'key': 'c'},
 {'key': 'e'}]

Note that the element order will change, because the keys pass through a set, which is unordered.
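If a reproducible order is wanted, one option is to sort the keys before building the result. This is a small sketch on the same shortened data; it uses the `&` and `^` operators, which are equivalent to intersection and symmetric_difference, and it assumes the keys are sortable (plain strings here).

```python
# Shortened key names for brevity
a = [{"key": "a"}, {"key": "b"}, {"key": "c"}]
b = [{"key": "a"}, {"key": "d"}, {"key": "e"}]

a_keys = {entry["key"] for entry in a}
b_keys = {entry["key"] for entry in b}

# Shared keys first, in sorted order, with confidence set to 1
result = [{"key": key, "confidence": 1} for key in sorted(a_keys & b_keys)]
# Then the keys found in only one of the lists, also sorted
result += [{"key": key} for key in sorted(a_keys ^ b_keys)]

print(result)
```

The sorted order is alphabetical, not the original list order, so this only helps when any stable order will do.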

Possseidon
0

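If you are not too worried about performance, you can compare the dictionaries directly. The snippet relies on an add_confidence helper, assumed here to return a copy of the dictionary with "confidence" set to 1:

def add_confidence(d):
    return {**d, "confidence": 1}

intersection = [value for value in lst1 if value in lst2]
res = [val for val in lst1 if val not in intersection] + [val for val in lst2 if val not in intersection]
res += list(map(add_confidence, intersection))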
prnvbn
0

Another approach can be:

lst1 = [
    {'key': 'data_collected.service_data'},
    {'key': 'gdpr.gdpr_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]
lst2 = [
    {'key': 'child_data_protection.parent_guardian_consent'},
    {'key': 'ccpa.ccpa_compliance'},
    {'key': 'disclosure_of_information.purpose_of_disclosure'},
    {'key': 'opt_out.choice_of_opt_out'}
]
for data in lst1:
    # If the same data exists in lst2, add the confidence key and remove it from lst2
    if data in lst2:
        lst2.remove(data)
        data['confidence'] = 1

# At the end of the loop above, lst2 contains only unique data, so just add both lists to get the final result
lst1 = lst1 + lst2
print(lst1)

Output:

[{'key': 'data_collected.service_data'}, {'key': 'gdpr.gdpr_compliance'}, {'key': 'disclosure_of_information.purpose_of_disclosure', 'confidence': 1}, {'key': 'opt_out.choice_of_opt_out', 'confidence': 1}, {'key': 'child_data_protection.parent_guardian_consent'}, {'key': 'ccpa.ccpa_compliance'}]
Bhagyesh Dudhediya