
I am using DeepDiff to compare data from two databases. Here is an example:

from deepdiff import DeepDiff
users1 = [{'id': 1, 'name': 'John', 'age': 30}, {'id': 2, 'name': 'Jane', 'age': 25}]
users2 = [{'id': 1, 'name': 'John', 'age': 20}, {'id': 2, 'name': 'Bob', 'age': 35}]
diff = DeepDiff(users1, users2)
print(diff)

It gives me output like this:

{'values_changed': {"root[0]['age']": {'new_value': 20, 'old_value': 30}, "root[1]['name']": {'new_value': 'Bob', 'old_value': 'Jane'}, "root[1]['age']": {'new_value': 35, 'old_value': 25}}}

But I want the record IDs to appear in the output as well, so that I can tell which ID has the mismatch.

The desired output could look like this:

{'values_changed': {"root[ID_VALUE]['age']": {'new_value': 20, 'old_value': 30}, "root[ID_VALUE]['name']": {'new_value': 'Bob', 'old_value': 'Jane'}, "root[ID_VALUE]['age']": {'new_value': 35, 'old_value': 25}}}

or

{'values_changed': {1: {"root[0]['age']": {'new_value': 20, 'old_value': 30}}, 2: {"root[1]['name']": {'new_value': 'Bob', 'old_value': 'Jane'}, "root[1]['age']": {'new_value': 35, 'old_value': 25}}}}

Is there any way to do this?

Thanks

Mark
Sanjay Kumar

2 Answers

from deepdiff import DeepDiff, parse_path

# making a slightly larger example
users1 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Jane', 'age': 25}, {'id': 123, 'name': 'Janet', 'age': 32}, {'id': 42, 'name': 'Jack', 'age': 22}]
users2 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Bob', 'age': 35}, {'id': 123, 'name': 'Janet', 'age': 69}, {'id': 42, 'name': 'Jack', 'age': 22}]

dd = DeepDiff(users1, users2)

# group every change under the id of the record it belongs to
result = {}
for path, change in dd.get("values_changed", {}).items():
    index, field = parse_path(path)  # "root[1]['age']" -> [1, 'age']
    result.setdefault(users1[index]['id'], {})[field] = change

result  # {420: {'name': {...}, 'age': {...}}, 123: {'age': {...}}}

This assumes that the records are in the same order in both lists and that the IDs don't change between them.

Mark
  • I wanted the IDs to be printed along with the mismatched values, for example `{420: {'new_value': 35, 'old_value': 25}, 123: {'new_value': 69, 'old_value': 32}}`. I am converting the DataFrame into a dictionary to compare it using DeepDiff. I wrote an example that does this, but I wanted to know if there is a more efficient way. Please check the next comment for the code. – Sanjay Kumar Jul 17 '23 at 17:29
  • `users1 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Jane', 'age': 25}, {'id': 123, 'name': 'Janet', 'age': 32}, {'id': 42, 'name': 'Jack', 'age': 22}] df = pd.DataFrame(users1) # I making df here for example but in real i am doing from df to dict users2 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Bob', 'age': 35}, {'id': 123, 'name': 'Janet', 'age': 69}, {'id': 42, 'name': 'Jack', 'age': 22}] diff = DeepDiff(users1, users2) # print(diff) diff = diff.get('values_changed')` – Sanjay Kumar Jul 17 '23 at 17:33
  • `if diff is not None:` `for key in list(diff.keys()):` `find_index = re.search(r'\b\d+\b', key)` ` index = int(find_index.group())` `primary_key_from_df = df['id'].iloc[index]` `diff[primary_key_from_df] = diff.pop(key)` `print(diff)` Example Answer `{420: {'new_value': 35, 'old_value': 25}, 123: {'new_value': 69, 'old_value': 32}}` – Sanjay Kumar Jul 17 '23 at 17:39
  • Please check the answer below and let me know if there is a more efficient way to do this. – Sanjay Kumar Jul 17 '23 at 17:43
  • @SanjayKumar updated – Mark Jul 17 '23 at 17:49
  • @SanjayKumar I'm about to go to sleep (it's quite late here)- the code you wrote looks good! Honestly unless you are looking at running this 1 million + times optimising it more than this is likely to be overkill – Mark Jul 17 '23 at 17:52
  • I sometimes have billions of rows to match, but since I am using multithreading, each run will not go beyond millions. I would really appreciate it if you could suggest an optimal solution whenever you are free. – Sanjay Kumar Jul 17 '23 at 18:23
  • @SanjayKumar I think your best bet, if performance is a problem, is to benchmark different options and see which runs fastest – Mark Jul 18 '23 at 00:26
  • re: other ideas- as a general principle, the parts of Python written in C will run the fastest, so (I presume) the whole thing would run faster using Numpy – Mark Jul 18 '23 at 00:27
  • Thanks for updating. I would appreciate a link to a NumPy example if one is available. I tried pandas, but it also has performance issues and is more complex (because the data types have to be cast manually). – Sanjay Kumar Jul 18 '23 at 04:37
  • https://numpy.org/doc/stable/user/quickstart.html – Mark Jul 18 '23 at 04:38
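
The benchmarking suggested in the comments can be sketched with the standard library's `timeit`. This is only a rough sketch, not a tuned benchmark: `make_users`, `diff_fieldwise` and `diff_skip_equal` are hypothetical helpers written for illustration, the data is synthetic, and neither function uses DeepDiff. Both assume the two lists are in the same order and that every row has the same fields. The second variant compares whole rows first, so unchanged rows are dismissed by a single dict comparison.

```python
import timeit


def make_users(n, changed_every=100):
    """Build two synthetic, same-order record lists; every Nth row differs in age."""
    users1 = [{'id': i, 'name': f'user{i}', 'age': 20 + i % 50} for i in range(n)]
    users2 = [dict(u) for u in users1]
    for u in users2[::changed_every]:
        u['age'] += 1
    return users1, users2


def diff_fieldwise(users1, users2):
    """Compare every field of every row pair."""
    result = {}
    for old, new in zip(users1, users2):
        for field, old_value in old.items():
            if new[field] != old_value:
                result.setdefault(old['id'], {})[field] = {
                    'new_value': new[field], 'old_value': old_value}
    return result


def diff_skip_equal(users1, users2):
    """Same result, but skip identical rows with one dict comparison."""
    result = {}
    for old, new in zip(users1, users2):
        if old == new:
            continue
        result[old['id']] = {
            field: {'new_value': new[field], 'old_value': old_value}
            for field, old_value in old.items() if new[field] != old_value
        }
    return result


users1, users2 = make_users(10_000)
# both candidates must agree before their timings are worth comparing
assert diff_fieldwise(users1, users2) == diff_skip_equal(users1, users2)

for fn in (diff_fieldwise, diff_skip_equal):
    seconds = timeit.timeit(lambda: fn(users1, users2), number=20)
    print(f'{fn.__name__}: {seconds:.3f}s for 20 runs')
```

The timings depend on the machine and on how many rows actually differ, so it is worth running this against data shaped like the real tables.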

I tried this and was able to get the desired result, but I would like to know if there is a more efficient way to do it:

import re

import pandas as pd
from deepdiff import DeepDiff

users1 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Jane', 'age': 25}, {'id': 123, 'name': 'Janet', 'age': 32}, {'id': 42, 'name': 'Jack', 'age': 22}]
df = pd.DataFrame(users1)  # I am building the df here for the example; in reality I go from df to dict
users2 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Bob', 'age': 35}, {'id': 123, 'name': 'Janet', 'age': 69}, {'id': 42, 'name': 'Jack', 'age': 22}]
diff = DeepDiff(users1, users2)
# print(diff)
diff = diff.get('values_changed')
if diff is not None:
    for key in list(diff.keys()):
        # pull the list index out of a path such as "root[1]['age']"
        find_index = re.search(r'\b\d+\b', key)
        index = int(find_index.group())
        primary_key_from_df = int(df['id'].iloc[index])
        # several changes in one row share an id, so the last one wins
        diff[primary_key_from_df] = diff.pop(key)
print(diff)

This gives me the result below, which is fine for me:

{420: {'new_value': 35, 'old_value': 25}, 123: {'new_value': 69, 'old_value': 32}}
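
Both answers look up the ID by list position, so they depend on the two lists being in the same order. As a possible alternative, the sketch below keys each list by `id` first and compares the records directly, without DeepDiff. `diff_by_id` and the `added` / `removed` output keys are hypothetical names chosen for illustration. It assumes the IDs are unique within each list, can be sorted, and that the records are flat dictionaries.

```python
def diff_by_id(users1, users2, key='id'):
    """Compare two lists of flat dicts, matching records by `key` instead of position."""
    old_by_id = {u[key]: u for u in users1}
    new_by_id = {u[key]: u for u in users2}

    changed = {}
    for record_id in old_by_id.keys() & new_by_id.keys():
        old, new = old_by_id[record_id], new_by_id[record_id]
        # keep every differing field, so no change is overwritten
        fields = {
            field: {'new_value': new.get(field), 'old_value': old.get(field)}
            for field in old.keys() | new.keys()
            if old.get(field) != new.get(field)
        }
        if fields:
            changed[record_id] = fields

    return {
        'values_changed': changed,
        'added': sorted(new_by_id.keys() - old_by_id.keys()),
        'removed': sorted(old_by_id.keys() - new_by_id.keys()),
    }


users1 = [{'id': 69, 'name': 'John', 'age': 30}, {'id': 420, 'name': 'Jane', 'age': 25},
          {'id': 123, 'name': 'Janet', 'age': 32}, {'id': 42, 'name': 'Jack', 'age': 22}]
# same records as in the answer above, but deliberately in a different order
users2 = [{'id': 42, 'name': 'Jack', 'age': 22}, {'id': 123, 'name': 'Janet', 'age': 69},
          {'id': 420, 'name': 'Bob', 'age': 35}, {'id': 69, 'name': 'John', 'age': 30}]

result = diff_by_id(users1, users2)
print(result['values_changed'])
```

For this data, `values_changed` holds both the name and the age change under 420 and the age change under 123, regardless of row order.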
Sanjay Kumar