I recently had to compare two arrays of objects and find the ones that appear in both, much like taking the intersection of two sets.
// List A
const listA = [
{ name: 'John', age: 22, gender: 'male', city: 'Boston', sport: 'Basketball' },
{ name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
...
]
// List B
const listB = [
{ name: 'Tom', age: 20, gender: 'male', city: 'New York', sport: 'Soccer' },
{ name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
...
]
The items in listA and listB don't have a unique id attribute, and two objects can only be considered similar if ALL of their key-value pairs match. In the above example, Jane is a match.
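To make that matching rule concrete, here is a minimal sketch of a direct comparison. The helper name hasSameEntries is hypothetical, and it assumes flat objects whose values are primitives (strings and numbers, as in the lists above).

```javascript
// Hypothetical helper: true when two flat objects have exactly the same keys
// and strictly equal values for every key. Nested objects are not handled.
function hasSameEntries(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return (
    keysA.length === keysB.length &&
    keysA.every(
      (key) => Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key]
    )
  );
}

const janeInA = { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' };
const janeInB = { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' };
const tom = { name: 'Tom', age: 20, gender: 'male', city: 'New York', sport: 'Soccer' };

console.log(hasSameEntries(janeInA, janeInB)); // true
console.log(hasSameEntries(janeInA, tom)); // false
```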
Now, I understand that deep object matching is a well-known problem in JS. For this scenario, I took a different route and calculated a hash of every object in both lists. To calculate the hash, I took the value of every key, concatenated the values, and passed the result to a standard SHA-256 hashing function. For example, the hash of the first object in listA would be as follows.
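// Assumption: Node.js built-in crypto; any standard SHA-256 function would do
const { createHash } = require('node:crypto');
const sha256 = (text) => createHash('sha256').update(text).digest('hex');

const objectToHash = { name: 'John', age: 22, gender: 'male', city: 'Boston', sport: 'Basketball' };
// Extract the values and concatenate them in key order
const plainText = Object.values(objectToHash).join(''); // 'John22maleBostonBasketball'
// Hash it
const hashed = sha256(plainText); // a 64-character hex string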
This way, I have two lists of hashes (plain strings), and comparing them becomes easier. Unless two objects are fully identical, their hashes won't be the same. I understand that calculating hashes is computationally intensive, but that's not a problem for my use case.
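A minimal sketch of the hash-and-compare idea end to end, assuming Node.js and its built-in crypto module. The helper names (canonicalize, hashObject, intersectByHash) are hypothetical. Note that the sketch swaps the plain value concatenation for JSON.stringify over sorted key-value pairs, so that key order does not matter and adjacent values cannot run together (for example 'ab' + 'c' versus 'a' + 'bc'). It assumes flat objects with JSON-serializable values, and it uses only the list entries shown above.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical helper: serialize a flat object as sorted [key, value] pairs,
// so key order is irrelevant and value boundaries stay unambiguous.
function canonicalize(obj) {
  const entries = Object.keys(obj)
    .sort()
    .map((key) => [key, obj[key]]);
  return JSON.stringify(entries);
}

// SHA-256 hex digest of the canonical form.
function hashObject(obj) {
  return createHash('sha256').update(canonicalize(obj)).digest('hex');
}

// Items of `a` whose hash also occurs among the hashes of `b`.
function intersectByHash(a, b) {
  const hashesOfB = new Set(b.map(hashObject));
  return a.filter((item) => hashesOfB.has(hashObject(item)));
}

const listA = [
  { name: 'John', age: 22, gender: 'male', city: 'Boston', sport: 'Basketball' },
  { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
];
const listB = [
  { name: 'Tom', age: 20, gender: 'male', city: 'New York', sport: 'Soccer' },
  { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
];

const common = intersectByHash(listA, listB);
console.log(common.map((person) => person.name)); // [ 'Jane' ]
```

Putting the hashes of one list in a Set keeps the comparison to a single pass over each list instead of comparing every pair.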
Am I missing something?