
I recently had to compare two arrays of objects and find the objects that appear in both, much like taking the intersection of two sets.

// List A
const listA = [
  { name: 'John', age: 22, gender: 'male', city: 'Boston', sport: 'Basketball' },
  { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
  ...
]

// List B
const listB = [
  { name: 'Tom', age: 20, gender: 'male', city: 'New York', sport: 'Soccer' },
  { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
  ...
]

The items in listA and listB don't have a unique id attribute, and two objects count as similar only if ALL of their key-value pairs match. In the example above, Jane is a match.

Now, I understand that deep object comparison is a well-known problem in JS. For this scenario, I took a different route and calculated a hash of every object in both lists. To calculate the hash, I concatenated the values of all keys and passed the resulting string to a standard SHA-256 hashing function. For example, the hash of the first object in listA would be computed as follows.

const objectToHash = { name: 'John', age: 22, gender: 'male', city: 'Boston', sport: 'Basketball' };

// Extract the values and concatenate them in key order
const plainText = 'John22maleBostonBasketball';

// Hash it (sha256 stands for any standard SHA-256 function)
const hashed = sha256(plainText); // illustrative output: dd3defj3434j23rfjf2402439432

This way, I have two lists of hashes (plain strings), and comparing them becomes easier. Unless two objects are fully identical, their hashes won't match. I understand that calculating hashes is computationally intensive, but that's not a problem for my use case.
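
For concreteness, the approach described above might be sketched as follows in Node.js. This is a minimal sketch rather than the exact code in use: `hashObject` and `intersectByHash` are hypothetical helper names, Node's built-in `crypto` module stands in for the unspecified `sha256` function, and the lists contain only the sample entries shown earlier.

```javascript
const { createHash } = require('crypto');

// Hypothetical helper: concatenate the values in key order, then hash the result.
// Assumes every object lists its keys in the same order.
function hashObject(obj) {
  const plainText = Object.values(obj).join('');
  return createHash('sha256').update(plainText).digest('hex');
}

// Hypothetical helper: keep the items of `a` whose hash also occurs in `b`.
function intersectByHash(a, b) {
  const hashesB = new Set(b.map(hashObject));
  return a.filter((item) => hashesB.has(hashObject(item)));
}

const listA = [
  { name: 'John', age: 22, gender: 'male', city: 'Boston', sport: 'Basketball' },
  { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
];

const listB = [
  { name: 'Tom', age: 20, gender: 'male', city: 'New York', sport: 'Soccer' },
  { name: 'Jane', age: 25, gender: 'female', city: 'Chicago', sport: 'Football' },
];

const matches = intersectByHash(listA, listB);
console.log(matches.map((person) => person.name)); // [ 'Jane' ]
```

Putting the hashes of one list in a `Set` means each item of the other list needs only a single lookup instead of a scan over the whole list.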

Am I missing something?

Hyperbola
  • What if the keys are out of order for comparison and give false negatives (or positives...), or there are gaps that overlap and give false positives? – Jared Farrish Aug 07 '19 at 02:25
  • The keys are always in the same order in my use case. – Hyperbola Aug 07 '19 at 02:28
  • I don't know that you have a question. I would normalize by same-casing and trimming whitespace and the like, but you've probably thought of that as well. – Jared Farrish Aug 07 '19 at 02:29
  • That has been taken care of. I wanted to ask if something is wrong with this approach of comparing objects? – Hyperbola Aug 07 '19 at 02:36
  • There's no big deal; you're just doing string comparisons. The hashing itself is probably unnecessary. At some point ya gotta let it ride, if you've controlled for various scenarios. – Jared Farrish Aug 07 '19 at 02:40
  • I would only worry about cases where combining two adjacent values yields the same output. An example would be { city: "foo", sport: "bar" } => foobar and { city: "foob", sport: "ar" } => foobar. – Neel Mehta Aug 07 '19 at 02:45
  • @NeelMehta It's the _Foob Argh!!!_. Pirate case. If that's a consideration, leave the initial case or concat with a special character or space between values before hashing. The data as presented doesn't suggest that, but anything's possible. – Jared Farrish Aug 07 '19 at 02:55
  • It's just that there's many cases here that might cause bugs. For instance what if you have "FooB" "ar" vs "Foo" "Bar" or if the word ends or begins with the special character. Converting everything to pirate case might just work though ;) – Neel Mehta Aug 08 '19 at 17:36
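
The collision raised in the comments, and one possible way around it, can be sketched as follows. This assumes the keys are always in the same order, as stated above; `naiveKey` and `safeKey` are hypothetical names used only for illustration. Serializing the array of values with `JSON.stringify` quotes and escapes each value, so the boundaries between values survive even when a value contains the separator character.

```javascript
// Hypothetical helper: plain concatenation of the values, as in the question.
const naiveKey = (obj) => Object.values(obj).join('');

// Hypothetical helper: JSON-encode the array of values so that the
// boundaries between values (and their types) are preserved.
const safeKey = (obj) => JSON.stringify(Object.values(obj));

const a = { city: 'foo', sport: 'bar' };
const b = { city: 'foob', sport: 'ar' };

console.log(naiveKey(a) === naiveKey(b)); // true  (both are "foobar": a false positive)
console.log(safeKey(a) === safeKey(b));   // false
console.log(safeKey(a));                  // ["foo","bar"]
```

The string returned by `safeKey` could be compared directly, or passed to the hash function if fixed-length keys are preferred.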
