I'm using the so-far great Dedupe library to help link records from multiple providers. One of the fields I'm comparing is a phone number. I'd like to use Google's phone number library to normalize these numbers. Another nice feature of that library is that it can compare two numbers and return a match type from 0 (not a match at all) to 4 (every component matches exactly).

So this seems like a natural fit for Dedupe's custom variable type, but I'm a bit confused about what the custom comparator implementation should look like. The example in the docs just returns a simple 0 or 1 for match/non-match.

I basically want to ensure that, behind the scenes, my custom comparator will indicate to Dedupe that a 4 means the phone numbers are very close and a 0 means they're very far apart.

Will that work? Or do I have to return it some other way? E.g. do I have to indicate an exact match with 0?

  • From my reading of the documentation, you certainly can return whatever numeric value, but Dedupe will treat it as an *edit distance metric* regardless - therefore, if you want to get useful results, ensure that you return 0 for an exact match and larger numbers for worse matches. – Karl Knechtel Oct 11 '19 at 23:14
  • Hint: Google's library will give you the number 4 in the case where you want to return 0, and 0 in the case where you want to return the largest value that your function might return. Can you think of a *mathematical rule* that transforms Google's result into one you can use? – Karl Knechtel Oct 11 '19 at 23:15
  • I'm voting to close this question as off-topic because it appears to be really a logic question rather than a programming question. – Karl Knechtel Oct 11 '19 at 23:16
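
For anyone landing here later, the rule hinted at in the comments can be sketched roughly as below. This is only a minimal sketch, not Dedupe's or Google's own code: `match_type_to_distance`, `make_phone_comparator` and `toy_match` are hypothetical names made up for illustration, and `toy_match` is a crude stand-in for the real phone matcher so the snippet runs without extra packages. In real use you would presumably swap it for a wrapper around the phone library's matching call (in the Python port `phonenumbers`, that should be `is_number_match`). The variable definition at the end assumes the dict form shown in Dedupe's custom-variable docs, so check it against the version you have installed.

```python
from typing import Callable

# Highest score described in the question (every component matches exactly).
MAX_MATCH_TYPE = 4


def match_type_to_distance(match_type: int, max_match: int = MAX_MATCH_TYPE) -> int:
    """Invert a similarity score so that 0 means an exact match."""
    if not 0 <= match_type <= max_match:
        raise ValueError(f"match_type must be in 0..{max_match}, got {match_type}")
    return max_match - match_type


def make_phone_comparator(
    match_fn: Callable[[str, str], int],
) -> Callable[[str, str], float]:
    """Wrap a 'higher is better' matcher into a 'lower is better' comparator."""

    def comparator(phone_a: str, phone_b: str) -> float:
        return float(match_type_to_distance(int(match_fn(phone_a, phone_b))))

    return comparator


def toy_match(phone_a: str, phone_b: str) -> int:
    """Hypothetical stand-in matcher: 4 if the digits are identical, else 0."""
    digits_a = "".join(ch for ch in phone_a if ch.isdigit())
    digits_b = "".join(ch for ch in phone_b if ch.isdigit())
    return MAX_MATCH_TYPE if digits_a == digits_b else 0


phone_comparator = make_phone_comparator(toy_match)

# Assumed dict-style variable definition; the field name "phone" is made up.
phone_variable = {"field": "phone", "type": "Custom", "comparator": phone_comparator}
```

With this, `phone_comparator` returns 0.0 for two numbers the matcher scores as 4 and 4.0 for two it scores as 0, which is the direction the first comment recommends.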

0 Answers