Say I have several tables containing names, each written differently:

John Doe
John W. Doe
john doe
john w doe

What do you use on the JVM to associate all of these entries with the same entity (deduplication)?
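To make the goal concrete, here is a naive sketch of the kind of grouping I mean. It only normalizes each name into a key (lowercase, punctuation stripped, one-letter tokens such as middle initials dropped) and groups on exact key equality. The class `NameKey` and its methods are hypothetical names made up for this illustration, and this is not how Dedupe works; a real solution would need fuzzy matching to handle typos, nicknames, and swapped name order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NameKey {

    // Hypothetical helper: builds a crude matching key from a raw name.
    // Lowercases, replaces anything that is not a letter or whitespace
    // with a space, and drops one-letter tokens (assumed to be initials).
    public static String normalize(String raw) {
        String cleaned = raw.toLowerCase(Locale.ROOT)
                            .replaceAll("[^\\p{L}\\s]", " ");
        StringBuilder key = new StringBuilder();
        for (String token : cleaned.trim().split("\\s+")) {
            if (token.length() > 1) {
                if (key.length() > 0) {
                    key.append(' ');
                }
                key.append(token);
            }
        }
        return key.toString();
    }

    // Groups raw names by their normalized key, preserving input order.
    public static Map<String, List<String>> groupByKey(List<String> names) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : names) {
            groups.computeIfAbsent(normalize(name), k -> new ArrayList<>())
                  .add(name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names =
            List.of("John Doe", "John W. Doe", "john doe", "john w doe");
        System.out.println(groupByKey(names));
    }
}
```

With the four example names above, every entry ends up under the single key `john doe`. Exact-key grouping like this breaks as soon as there is a typo ("Jon Doe") or a genuinely different person with the same name, which is why I am looking for a proper library.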

The Dedupe library in Python seems like a tool that would do the job. Is there something similar for the JVM, perhaps built on Spark?

This question lists a few alternatives, but it may not be up to date.

ticofab
