Say I have several tables containing names, each written differently:
John Doe
John W. Doe
john doe
john w doe
What are you using on the JVM to associate all of these entries with the same entity (deduplication)?
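To make the goal concrete, here is a minimal Java sketch of the naive baseline I could write by hand: normalize each name to a key and group on it. The class `NameKey` and its methods `key` and `group` are hypothetical names of my own, not part of any library, and the rule of dropping single-letter tokens is just an assumption about how middle initials appear in my data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NameKey {

    // Hypothetical helper: lower-case the name, replace anything that is not
    // a letter or whitespace with a space, and drop single-letter tokens
    // (assumed to be middle initials).
    public static String key(String name) {
        String cleaned = name.toLowerCase(Locale.ROOT)
                             .replaceAll("[^\\p{L}\\s]", " ");
        StringBuilder sb = new StringBuilder();
        for (String token : cleaned.trim().split("\\s+")) {
            if (token.length() > 1) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(token);
            }
        }
        return sb.toString();
    }

    // Group the raw names by their normalized key, keeping insertion order.
    public static Map<String, List<String>> group(List<String> names) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String n : names) {
            groups.computeIfAbsent(key(n), k -> new ArrayList<>()).add(n);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names =
            List.of("John Doe", "John W. Doe", "john doe", "john w doe");
        // All four variants end up under the single key "john doe".
        System.out.println(group(names));
    }
}
```

This only catches exact matches after normalization. It would not link a typo such as "Jon Doe" and it would wrongly merge two different people who share a name, which is why I am looking for a proper record-linkage tool.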
The Dedupe library in Python seems like a tool that would do the job. Is there something similar for the JVM, maybe using Spark?
This question lists a few alternatives, but it may no longer be up to date.