Is there an R package equivalent to the dedupe library in Python?

I ask because I have used the RecordLinkage package in the past, but it seems to struggle with larger data sets. Dedupe seems to run very fast in Python and introduces an element of machine learning.

Does anybody have recommendations that have proven successful?

fgregg
Rtab
You can check the fastLink package, which implements the Expectation-Maximization (EM) algorithm for record linkage tasks and is known for its efficiency and scalability. – Dr Nisha Arora Jul 02 '23 at 15:55

1 Answer

I have been using this package: https://journal.r-project.org/articles/RJ-2022-038/RJ-2022-038.pdf

It seems to perform well for a data set of a few thousand records (<5k).

It claims to be more performant than RecordLinkage. However, I have not tried it on larger data, and I have not yet compared it against the Python dedupe implementation.