Duke is a fast and flexible deduplication (or entity resolution, or record linkage) engine written in Java on top of Lucene.
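Duke is described as combining per-property match probabilities with Bayes' theorem, treating the properties as independent evidence. A minimal sketch of that combination step follows, assuming the per-property probabilities are already known. The class and method names are hypothetical and this is not Duke's API or source code.

```java
// Hypothetical illustration, not Duke's API: combine independent
// per-property match probabilities with Bayes' theorem.
public class BayesCombine {

    // Folds probabilities pairwise: p = a*b / (a*b + (1-a)*(1-b)).
    // Starting from 0.5 (no evidence) leaves the first probability unchanged.
    // Inputs should lie strictly between 0 and 1; mixing 0 and 1 gives 0/0.
    static double combine(double... probs) {
        double p = 0.5;
        for (double q : probs) {
            double agree = p * q;
            double disagree = (1.0 - p) * (1.0 - q);
            p = agree / (agree + disagree);
        }
        return p;
    }

    public static void main(String[] args) {
        // Two properties that each indicate a match with probability 0.6
        System.out.printf("%.3f%n", combine(0.6, 0.6));
        // A strong match on one property, weak evidence against on another
        System.out.printf("%.3f%n", combine(0.9, 0.4));
    }
}
```

Two properties that each contribute only 0.6 combine to about 0.69, so whether a pair clears a match threshold depends on these per-property values, not only on whether the fields agree.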
Questions tagged [duke]
8 questions
4 votes, 0 answers
Looking for libraries that support entity deduplication
I am going to work on some projects that deal with entity deduplication. The datasets (one or more) may contain duplicate entities. In real-world data, an entity's name, address, country, email, or social media ID may appear in different forms. My goal…

Roshan
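Matching entities whose names or emails appear in different forms usually starts with normalizing the values and then scoring them with a fuzzy comparison. Duke handles this through configurable cleaners and comparators; the standalone sketch below is not Duke's code, only an illustration of the two steps using `java.text.Normalizer` and a plain Levenshtein edit distance. The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

public class EntityNormalize {

    // Lowercase, strip accents, turn punctuation into spaces, collapse whitespace.
    static String normalizeName(String s) {
        String n = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");                 // drop combining marks
        return n.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ")    // punctuation -> space
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Trim and lowercase; treating the whole address as case-insensitive
    // is a simplifying assumption.
    static String normalizeEmail(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    // Classic dynamic-programming edit distance using two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]: 1 - distance / length of the longer string.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        // Accented name written with Unicode escapes
        String a = normalizeName("  Jos\u00e9   Garc\u00eda-L\u00f3pez ");
        String b = normalizeName("jose garcia lopez");
        System.out.println(a + " | " + b + " | " + similarity(a, b));
        System.out.println(normalizeEmail(" John.Doe@Example.COM "));
    }
}
```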
2 votes, 1 answer
Duke deduplication engine: exact same record not matched
I am attempting to use Duke to match records from one CSV file to another. Both files have ID, Model, Price, CompanyName, Review, and Url columns. I am trying to match against the second CSV to find duplicate records.
package no.priv.garshol.duke;
import…

Kishore
2 votes, 1 answer
Duke deduplication engine: linking records not working?
I am attempting to use Duke to match records from one database to another. One database has song titles and writers. I am trying to match against another database to find duplicates and corresponding records.
I have gotten Duke to run and I can see some of the records…

1000Suns
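Linking two sources differs from deduplicating one: each record in the first source is compared against the records in the second, and the best candidate is kept only if it scores high enough. The sketch below shows that loop for song titles and writers. It uses an averaged Jaccard token similarity as a simple stand-in scoring function, not Duke's scoring, and the `Song` record, class, and method names are hypothetical.

```java
import java.util.*;

public class LinkSources {

    record Song(String id, String title, String writer) {}

    // Lowercase word tokens; empty strings from leading separators are skipped.
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Jaccard similarity: |intersection| / |union| of the token sets.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a), tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) return 1.0;
        Set<String> inter = new HashSet<>(ta);
        inter.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) inter.size() / union.size();
    }

    // For each record in `left`, keep the best-scoring record in `right`
    // if its score reaches `threshold`. Returns leftId -> rightId.
    static Map<String, String> link(List<Song> left, List<Song> right, double threshold) {
        Map<String, String> links = new LinkedHashMap<>();
        for (Song l : left) {
            Song best = null;
            double bestScore = -1;
            for (Song r : right) {
                double score = (jaccard(l.title(), r.title()) + jaccard(l.writer(), r.writer())) / 2;
                if (score > bestScore) {
                    bestScore = score;
                    best = r;
                }
            }
            if (best != null && bestScore >= threshold) links.put(l.id(), best.id());
        }
        return links;
    }

    public static void main(String[] args) {
        List<Song> a = List.of(new Song("a1", "Yesterday", "Lennon, McCartney"),
                               new Song("a2", "Hey Jude", "McCartney"));
        List<Song> b = List.of(new Song("b1", "Yesterday (Remastered)", "McCartney Lennon"),
                               new Song("b2", "Let It Be", "McCartney"));
        System.out.println(link(a, b, 0.7));
    }
}
```

Here `a1` links to `b1` with a score of 0.75, while `a2` stays unlinked because its best candidate scores only 0.5. Comparing every pair is quadratic; Duke uses Lucene to narrow the candidates instead of scanning all of them.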
1 vote, 0 answers
Duke Record Linkage Configuration XML
I have a problem with this record linkage: I have these two CSV files and the perfect mapping. I've used this configuration, but Duke always reports 0 links found. Perhaps I've selected the wrong thresholds?
Can someone help…

Salvatore Taddeo
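When a run reports 0 links, one useful first check is how the computed pair probabilities compare with the configured thresholds. Duke's configuration has a `threshold` and an optional `maybe-threshold`; the sketch below classifies hypothetical pair probabilities against two such values and prints the highest one seen. The class, the enum, and the strict `>` boundary rule are assumptions for illustration, not Duke's code.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThresholdCheck {

    enum Decision { MATCH, MAYBE, NO_MATCH }

    // Assumed boundary rule: a score must be strictly above a threshold.
    static Decision classify(double p, double threshold, double maybeThreshold) {
        if (p > threshold) return Decision.MATCH;
        if (p > maybeThreshold) return Decision.MAYBE;
        return Decision.NO_MATCH;
    }

    // Tally decisions for a batch of pair probabilities.
    static Map<Decision, Integer> tally(double[] scores, double threshold, double maybeThreshold) {
        Map<Decision, Integer> counts = new EnumMap<>(Decision.class);
        for (Decision d : Decision.values()) counts.put(d, 0);
        for (double p : scores) {
            counts.merge(classify(p, threshold, maybeThreshold), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] scores = {0.55, 0.72, 0.81};  // hypothetical pair probabilities
        double max = 0;
        for (double p : scores) max = Math.max(max, p);
        System.out.println("highest score: " + max);
        System.out.println(tally(scores, 0.9, 0.7));
    }
}
```

If even the highest observed probability sits below the match threshold, no links can be reported; the options are then to lower the threshold or to revisit how strongly each property contributes.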
1 vote, 0 answers
Duke deduplication engine: can't find exact records
I'm trying to create a configuration and processor for Duke to find exact matches in a record list. I created a processor based on ExactMatchComparator, but it does not return exact matches.
Here's the setup of the processor,…

Raphael Khoury
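An exact comparator scores a pair as equal only when the strings are identical, so stray whitespace or a case difference in the data is enough to make two "identical" values fail. A minimal sketch, assuming a hypothetical `FieldComparator` interface that returns a similarity between 0 and 1. This is not Duke's comparator interface, and the `ExactMatchComparator` from the question is not reproduced here.

```java
import java.util.Locale;

public class ExactCompare {

    // Hypothetical comparator contract: similarity in [0, 1].
    interface FieldComparator {
        double compare(String v1, String v2);
    }

    // Strict equality: any difference in case or spacing gives 0.0.
    static final FieldComparator EXACT = (v1, v2) -> v1.equals(v2) ? 1.0 : 0.0;

    // The same test after trimming and lowercasing both values.
    static final FieldComparator EXACT_CLEANED = (v1, v2) ->
            clean(v1).equals(clean(v2)) ? 1.0 : 0.0;

    static String clean(String v) {
        return v.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(EXACT.compare("ACME Corp", "acme corp "));         // 0.0
        System.out.println(EXACT_CLEANED.compare("ACME Corp", "acme corp ")); // 1.0
    }
}
```

Cleaning values before the exact comparison keeps the comparator strict about content while tolerating formatting noise.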
1 vote, 2 answers
Duke Fast Deduplication: java.lang.UnsupportedOperationException: Operation not yet supported?
I'm trying to use the Duke Fast Deduplication Engine to search for some duplicate records in the database at the company where I work.
I run it from the command line like this:
java -cp…

leeand00
0 votes, 0 answers
Duke: perform an action on duplicate records
I have created an application that finds duplicate records using Duke.
The code:
public static void main(String[] args) throws IOException, SAXException
{
Configuration config =
ConfigLoader
…

Saurav Sinha
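To act on duplicates rather than only print them, it helps to turn the pairwise matches into groups first, since A~B and B~C should end up in one group. Duke reports matches through listener callbacks; the sketch below uses a hypothetical `MatchCallback` interface in their place and a union-find structure to merge pairs into clusters, then applies an example action (keep the first ID, treat the rest as duplicates). All names here are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateActions {

    // Hypothetical stand-in for a match listener: called once per matching pair.
    interface MatchCallback {
        void onMatch(String id1, String id2, double confidence);
    }

    // Union-find over record IDs, so chains of pairwise matches become one group.
    static class Groups implements MatchCallback {
        private final Map<String, String> parent = new HashMap<>();

        String find(String x) {
            parent.putIfAbsent(x, x);
            String root = x;
            while (!parent.get(root).equals(root)) root = parent.get(root);
            while (!x.equals(root)) {           // path compression
                String next = parent.get(x);
                parent.put(x, root);
                x = next;
            }
            return root;
        }

        @Override
        public void onMatch(String id1, String id2, double confidence) {
            parent.put(find(id1), find(id2));   // confidence unused in this sketch
        }

        // Groups with two or more members, each sorted, ordered by first ID.
        List<List<String>> clusters() {
            Map<String, List<String>> byRoot = new HashMap<>();
            for (String id : new ArrayList<>(parent.keySet())) {
                byRoot.computeIfAbsent(find(id), k -> new ArrayList<>()).add(id);
            }
            List<List<String>> out = new ArrayList<>();
            for (List<String> members : byRoot.values()) {
                if (members.size() > 1) {
                    Collections.sort(members);
                    out.add(members);
                }
            }
            out.sort((a, b) -> a.get(0).compareTo(b.get(0)));
            return out;
        }
    }

    public static void main(String[] args) {
        Groups g = new Groups();
        g.onMatch("r1", "r2", 0.93);
        g.onMatch("r3", "r2", 0.88);
        g.onMatch("r4", "r5", 0.97);
        for (List<String> c : g.clusters()) {
            // Example action: keep the first ID and treat the others as duplicates.
            System.out.println("keep " + c.get(0) + ", merge " + c.subList(1, c.size()));
        }
    }
}
```

Collecting all matches before acting avoids deleting or merging a record that a later match still refers to.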
0 votes, 3 answers
Duke - org.apache.lucene.analysis.standard.StandardAnalyzer
I am using Duke (https://github.com/larsga/Duke) for data deduplication.
I have set up Duke: the Duke JAR as well as the Lucene JARs are on the classpath.
The sample example on GitHub…

Soundarya Thiagarajan
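An error naming `org.apache.lucene.analysis.standard.StandardAnalyzer` often points to a Lucene JAR that is absent from the runtime classpath, or to a Lucene version whose packaging differs from the one Duke expects. A quick check from inside the same JVM, using only `Class.forName`; the class name `ClasspathCheck` is hypothetical.

```java
public class ClasspathCheck {

    // True if the named class can be located by this class's loader.
    // initialize=false avoids running static initializers during the check.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

Running this with the same `-cp` value used to launch Duke shows whether the class is visible at all, which separates a classpath problem from a version mismatch.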