complexity of a duplicate detection algorithm

Question

I'm not so familiar with complexity notation, can someone help me identifying the complexity of this algorithm?

for (int i = 0; i < records.size(); i++){

    for (int j = i; j < records.size(); j++){

        if(j != i & isDuplicate(records.get(i), records.get(j))){

            Pair p = new Pair(records.get(i).RecID,records.get(j).RecID);

            duplicates.add(p);

        }

    }

}

I have a database table and want to check each record, with all the records only once, to check if they are duplicates.

score 0 · Accepted Answer · answered Feb 03 '18 at 16:17

0

It can be done in O(n) by hashing the first list and looking through the second to check if data in hashset. Both loops are o(n). Hash lookup is O(1) per item.

Your code is n**2, but that's not the most efficient way to do it

answered Feb 03 '18 at 16:17

Ctznkane525

7,297
3
16
40

thanks a lot for your answer, thing is for every field of a record i do a levershtein distance comparison and decide according to how many fields are similar if its a duplicate or not.. i don´t think this can be done by hashing – Rob Feb 04 '18 at 18:13

complexity of a duplicate detection algorithm

1 Answers1