5

Two common ways to detect duplicates in an array:

1) sort first, time complexity O (n log n), space complexity O (1)

2) hash set, time complexity O (n), space complexity O (n)

Is there a 3rd way to detect a duplicate?

Please do not answer brute force.

Dante May Code
  • 11,177
  • 9
  • 49
  • 81
  • Related: [efficient way to remove duplicate integers from an array](http://stackoverflow.com/questions/1532819/algorithm-efficient-way-to-remove-duplicate-integers-from-an-array) – Dante May Code May 12 '11 at 03:56

3 Answers3

5

Yet another option is the Bloom filter. Complexity O(n), space varies (almost always less than a hash), but there is a possibility of false positives. There is a complex relationship between the size of the data set, the size of the filter, and the number of false positives that you can expect.

Bloom filters are often used as quick "sanity checks" before doing a more expensive duplicate check.

btilly
  • 43,296
  • 3
  • 59
  • 88
  • A Hashset is a more serious solution to this problem. – Karim Manaouil Jan 17 '18 at 00:57
  • @KarimManaouil I would reach for a hashset first as well, but that was one of the potential answers that we were asked NOT to provide. Bloom filters are a third approach that is used in practice, at scale. – btilly Jan 17 '18 at 01:14
  • I agree on the usage of bloom filters in the case the array is extremely large or the processing device is an embedded system where memory constraints are the major decision factor of the specific implementation (Especially, if one is obliged to achieve O(N) time complexity). Anyway, bloom filters have better usage cases than this scenario (eg. Disk I/O, networking). – Karim Manaouil Jan 17 '18 at 14:22
  • @KarimManaouil I agree. There is a tradeoff, and there are good reasons why bloom filters are usually not the right answer. But when you need them, it is good to know that the option exists. – btilly Jan 17 '18 at 17:34
4

Depends on the information.

If you know the range of the numbers, like 1-1000, you can use a bit array.

Lets say the range is a...b

Make a bit array with (b-a) bits. initialize them to 0.

Loop through the array, when you get to the number x , change the bit in place x-a to 1.

If there's a 1 there already, you have a duplicate.

time complexity: O(n)

space complexity: (b-a) bits

Yochai Timmer
  • 48,127
  • 24
  • 147
  • 185
1

Another method to find duplicates provided an n element array contains elements from 0 to n-1. <Find Duplicates>

Community
  • 1
  • 1
NirmalGeo
  • 773
  • 4
  • 12