Efficient algorithm for finding the same number

Question

There are 1002 numbers in an array, and two of them are the same. How would you find the duplicated number efficiently? Is there an efficient algorithm for this?

Here is my algorithm:

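def find_duplicate(a):
    # Compare every pair (i, j) with i < j.
    n = len(a)  # 1002 in the question
    for i in range(n):
        for j in range(i + 1, n):
            if a[i] == a[j]:
                return a[i]
    return None  # no duplicate found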

What is the range of values of the numbers? Use hashing! It will work in O(n) time! — Nullpointer, Jun 15 '14 at 18:42
Still, you will have some range! What is that range? What is your requirement: a memory-efficient algorithm or a time-efficient algorithm? — Nullpointer, Jun 15 '14 at 18:48
It was a technical interview question. I don't remember the range being specified, and neither memory nor time efficiency was specified. I just wondered how this can be solved efficiently. — Figen Güngör, Jun 15 '14 at 18:50
If you need a time-efficient algorithm and you have plenty of memory, then you can use hashing; otherwise you need to compromise on time efficiency! I will post the hashing code! — Nullpointer, Jun 15 '14 at 18:53
Without "efficiency" defined, it's an ambiguous question with no real answer. If I asked this question as an interviewer, I'd be fishing for a candidate to ask what I meant before I cared about an answer. — Preston Guillot, Jun 15 '14 at 18:57
If there are 1002 numbers, shouldn't the range be from 1 to 1002 or from 0 to 1001 ? — Christophe, Jun 15 '14 at 19:10
I mean values of the numbers are not in this range and not ordered. — Figen Güngör, Jun 15 '14 at 19:21

score 2 · Accepted Answer · answered Jun 15 '14 at 18:57


This should work!

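#include <stdio.h>
#include <stdlib.h>

#define COUNT 1002
#define RANGE 1000000001 /* values must lie in [0, RANGE) */

int main(void)
{
    int arr[COUNT] = {0}; /* placeholder: fill with your 1002 numbers */
    int i;
    /* One counter per possible value (about 2 GB). A table this large does
       not fit on the stack, so allocate it on the heap; calloc also zeroes it. */
    short *hash = calloc(RANGE, sizeof *hash);
    if (hash == NULL) {
        fprintf(stderr, "Not enough memory for the lookup table\n");
        return 1;
    }
    for (i = 0; i < COUNT; i++) {
        if (hash[arr[i]] != 0) { /* value seen before */
            printf("Duplicate number is: %d\n", arr[i]);
            break;
        }
        hash[arr[i]]++;
    }
    free(hash);
    return 0;
}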

answered Jun 15 '14 at 18:57

Nullpointer


I wouldn't call the identity function a hash function, so the name is a bit misleading :) And you should really use `RANGE = 1003` – Niklas B. Jun 15 '14 at 19:30
Note that this solution requires at least 1000000001 operations for zeroing out `hash`, which is about 1000 times more than the original `1000*1001`, and cannot cope with numbers greater than 1000000000. – n. m. could be an AI Jun 15 '14 at 19:37
@NiklasB. If I use RANGE = 1003, I will have to limit my values to that range, but the OP says no such range is defined. This is why I used RANGE = 1000000001. And yes, calling it a hash function is a bit misleading! :p – Nullpointer Jun 16 '14 at 02:17
Well in that case you *must* use a hash table ;) – Niklas B. Jun 16 '14 at 06:13

score 1 · Answer 2 · answered Jun 15 '14 at 18:53


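I think the most efficient solution is to use a hash set:

seen = set()  # built-in hash set; the old `sets` module is gone in Python 3
for x in [1, 2, 3, 4, 5, 2, 3, 1]:
    if x in seen:  # average O(1) membership test
        print(x)  # first value seen twice; prints 2 for this list
        break
    seen.add(x)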

answered Jun 15 '14 at 18:53

Aleksei Shestakov


score 0 · Answer 3 · answered Jun 15 '14 at 18:46


If your values are numbers, you can radix-sort them into a buffer and then check for an element that appears twice.
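One way to read this suggestion is sketched below. It assumes the values are non-negative integers (the question gives no range, so this is an assumption), and the names `radix_sort` and `find_duplicate_by_radix_sort` are illustrative, not taken from the answer.

```python
def radix_sort(values, base=256):
    """LSD radix sort. Assumption: all values are non-negative integers."""
    out = list(values)
    if not out:
        return out
    largest = max(out)
    place = 1
    while place <= largest:
        # Distribute by the current digit; appending keeps the pass stable.
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v // place) % base].append(v)
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out


def find_duplicate_by_radix_sort(values):
    """Sort, then look for two equal neighbours."""
    ordered = radix_sort(values)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev == cur:
            return cur
    return None


print(find_duplicate_by_radix_sort([170, 45, 75, 90, 802, 24, 2, 66, 75]))
```

Each pass is linear in the number of values, and the number of passes depends on how many base-256 digits the largest value has, so this stays close to linear for fixed-width integers.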

answered Jun 15 '14 at 18:46

Rliger


score 0 · Answer 4 · answered Jun 15 '14 at 19:47

Your algorithm isn't bad at all! In the worst case the inner comparison runs n*(n-1)/2 times, which means a complexity of O(n²).
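To make the n*(n-1)/2 figure concrete, here is a small sketch that counts the comparisons of the nested-loop search. The worst-case input (the two equal values in the last two positions) and the name `count_comparisons` are assumptions made for illustration.

```python
def count_comparisons(a):
    """Nested-loop search that also returns how many == tests it performed."""
    comparisons = 0
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            comparisons += 1
            if a[i] == a[j]:
                return a[i], comparisons
    return None, comparisons


# Worst case: the only matching pair is the very last pair that gets compared.
n = 1002
worst = list(range(n - 1)) + [n - 2]
dup, comparisons = count_comparisons(worst)
print(dup, comparisons, n * (n - 1) // 2)  # prints: 1000 501501 501501
```

For n = 1002 that is about half a million comparisons, which is small in absolute terms even though the growth is quadratic.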

The most favourable case would be a sorted array. Then you could just loop through it, comparing each element with its predecessor. The worst case is n-1 comparisons, in other words a complexity of O(n).
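A minimal sketch of that scan, assuming the input is already sorted in ascending order; `find_duplicate_sorted` is an illustrative name.

```python
def find_duplicate_sorted(sorted_values):
    """Compare each element with its predecessor: at most n-1 comparisons."""
    for i in range(1, len(sorted_values)):
        if sorted_values[i] == sorted_values[i - 1]:
            return sorted_values[i]
    return None


print(find_duplicate_sorted([1, 3, 4, 4, 9]))  # prints 4
```

For unsorted input, one option is `find_duplicate_sorted(sorted(values))`. Python's built-in `sorted` has an O(n log n) worst case, so the total cost is then dominated by the sort.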

However, I assume that the array is not sorted, so you would also have to pay for the sort. Quicksort, which is pretty good here, has a worst case of O(n²) (its average case is O(n log n)). So in the worst case, sorting plus traversing would cost about as much as your algorithm.

Using a hash... well, it's optimal if memory is not a problem (see the excellent solution from @Nullpointer). The cost of the algorithm is that of a simple traversal, which is O(n).

However, in real life you are likely to have memory constraints, which means a smaller hash table and a hash function with a risk of collisions (for example, the value modulo the table size). For this reason you need to store, for each hash value, the list of values that map to it. In that situation, the worst case is when all numbers have the same hash H. You would still compute each hash (a simple O(n) traversal), but when inserting a value you would have to walk through the collision list. A quick calculation shows that you again end up with n*(n-1)/2 comparisons, and again a complexity of O(n²), the same as your original proposal.
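The collision scenario can be sketched as follows. This is a hypothetical chained table written for illustration (the function name, the table sizes, and the test inputs are all assumptions); it uses the modulo hash mentioned above and counts the comparisons made while walking the collision lists.

```python
def find_duplicate_chained(values, table_size):
    """Fixed-size table with chaining; hash is value % table_size."""
    buckets = [[] for _ in range(table_size)]
    comparisons = 0
    for v in values:
        chain = buckets[v % table_size]
        # Walk the collision list for this hash value.
        for stored in chain:
            comparisons += 1
            if stored == v:
                return v, comparisons
        chain.append(v)
    return None, comparisons


n = 1002
# Well-spread values: every chain stays short.
spread = list(range(n - 1)) + [n - 2]
print(find_duplicate_chained(spread, 2048))  # prints (1000, 1)

# Worst case: every value is a multiple of the table size, so all share hash 0.
colliding = [16 * k for k in range(n - 1)] + [16 * (n - 2)]
print(find_duplicate_chained(colliding, 16))  # prints (16000, 501501)
```

With well-spread values a single comparison is enough, while the all-colliding input needs 501501 comparisons, which is exactly n*(n-1)/2 for n = 1002.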
