74

I recently came across a question somewhere:

Suppose you have an array of 1001 integers. The integers are in random order, but you know each of the integers is between 1 and 1000 (inclusive). In addition, each number appears only once in the array, except for one number, which occurs twice. Assume that you can access each element of the array only once. Describe an algorithm to find the repeated number. If you used auxiliary storage in your algorithm, can you find an algorithm that does not require it?

What I am interested in to know is the second part, i.e., without using auxiliary storage. Do you have any idea?

codaddict
  • 445,704
  • 82
  • 492
  • 529
SysAdmin
  • 5,455
  • 8
  • 33
  • 34
  • 13
    pretty sure this has been asked before, but can't find the exact qn. the total of the n integers in sequence and the repeated integer x will be x + n(n-1)/2. – Pete Kirkham Apr 09 '10 at 07:41
  • Can you please change question title to something more descriptive? Maybe "Find duplicate array element with special constraints" – Michał Piaskowski Apr 09 '10 at 13:27
  • another mathematical property you can use here is the factorial. (n1 * n2 * .. ) / n! gives the required number. 1000! factorial is not that big of a number to be honest - http://justinwhite.com/big-calc/1000.html – Anurag Apr 09 '10 at 19:44
  • Yep, 1000! is not that big of a number - only 559 digits to fill the rest of the comment box and another 2921 digits over the limit... :-) – Franci Penov Apr 10 '10 at 03:38
  • A more hard (no single-pass in O(1) space solution) version of this question is "Algorithm to determine if array contains n…n+m?" http://stackoverflow.com/questions/177118/algorithm-to-determine-if-array-contains-n-nm – jfs Apr 10 '10 at 04:40
  • 2
    Slightly different question with the same answer: http://stackoverflow.com/questions/35185/finding-a-single-number-in-a-list – starblue Apr 10 '10 at 07:56
  • 1
    Again: http://stackoverflow.com/questions/1089987/given-an-array-of-numbers-except-for-one-number-all-the-others-occur-twice-giv – starblue Apr 10 '10 at 07:57
  • 3
    Duplicate: http://stackoverflow.com/questions/555744/algorithm-to-find-two-repeated-numbers-in-an-array-without-sorting – starblue Apr 10 '10 at 08:42
  • trick question - answer has been hanging around as long as the question has :-/ – FastAl Aug 18 '11 at 03:29
  • The problem is that after every number appears only once (we've got 1001 different integers) + 1 repeated number = 1002 integers – Manolete Apr 18 '12 at 14:18

19 Answers19

104

Just add them all up, and subtract the total you would expect if only 1001 numbers were used from that.

Eg:

Input: 1,2,3,2,4 => 12
Expected: 1,2,3,4 => 10

Input - Expected => 2
Aistina
  • 12,435
  • 13
  • 69
  • 89
leppie
  • 115,091
  • 17
  • 196
  • 297
  • Wouldn't that require auxiliary storage? – Brian Rasmussen Apr 09 '10 at 07:39
  • 2
    @Brian Rasmussen: where is the extra storage? – leppie Apr 09 '10 at 07:40
  • 3
    @leppie: To hold the calculated sum, but honestly I don't know exactly what the OP meant by extra storage. In any case, I like your answer. – Brian Rasmussen Apr 09 '10 at 07:42
  • 4
    @Brian, the interviewer probably meant "don't use a hash table or an array"... I'm pretty sure O(1) storage, especially a single variable, would be satisfactory. – Michael Aaron Safyan Apr 09 '10 at 07:44
  • 1
    I'm guessing auxilliary storage would be storage that takes more than O(1) space. – Justin Ardini Apr 09 '10 at 07:45
  • @Michael: Thanks, that makes sense. Please disregard by question then. – Brian Rasmussen Apr 09 '10 at 07:47
  • 7
    the methord works perfectly fine. but the example should have been something like ( 1,3,2,4,2=>12 ) - (1+2+3+4 => 10) = 2 – SysAdmin Apr 09 '10 at 07:56
  • 1
    This solution does not scale for bigger arrays. :-) – Franci Penov Apr 09 '10 at 08:03
  • 5
    @Franci Penov: I am not sure interview questions are supposed to scale :) – Brian Rasmussen Apr 09 '10 at 08:04
  • "I am not sure interview questions are supposed to scale"...but I am almost certain it would be the next question. – SysAdmin Apr 09 '10 at 09:00
  • @Aistina & @SysAdmin: Thanks, didn't notice that ;P – leppie Apr 09 '10 at 09:10
  • @leppie - scale in terms of array size that can be safely processed without overflow. :-) – Franci Penov Apr 09 '10 at 09:11
  • 1
    @Franci Penov: It's an algorithm, the size of integers and arrays are unbound. Most languages could deal with big ints, the array size might be a problem, but you would have a problem before you even start ;P – leppie Apr 09 '10 at 09:14
  • 1
    sum-based algorithm requires O(log(n)) additional storage to hold the sum that is worse than O(1) for xor-based solution. – jfs Jun 19 '10 at 19:52
  • 1
    @J.F.Sebastian why does addition need more storage than xor? In both cases you need an accumulator with O(log(n)) bits. – Paul Hankin Apr 07 '15 at 09:08
  • @Anonymous: you are right: the sum requires only twice as much bits as xor and constant factors do not matter for Big O. I don't remember what I meant: a very generous interpretation could be: one number (for the result) you have for free and using xor it is all you need but with the sum you need one more *additional* number (the sum is square) ~log n bits. – jfs Apr 07 '15 at 20:55
  • @J.F.Sebastian in both cases you need an accumulator with just enough bits to contain the result. With addition you can just throw away the overflow bits (and perform subtraction with two's complement addition). – Paul Hankin Apr 07 '15 at 23:56
  • @Anonymous: it is interesting. Though "sum with specific overflow behavior" is sufficiently different from just "sum" (there is no language tag on the question: overflow behaviour is unknown -- I would assume either infinite precision integers (such as in Python) or just "twice as wide" fixed integer for the sum). Could you post it as an answer with more details how the overflow works in this case? – jfs Apr 08 '15 at 09:38
77

Update 2: Some people think that using XOR to find the duplicate number is a hack or trick. To which my official response is: "I am not looking for a duplicate number, I am looking for a duplicate pattern in an array of bit sets. And XOR is definitely suited better than ADD to manipulate bit sets". :-)

Update: Just for fun before I go to bed, here's "one-line" alternative solution that requires zero additional storage (not even a loop counter), touches each array element only once, is non-destructive and does not scale at all :-)

printf("Answer : %d\n",
           array[0] ^
           array[1] ^
           array[2] ^
           // continue typing...
           array[999] ^
           array[1000] ^
           1 ^
           2 ^
           // continue typing...
           999^
           1000
      );

Note that the compiler will actually calculate the second half of that expression at compile time, so the "algorithm" will execute in exactly 1002 operations.

And if the array element values are know at compile time as well, the compiler will optimize the whole statement to a constant. :-)

Original solution: Which does not meet the strict requirements of the questions, even though it works to find the correct answer. It uses one additional integer to keep the loop counter, and it accesses each array element three times - twice to read it and write it at the current iteration and once to read it for the next iteration.

Well, you need at least one additional variable (or a CPU register) to store the index of the current element as you go through the array.

Aside from that one though, here's a destructive algorithm that can safely scale for any N up to MAX_INT.

for (int i = 1; i < 1001; i++)
{
   array[i] = array[i] ^ array[i-1] ^ i;
}

printf("Answer : %d\n", array[1000]);

I will leave the exercise of figuring out why this works to you, with a simple hint :-):

a ^ a = 0
0 ^ a = a
Radiodef
  • 37,180
  • 14
  • 90
  • 125
Franci Penov
  • 74,861
  • 18
  • 132
  • 169
  • 2
    A non-destructive method would be to maintain an accumulator on the side... would also make it more readable I think. – Matthieu M. Apr 09 '10 at 08:32
  • 2
    @Matthiey M. - but a non-destructive solution would require additional storage and thus violate the requirements of the problem. – Franci Penov Apr 09 '10 at 08:40
  • And your solution requires additional storage to hold the loop counter, plus possibly some intermediate values depending on how all that gets compiled. However, chosing a destructive algorithm does prevent overflow issues without introducing a separate type to hold the sum, which gives it an actual purpose. – Dennis Zickefoose Apr 09 '10 at 09:00
  • 1
    @Dennis Zickefoose - I am not arguing that a non-destructive solution with an additional integer variable is not better. :-) but it does violate the problem requirement, that's why I choose the destructive algorithm. as for the loop counter - there's no way to avoid this one, and it is implicitly allowed because the problem states that the code is allowed to iterate over the array once, which is not possible without loop counter. – Franci Penov Apr 09 '10 at 09:09
  • When you look at XOR of 1 to 1000, XOR of the whole array and compare this procedure with adding 1 to 1000 and comparing with the sum of the whole array you notice similarity - addition and XOR operation are related in binary. The only difference that XOR is more natural here because we know that the final difference can not be bigger then 1000. – Unreason Apr 09 '10 at 10:50
  • Not that this isn't a brilliant solution, but doesn't your first solution, the loop, access each member of the array twice? – Robert Gowland Apr 09 '10 at 13:21
  • @Robert - technically, it does. Hence the second solution. :-) – Franci Penov Apr 09 '10 at 13:45
  • It still have additional storage in the form of temporary values, no ? – fa. Apr 09 '10 at 15:08
  • 1
    @Pavel Shved - there's no trick to XOR, it's a mathematical operation with well known properties, just as addition, multiplication and others. – Franci Penov Apr 09 '10 at 15:17
  • @fa - I don't see any temporary values. For all I know, the compiler I've been given targets a CPU that has special instruction that does XOR on 1002 values at once. :-) – Franci Penov Apr 09 '10 at 15:22
  • 1
    @Pavel - also, you and I look at the problem in different ways - for I am not searching for duplicate number, I am searching for a duplicate pattern in sets of flags. when you state the problem in this way, using addition now becomes the "dirty trick" :-) – Franci Penov Apr 09 '10 at 17:23
  • @Franci: Wow, the stuff about flags makes sense. Perhaps, you might want to edit it into your answer. Then I can change the vote :-) – P Shved Apr 09 '10 at 19:07
23

A non destructive version of solution by Franci Penov.

This can be done by making use of the XOR operator.

Lets say we have an array of size 5: 4, 3, 1, 2, 2
Which are at the index:                        0, 1, 2, 3, 4

Now do an XOR of all the elements and all the indices. We get 2, which is the duplicate element. This happens because, 0 plays no role in the XORing. The remaining n-1 indices pair with same n-1 elements in the array and the only unpaired element in the array will be the duplicate.

int i;
int dupe = 0;
for(i = 0; i < N; i++) {
    dupe = dupe ^ arr[i] ^ i;
}
// dupe has the duplicate.

The best feature of this solution is that it does not suffer from overflow problems that is seen in the addition based solution.

Since this is an interview question, it would be best to start with the addition based solution, identify the overflow limitation and then give the XOR based solution :)

This makes use of an additional variable so does not meet the requirements in the question completely.

Radiodef
  • 37,180
  • 14
  • 90
  • 125
codaddict
  • 445,704
  • 82
  • 492
  • 529
  • 2
    Frankly, I'm not getting these XOR based solutions. Basically, we're trying to match the "index" with the value of the element. In case of match, the result will be zero and for repeated value, the xor result will be non zero. For a simple array --> {1,2,2} we will xor 1(element value)^1(index)^0 (previous xor result) --> 0; 2^2^0 --> 0; 3^2^0 --> 1. Here 1 is the final result value as per XOR solutions. I don't see how this is valid answer unless I'm missing something very obvious. – Prabhjot Apr 01 '14 at 18:43
  • @codaddict I think the loop should start with i initialized to 1. – Raman Singh Jul 20 '14 at 06:29
  • 1
    @codaddict +1 for the clear illustration and mentioning about the overflow (also for being non-destructive). The same will work, with a few changes, even if the integers have an offset, say `{ 1043, 1042, 1044, 1042 }` by XOR-ing with `{ 0, 1042, 1043, 1044 }`. – legends2k Oct 29 '14 at 12:01
15

Add all the numbers together. The final sum will be the 1+2+...+1000+duplicate number.

Laurynas Biveinis
  • 10,547
  • 4
  • 53
  • 66
8

To paraphrase Francis Penov's solution.

The (usual) problem is: given an array of integers of arbitrary length that contain only elements repeated an even times of times except for one value which is repeated an odd times of times, find out this value.

The solution is:

acc = 0
for i in array: acc = acc ^ i

Your current problem is an adaptation. The trick is that you are to find the element that is repeated twice so you need to adapt solution to compensate for this quirk.

acc = 0
for i in len(array): acc = acc ^ i ^ array[i]

Which is what Francis' solution does in the end, although it destroys the whole array (by the way, it could only destroy the first or last element...)

But since you need extra-storage for the index, I think you'll be forgiven if you also use an extra integer... The restriction is most probably because they want to prevent you from using an array.

It would have been phrased more accurately if they had required O(1) space (1000 can be seen as N since it's arbitrary here).

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • I've posted Python one-liner based on your answer http://stackoverflow.com/questions/2605766/how-to-find-a-duplicate-element-in-an-array-of-shuffled-consecutive-integers/2612318#2612318 – jfs Apr 10 '10 at 05:18
6

Add all numbers. The sum of integers 1..1000 is (1000*1001)/2. The difference from what you get is your number.

kgiannakakis
  • 103,016
  • 27
  • 158
  • 194
4

One line solution in Python

arr = [1,3,2,4,2]
print reduce(lambda acc, (i, x): acc ^ i ^ x, enumerate(arr), 0)
# -> 2

Explanation on why it works is in @Matthieu M.'s answer.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670
3

If you know that we have the exact numbers 1-1000, you can add up the results and subtract 500500 (sum(1, 1000)) from the total. This will give the repeated number because sum(array) = sum(1, 1000) + repeated number.

Justin Ardini
  • 9,768
  • 2
  • 39
  • 46
3

Well, there is a very simple way to do this... each of the numbers between 1 and 1000 occurs exactly once except for the number that is repeated.... so, the sum from 1....1000 is 500500. So, the algorithm is:

sum = 0
for each element of the array:
   sum += that element of the array
number_that_occurred_twice = sum - 500500
Michael Aaron Safyan
  • 93,612
  • 16
  • 138
  • 200
1
public static void main(String[] args) {
    int start = 1;
    int end = 10;
    int arr[] = {1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10};
    System.out.println(findDuplicate(arr, start, end));
}

static int findDuplicate(int arr[], int start, int end) {

    int sumAll = 0;
    for(int i = start; i <= end; i++) {
        sumAll += i;
    }
    System.out.println(sumAll);
    int sumArrElem = 0;
    for(int e : arr) {
        sumArrElem += e;
    }
    System.out.println(sumArrElem);
    return sumArrElem - sumAll;
}
Radiodef
  • 37,180
  • 14
  • 90
  • 125
mRaza
  • 65
  • 1
  • 7
1
public int duplicateNumber(int[] A) {
    int count = 0;
    for(int k = 0; k < A.Length; k++)
        count += A[k];
    return count - (A.Length * (A.Length - 1) >> 1);
}
Radiodef
  • 37,180
  • 14
  • 90
  • 125
Sunil B N
  • 4,159
  • 1
  • 31
  • 52
1

No extra storage requirement (apart from loop variable).

int length = (sizeof array) / (sizeof array[0]);
for(int i = 1; i < length; i++) {
   array[0] += array[i];
}

printf(
    "Answer : %d\n",
    ( array[0] - (length * (length + 1)) / 2 )
);
Radiodef
  • 37,180
  • 14
  • 90
  • 125
N 1.1
  • 12,418
  • 6
  • 43
  • 61
1

Do arguments and callstacks count as auxiliary storage?

int sumRemaining(int* remaining, int count) {
    if (!count) {
        return 0;
    }
    return remaining[0] + sumRemaining(remaining + 1, count - 1);
}
printf("duplicate is %d", sumRemaining(array, 1001) - 500500);

Edit: tail call version

int sumRemaining(int* remaining, int count, int sumSoFar) {
    if (!count) {
        return sumSoFar;
    }
    return sumRemaining(remaining + 1, count - 1, sumSoFar + remaining[0]);
}
printf("duplicate is %d", sumRemaining(array, 1001, 0) - 500500);
Radiodef
  • 37,180
  • 14
  • 90
  • 125
cobbal
  • 69,903
  • 20
  • 143
  • 156
1
n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s
Santhosh
  • 11
  • 1
0

My answer to question 2:

Find the sum and product of numbers from 1 -(to) N, say SUM, PROD.

Find the sum and product of Numbers from 1 - N- x -y, (assume x, y missing), say mySum, myProd,

Thus:

SUM = mySum + x + y;
PROD = myProd* x*y;

Thus:

x*y = PROD/myProd; x+y = SUM - mySum;

We can find x,y if solve this equation.

Radiodef
  • 37,180
  • 14
  • 90
  • 125
0

A triangle number T(n) is the sum of the n natural numbers from 1 to n. It can be represented as n(n+1)/2. Thus, knowing that among given 1001 natural numbers, one and only one number is duplicated, you can easily sum all given numbers and subtract T(1000). The result will contain this duplicate.

For a triangular number T(n), if n is any power of 10, there is also beautiful method finding this T(n), based on base-10 representation:

n = 1000
s = sum(GivenList)
r = str(n/2)
duplicate = int( r + r ) - s
psihodelia
  • 29,566
  • 35
  • 108
  • 157
0

In the aux version, you first set all the values to -1 and as you iterate check if you have already inserted the value to the aux array. If not (value must be -1 then), insert. If you have a duplicate, here is your solution!

In the one without aux, you retrieve an element from the list and check if the rest of the list contains that value. If it contains, here you've found it.

private static int findDuplicated(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    int[] checker = new int[array.length];
    Arrays.fill(checker, -1);
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        int checked = checker[value];
        if (checked == -1) {
            checker[value] = value;
        } else {
            return value;
        }
    }
    return -1;
}

private static int findDuplicatedWithoutAux(int[] array) {
    if (array == null || array.length < 2) {
        System.out.println("invalid");
        return -1;
    }
    for (int i = 0; i < array.length; i++) {
        int value = array[i];
        for (int j = i + 1; j < array.length; j++) {
            int toCompare = array[j];
            if (value == toCompare) {
                return array[i];
            }
        }
    }
    return -1;
}
user3743369
  • 61
  • 1
  • 4
0

Improvement of Fraci's answer based on the property of XORing consecutive values:

int result = xor_sum(N);
for (i = 0; i < N+1; i++)
{
   result = result ^ array[i];
}

Where:

// Compute (((1 xor 2) xor 3) .. xor value)
int xor_sum(int value)
{
    int modulo = x % 4;
    if (modulo == 0)
        return value;
    else if (modulo == 1)
        return 1;
    else if (modulo == 2)
        return i + 1;
    else
        return 0;
}

Or in pseudocode/math lang f(n) defined as (optimized):

if n mod 4 = 0 then X = n
if n mod 4 = 1 then X = 1
if n mod 4 = 2 then X = n+1
if n mod 4 = 3 then X = 0

And in canonical form f(n) is:

f(0) = 0
f(n) = f(n-1) xor n
Radiodef
  • 37,180
  • 14
  • 90
  • 125
Seva Parfenov
  • 991
  • 1
  • 8
  • 6
0

I support the addition of all the elements and then subtracting from it the sum of all the indices but this won't work if the number of elements is very large. I.e. It will cause an integer overflow! So I have devised this algorithm which may be will reduce the chances of an integer overflow to a large extent.

   for i=0 to n-1
        begin:  
              diff = a[i]-i;
              dup = dup + diff;
        end
   // where dup is the duplicate element..

But by this method I won't be able to find out the index at which the duplicate element is present!

For that I need to traverse the array another time which is not desirable.

Peter O.
  • 32,158
  • 14
  • 82
  • 96
Poulami
  • 1,047
  • 3
  • 13
  • 27
  • Simple sums will actually work work. Integer overflow is not a problem, provided the variable tallying the sum is unsigned. – Greg A. Woods Nov 14 '12 at 22:27