Removing duplicate entries in an array (C++)

Question

I'm having an issue in which a function that in theory should remove all duplicate values from an array doesn't work. Here's how it works:

I have two arrays, and then I populate them with random numbers between 0 and 50 inclusive.
I sort the array values in order using a sort function
I then run my dedupe function
I sort the array values in order again
I then output the values in both arrays

The problem is, the loop in the dedupe function is ran 19 times regardless of how many duplicate entries it finds, which is extremely strange. Also, it still gives duplicates.

Any ideas? Thanks!

int* dedupe(int array[ARRAY_SIZE])      //remove duplicate array values and replace with new values.
{   bool dupe = false;
    while(dupe!=true)
    {   
        for(int j=0; j<ARRAY_SIZE; j++)
        {   if(array[j] == array[j+1])
            {   array[j] = rand(); 
                array[j] = array[j] % 51;
                dupe = false;
            }
            else { dupe = true; // the cout part is for debugging
                    cout << dupe << endl; }
        }
    } return array;
}
int main()
{
    int a[9], b[9];
    srand(time(0));
    populate(b);
    populate(a);
    sort(a,ARRAY_SIZE);
    sort(b,ARRAY_SIZE);
    dedupe(a);
    dedupe(b);
    sort(a,ARRAY_SIZE);
    sort(b,ARRAY_SIZE);
    for(int i=0; i<10; i++)
    {   cout << "a[" << i << "] = " << a[i] << "\t\t" << "b[" << i << "] = " << b[i] << endl; }
    return 0;
}

Nothing suggested so far has solved the problem. Does anyone know of a solution?

You might want to take a look at `std::unique`: http://www.cplusplus.com/reference/algorithm/unique/ — chris, Apr 20 '12 at 22:43
How is replacing a duplicate with a random number supposed to remove duplicates? [What do you need](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) the numbers for? — outis, Apr 20 '12 at 22:50
This is a very good point. I'm not sure what to replace it with other than a random number since the array itself at the end and after de-duplication is still supposed to be populated with random numbers... Also, the while loop runs until there are no more duplicate values found so it does remove duplicate numbers -- in theory at least. — Pablo Canseco, Apr 20 '12 at 22:53
Possible dup of [How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N](http://stackoverflow.com/q/158716/90527). See also [Unique random numbers in O(1)?](http://stackoverflow.com/q/196017/90527), [Create Random Number Sequence with No Repeats](http://stackoverflow.com/q/693880/90527). — outis, Apr 20 '12 at 22:54
You're invoking undefined behavior all over the place, no wonder it isn't working. [You should read this about using and accessing arrays](http://www.cplusplus.com/doc/tutorial/arrays/). — Cornstalks, Apr 20 '12 at 23:52

score 0 · Answer 1 · answered Apr 20 '12 at 22:45

0

You're not returning from inside the for loop... so it should run exactly ARRAY_SIZE times each time.

answered Apr 20 '12 at 22:45

djechlin

59,258
35
162
290

So if I return from inside the for loop it will run over and over until it doesn't find any more duplicate adjacent values? – Pablo Canseco Apr 20 '12 at 22:49
Returning from inside the for loop does nothing, still get duplicate values. – Pablo Canseco Apr 20 '12 at 22:51
Oh, I see. Even if dupe is set to false once it will be set to true later. Whether dupe is true or false gets rewritten up until the last loop, which, by the way, is accessing the array out of bounds, so is probably a random junk integer and therefore will show up as unequal. – djechlin Apr 20 '12 at 22:54
Right, but that would be at the last value, which has no adjacent greater value to test for duplicateness so at that point the function is free to return since all other values have been tested. – Pablo Canseco Apr 20 '12 at 23:00

score 0 · Answer 2 · answered Apr 21 '12 at 00:54

The problem that you want to solve and the algorithm that you provided do not really match. You do not really want to remove the duplicates, but rather guarantee that all the elements in the array are different, the difference being that by removing duplicates the number of elements in the array would be less than the size of the array, but you want a full array.

I don't know what the perfect solution would be (algorithmically), but one simple answer would be creating an array of all the values in the valid range (since the range is small), shuffling it and then picking up the first N elements. Think of this as using cards to pick the values.

 const int array_size = 9;
 void create_array( int (&array)[array_size] ) {
    const int max_value = 51;
    int range[max_value];
    for ( int i = 0; i < max_value; ++i ) {
       range[i] = i;
    }
    std::random_shuffle( range, range+max_value );
    std::copy_n( range, array_size, array );
 }

This is not the most efficient approach, but it is simple, and with a small number of elements there should not be any performance issues. A more complex approach would be to initialize the array with the random elements in the range, sort and remove duplicates (actually remove, which means that the array will not be full at the end) and then continue generating numbers and checking whether they are new against the previously generated numbers.

Simplest approach is just comparing with every other value which is linear time but on an array of 9 elements linear time is small enough not to matter.

score -1 · Answer 3 · answered Apr 20 '12 at 22:51

-1

you are doing it wrong at array[j] = rand(); array[j] = array[j] % 51

It will always have 1 to ARRAY SIZE!!

answered Apr 20 '12 at 22:51

Sekhar

5,614
9
38
44

ARRAY_SIZE is declared to be 10 as a global constant. – Pablo Canseco Apr 20 '12 at 23:01
1

@Pirate43: In which case you're invoking undefined behavior, because the size of `a` and `b` are 9, not 10. You're also invoking undefined behavior when trying to print `a[i]` and `b[i]`. – Cornstalks Apr 20 '12 at 23:45

Removing duplicate entries in an array (C++)

3 Answers3