2

Based on logic given as an answer on SO to a different (similar) question about removing repeated numbers from an array in O(N) time complexity, I implemented that logic in C, as shown below. But my code does not return only unique numbers. I tried debugging, but I could not understand the logic well enough to fix it.

#include <stdio.h>

int remove_repeat(int *a, int n)
{
    int i, k;

    k = 0;
    for (i = 1; i < n; i++)
    {
        if (a[k] != a[i]) 
        {
            a[k+1] = a[i];
            k++;            
        }
    }
    return (k+1);
}

int main(void)
{
    int a[] = {1, 4, 1, 2, 3, 3, 3, 1, 5};
    int n;
    int i;

    n = remove_repeat(a, 9);

    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);

    return 0;
}

1] What is incorrect in the above code for removing duplicates?

2] Is there any other O(N) or O(N log N) solution for this problem? What is its logic?

goldenmean
  • What did you learn when you tried debugging? – Cody Gray - on strike Jul 17 '11 at 15:09
  • Your intention is unclear. It's clear to me that it does not work, please describe in your own words what are you trying to code. – Drakosha Jul 17 '11 at 15:15
  • @Cody - That this logic is trying to build a sub array from 0 to j of unique numbers. – goldenmean Jul 17 '11 at 15:17
  • @Drakosha - What is not clear above? Trying to implement logic to remove repeated elements (integers in this case) from an array, and return only the unique ones in the same array, while trying to get it done in O(N) time complexity. – goldenmean Jul 17 '11 at 15:19
  • @goldenmean - i understand that, the question is what is your idea, i.e. HOW are you planning to do that. – Drakosha Jul 17 '11 at 15:29
  • @Drakosha - Your previous comment said my intention is unclear. ?? – goldenmean Jul 17 '11 at 15:49
  • @goldenmean - i meant it's difficult to understand what algorithm you are coding, explaining it in english would help. – Drakosha Jul 17 '11 at 19:46
  • @goldenmean let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/1556/discussion-between-drakosha-and-goldenmean) – Drakosha Jul 17 '11 at 19:46

7 Answers

2
  1. Heap sort in O(n log n) time.
  2. Iterate through in O(n) time replacing repeating elements with a sentinel value (such as INT_MAX).
  3. Heap sort again in O(n log n) to distil out the repeating elements.

Still bounded by O(n log n).
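
For illustration, here is a minimal C sketch of those three steps, using the standard library's qsort for the sorting passes (a heapsort gives the same O(n log n) bound) and assuming INT_MAX never appears as a legitimate input value, since it is used as the sentinel; the function name is just for illustration:

#include <limits.h>
#include <stdlib.h>

static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

/* Returns the number of unique values, which end up at the front of a[].
 * Assumes INT_MAX is not a legitimate input value (it serves as the sentinel). */
int remove_repeat_sorted(int *a, int n)
{
    int i, prev = 0, unique = 0;

    qsort(a, n, sizeof *a, cmp_int);          /* step 1: sort */

    for (i = 0; i < n; i++) {                 /* step 2: mark repeats */
        if (i > 0 && a[i] == prev) {
            a[i] = INT_MAX;                   /* sentinel marking a duplicate */
        } else {
            prev = a[i];
            unique++;
        }
    }

    qsort(a, n, sizeof *a, cmp_int);          /* step 3: sentinels sink to the end */
    return unique;
}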

Matt Joiner
1

You are going to need two loops, one to go through the source and one to check each item in the destination array.

You are not going to get O(N).
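
For what it's worth, here is a rough sketch of that two-loop idea (the function name is just for illustration): it preserves the original order but is O(N²), consistent with the point that O(N) is not reached this way.

/* For each source element, scan the destination built so far (the front of
 * the same array) and append only values not seen before. */
int remove_repeat_two_loops(int *a, int n)
{
    int i, j, k = 0;

    for (i = 0; i < n; i++) {
        for (j = 0; j < k; j++)   /* check the output built so far */
            if (a[j] == a[i])
                break;
        if (j == k)               /* not found: keep this value */
            a[k++] = a[i];
    }
    return k;                     /* number of unique elements */
}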

[EDIT] The article you linked to suggests a sorted output array, which means the search for duplicates in the output array can be a binary search, which is O(log N).

Steve Wellens
  • I thought the article said "Proceed from element a[1] to a[N]. At each stage i, all of the elements to the left of a[i] comprise a sorted heap of elements a[0] through a[j]". Which means that for each iteration of i, the elements to the left of i in a[] are sorted elements obtained by this logic? Am I missing something? – goldenmean Jul 17 '11 at 15:23
1

Your code would appear to require that the input is sorted. With unsorted inputs as you are testing with, your code will not remove all duplicates (only adjacent ones).
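
For example (a sketch, assuming it is acceptable to reorder the array), sorting with qsort first makes the adjacent-duplicates check in the question's remove_repeat sufficient; cmp_int here is just an illustrative helper:

#include <stdio.h>
#include <stdlib.h>

int remove_repeat(int *a, int n);   /* the function from the question */

static int cmp_int(const void *p, const void *q)
{
    int a = *(const int *)p, b = *(const int *)q;
    return (a > b) - (a < b);
}

int main(void)
{
    int a[] = {1, 4, 1, 2, 3, 3, 3, 1, 5};
    int n, i;

    qsort(a, 9, sizeof a[0], cmp_int);   /* sort so duplicates become adjacent */
    n = remove_repeat(a, 9);             /* now returns 5 */

    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]); /* prints 1, 2, 3, 4, 5 */
    return 0;
}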

Iridium
1

You can get an O(N) solution if the range of possible integer values is known up front and is smaller than the amount of memory you have :). Make one pass to record which integers occur, using auxiliary storage, then another to output the unique values.

The code below is in Java, but hopefully you get the idea.

int[] removeRepeats(int[] a) {
    // Assume the values are integers between 0 and 999
    Boolean[] v = new Boolean[1000]; // A lazy way of getting a tri-state var (false, true, null)

    for (int i=0;i<a.length;++i) {
       v[a[i]] = Boolean.TRUE;
    } 

    // v[i] = null => number not seen
    // v[i] = true => number seen

    int[] out = new int[a.length];
    int ptr = 0;
    for (int i=0;i<a.length;++i) {
        if (v[a[i]] != null && v[a[i]].equals(Boolean.TRUE)) {
            out[ptr++] = a[i];
            v[a[i]] = Boolean.FALSE;          
        }
    }

    // Out now doesn't contain duplicates, order is preserved and ptr represents how
    // many elements are set.
    return out;
}
Jeff Foster
  • @Jeff - This won't work if I have -ve numbers as well, isn't it. The first step of finding if a number is seen or not might fail? Any thoughts? – goldenmean Jul 17 '11 at 15:40
  • @Drakosha All the loops are over the input array so I think O(N)? Or am I misunderstanding? @GoldenMean, if you've got a constrained universe (e.g. numbers between -50 and 50), then you can simply shift values. It's basically the same approach as using a hash table to store the counts of the numbers, only using perfect hashing instead of a traditional array + bucket or tree implementation. – Jeff Foster Jul 17 '11 at 16:12
  • @Jeff Foster: i meant initializing `v` is `O(number of integers)`, There's a trick to init an array in O(1), but it requires more memory. – Drakosha Jul 17 '11 at 16:42
  • @Drakosha ok, I agree with that! Thanks – Jeff Foster Jul 17 '11 at 16:59
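
A minimal C sketch of the value-shifting idea from the comments above, assuming a hypothetical known range of -50 to 50 for the input values (the function name is illustrative only):

#include <stdbool.h>

/* Removes duplicates in place, preserving order, assuming every value lies in
 * the assumed range [-50, 50]. Adding 50 shifts each value into a valid index
 * of the "seen" table. Returns the number of unique elements. */
int remove_repeat_bounded(int *a, int n)
{
    bool seen[101] = { false };   /* one flag per possible value */
    int i, k = 0;

    for (i = 0; i < n; i++) {
        int idx = a[i] + 50;      /* shift [-50, 50] into [0, 100] */
        if (!seen[idx]) {
            seen[idx] = true;
            a[k++] = a[i];
        }
    }
    return k;
}
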
1

Your code only checks whether an item in the array is the same as its immediate predecessor.

If your array starts out sorted, that will work, because all instances of a particular number will be contiguous.

If your array isn't sorted to start with, that won't work because instances of a particular number may not be contiguous, so you have to look through all the preceding numbers to determine whether one has been seen yet.

To do the job in O(N log N) time, you can sort the array, then use the logic you already have to remove duplicates from the sorted array. Obviously enough, this is only useful if you're all right with rearranging the numbers.

If you want to retain the original order, you can use something like a hash table or bit set to track whether a number has been seen yet or not, and only copy each number to the output when/if it has not yet been seen. To do this, we change your current:

if (a[k] != a[i])
    a[k+1] = a[i];

to something like:

if (!hash_find(hash_table, a[i])) { 
    hash_insert(hash_table, a[i]);
    a[k+1] = a[i];
}

If your numbers all fall within fairly narrow bounds or you expect the values to be dense (i.e., most values are present) you might want to use a bit-set instead of a hash table. This would be just an array of bits, set to zero or one to indicate whether a particular number has been seen yet.
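
For instance, here is a sketch of the bit-set variant, assuming the values are non-negative and below a known bound (the MAX_VALUE of 1024 here is purely hypothetical, as is the function name):

#include <limits.h>
#include <string.h>

#define MAX_VALUE 1024   /* assumed exclusive upper bound on the input values */

/* In-place, order-preserving duplicate removal using one bit per possible
 * value. Returns the number of unique elements. */
int remove_repeat_bitset(int *a, int n)
{
    unsigned char bits[(MAX_VALUE + CHAR_BIT - 1) / CHAR_BIT];
    int i, k = 0;

    memset(bits, 0, sizeof bits);

    for (i = 0; i < n; i++) {
        int byte = a[i] / CHAR_BIT;
        unsigned mask = 1u << (a[i] % CHAR_BIT);
        if (!(bits[byte] & mask)) {   /* value not seen yet */
            bits[byte] |= mask;
            a[k++] = a[i];
        }
    }
    return k;
}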

On the other hand, if you're more concerned with the upper bound on complexity than the average case, you could use a balanced tree-based collection instead of a hash table. This will typically use more memory and run more slowly, but its expected complexity and worst-case complexity are essentially identical (O(N log N)). A typical hash table degenerates from constant complexity to linear complexity in the worst case, which will change your overall complexity from O(N) to O(N²).

Jerry Coffin
0

Your logic is just wrong, so the code is wrong too. Work the logic out by yourself before coding it. I suggest an O(N ln N) way with a modification of heapsort. With heapsort, we go from a[i] to a[n], find the minimum and swap it into a[i], right? Now for the modification: if the minimum is the same as a[i-1], then swap the minimum with a[n] and reduce the array's item count by 1 instead. That should do the trick in O(N ln N).

Khoa Le
0

Your code will work only in particular cases. Clearly, you're checking adjacent values, but duplicate values can occur anywhere in the array. Hence, it's totally wrong.

schrodinger