4

Is there a generic way to remove duplicate pointers from an array, e.g for an input like this:

enter image description here

return this:

enter image description here

without requiring any knowledge of the types of the things that the pointers are pointing at?

Arun
  • 19,750
  • 10
  • 51
  • 60
bph
  • 10,728
  • 15
  • 60
  • 135
  • 1
    Very nice question and illustration. – Utkan Gezer Apr 04 '14 at 15:06
  • 1
    You only want to remove pointers and you don't care about the data pointed to. Therefore it's just like removing duplicate integers from an array of integers. – Jabberwocky Apr 04 '14 at 15:09
  • 2
    since you can compare pointers with simple == without regard to what they point at, this seems fairly routine... what have you tried and where are you stuck? – RobP Apr 04 '14 at 15:09
  • BTW +1 for the nice drawings (and only for the drawings). – Jabberwocky Apr 04 '14 at 15:11
  • that link says "When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object or incomplete types both point to the same object, or both point one past the last element of the same array object, they compare equal." – RobP Apr 04 '14 at 15:14

2 Answers2

4

I would do the following:

  • sort the array of pointers — O(n lg n)
  • go through the array, copying out first occurrences to another array — O(n)

Keeping track of whether a pointer is a duplicate is extremely easy if the array is sorted: you just save the last one copied and don't copy the next one across until it's different.

Another simple brute-force approach (if you need to preserve the order of the pointers in the original array) would be to search forward in the array replacing duplicates with NULL. This has O(n^2) complexity, but you may not care for modest array sizes. For example:

k=0
for(i=0; i<n; i++) {
   if( myarray[i] == NULL ) {
       continue;
   }
   current = myarray[i];
   unique[k] = current;
   k++;
   for(j=i+1; j<n; j++) {
       if( current == myarray[j] ) {
           myarray[j] = NULL;
       }
   }
}
Emmet
  • 6,192
  • 26
  • 39
  • That won't work if the order needs to be preserved as it is suggested by the drawing. – Jabberwocky Apr 04 '14 at 15:12
  • this was my original thinking too - just got bogged in what you can and can't assume when comparing ptrs (http://stackoverflow.com/questions/22863406/sorting-an-array-of-pointers) - thought there may be some other approach i hadn't considered – bph Apr 04 '14 at 15:20
  • There are a number of other approaches, but I think this is the simplest if you can live with the pointers being sorted in the result and don't need to preserve their original order. – Emmet Apr 04 '14 at 15:26
  • I added a brute-force O(n^2) solution for the case where you need to preserve order. If your array of pointers is of reasonably modest size, you might decide to live with it. It has the benefit of simplicity. – Emmet Apr 04 '14 at 15:40
  • It's easy enough to create a secondary array that contains a structure with the pointer and its original order in the array. Sort the secondary array by pointer and then index, remove duplicates, and then sort the secondary array by index and extract the pointers in order. – Jim Mischel Apr 04 '14 at 18:13
1

Using a hash table helps, here is a solution with expected case O(N) [not worst case].

H = Empty Hash Table

foreach ptr in Array:
    if ptr NOT present in H:
        insert ptr to H

// H now have every ptr only once

wi = 1                    // write index
for ri = 1..n:            // read index
    ptr = Array[ ri ]
    if ptr in H:
        Array[ wi ] = ptr
        remove ptr from H
        wi++
Arun
  • 19,750
  • 10
  • 51
  • 60
  • yes thats an option - but no hash batteries included with c unfortunately - i do use uthash already though so it is an option - i think the sort and step is the way forward though - just being very careful with the ptr_cmp() you pass to qsort as per unwinds findings – bph Apr 04 '14 at 15:44
  • 1
    I understand, however, hash table is a too good a data structure to pass on :-). K&R C have a small basic implementation. Does this help? http://linux.die.net/man/3/hsearch – Arun Apr 04 '14 at 15:55
  • for sure - i agree, i use them extensively (http://troydhanson.github.io/uthash) - its always a little convoluted in c though compared to langs where they are baked into the syntax (e.g. pretty much every other language) – bph Apr 04 '14 at 15:59
  • I agree that a hash table is the canonical solution, but most library implementations (including the POSIX `hcreate()`/`hsearch()`/`hdestroy()` family) assume that the key is a string. It's arguable whether it's worth the additional effort and complexity over obvious O(n lg n) or even O(n^2) solutions unless the array is fairly big. – Emmet Apr 04 '14 at 18:37
  • @Emmet: The approach to any algorithm problem is to find a solution that works for large instances of the problem, i.e. a solution which scales well with input size. All the stuff about Big-O analysis, you know :-) If we restrict the discussion only to small arrays, then there is no need to search for algorithms, a **brute force `O(N^2)`**, as you included in your answer, does the job pretty well. In fact that might be preferable over the sorting based solution since it maintains the requirement of preserving the relative order of the pointers. – Arun Apr 04 '14 at 18:49
  • Yes, but the approach to any practical problem is to find a solution that balances economy of implementation, ease of understanding, performance, and takes problem size into account. This may not be the “best” solution in an asymptotic sense. Would you implement Strassen's algorithm for small matrices? – Emmet Apr 04 '14 at 18:58
  • @Emmet: (1) I was not sure if the "problem size" was explicitly mentioned. If question only pertains to "small" "problem size", then I concur, there is no point in dealing with hash table, just a brute force solution is good enough. [I would still honor the order preserving requirement.] (2) The issue here, IMO apparently, is that the C standard library does not offer a nice hash table as does other languages like C++, python etc. – Arun Apr 04 '14 at 19:30
  • Agreed. I think the *C* standards committee would probably argue that it would go against the minimalist philosophy of *C* to add hash tables. IMHO that would be a pretty weak argument considering some of the other things that are in there. – Emmet Apr 04 '14 at 19:38