I just couldn't help myself, so I wrote up an answer you may enjoy. The following code will return an array of unique values in ascending order for an input array of unsorted integers. Note that the output results are the actual values, not just the indices.
program unique_sort
implicit none
integer :: i = 0, min_val, max_val
integer, dimension(10) :: val, unique
integer, dimension(:), allocatable :: final
val = [ 3,2,5,7,3,1,4,7,3,3 ]
min_val = minval(val)-1
max_val = maxval(val)
do while (min_val<max_val)
i = i+1
min_val = minval(val, mask=val>min_val)
unique(i) = min_val
enddo
allocate(final(i), source=unique(1:i)) !<-- Or, just use unique(1:i)
print "(10i5:)", final
end program unique_sort
! output: 1 2 3 4 5 7
See this gist for timing comparisons between (unique_sort
) above, your example (unique_indices
), and the example at Rosetta Code (remove_dups
) as well as a couple of variations. I'd like to test @High Performance Mark's code but haven't yet.
Run program 1,000,000 times, 100 integers 0<=N<=50
- unique_sort t~2.1 sec input: unsorted, w/duplicates output: sorted unique values
- remove_dup t~1.4 input: unsorted, w/duplicates output: unsorted unique values
- unique_indices t~1.0 input: sorted, w/duplicates output: unsorted indices for unique values
- BONUS!(Python) t~4.1 input: unsorted, w/duplicates output: sorted unique values
Bottom line: on my machine (i7 8GB laptop) unique_indices
is slightly faster than remove_dups
. However, remove_dups
does not require the input array to be pre-sorted, and actually returns the values rather than the indices (see the gist for a modified version of unique_indices
that returns the values instead, which doesn't seem to slow it down much at all).
On the other hand, unique_sort
takes around twice as long, but is designed to handle unsorted input, and also returns the values in sorted order, in 8 LOC (minus the var declarations). So that seems a fair trade-off. Anywho, I'm sure unique_sort
can be optimized for greater speed using some sort of masking statement, but that's for another day.
Update
The timings shown above were obtained from a test program where each subroutine was placed in a module and executed via a procedure call. However, I found a surprisingly large improvement in performance when unique_sort
was placed directly in the main program, completing in only ~0.08 sec for 1 million runs. A speedup of ~25x simply by not using a procedure seems strange to me - ordinarily, I assume that the compiler optimizes the cost of procedure calls away. For example, I found no difference in performance for remove_dup
or unique_indices
whether they were executed via a procedure or placed directly in the main program.