4

I have found many questions that turn around this issue, but none that directly answer the question:

-in fortran, what are (a) the fastest (wall clock) and (b) the most elegant (concise and clear) way to eliminate duplicates from a list of integers

There has to be a better way than my feeble attempt:

Program unique
 implicit none
 !   find "indices", the list of unique numbers in "list"
 integer( kind = 4 ) :: kx, list(10)
 integer( kind = 4 ),allocatable :: indices(:)
 logical :: mask(10)
 !!$    list=(/3,2,5,7,3,1,4,7,3,3/)
 list=(/1,(kx,kx=1,9)/)
 mask(1)=.true.
 do kx=10,2,-1
   mask(kx)= .not.(any(list(:kx-1)==list(kx)))
 end do
 indices=pack([(kx,kx=1,10)],mask)
 print *,indices
End Program unique

My attempt expects the list to be ordered, but it would be better if that requirement were lifted

Clinton Winant
  • 704
  • 10
  • 19
  • Have you looked at [Rosetta Code](https://rosettacode.org/wiki/Remove_duplicate_elements#Fortran)? – Arya McCarthy May 26 '17 at 09:39
  • There is something in https://stackoverflow.com/questions/14137610/remove-repeated-elements-on-an-2d-array-in-fortran-90 and https://stackoverflow.com/questions/12147864/finding-duplicate-records-in-fortran – Vladimir F Героям слава May 26 '17 at 09:51
  • @AryaMcCarthy The Rosetta Code does find dupes, but I wouldn't call it fast or elegant: It involves nested do loops, and takes twice as long as my example for the same sied array... – Clinton Winant May 26 '17 at 12:42
  • @VladimirF The first question [https://stackoverflow.com/questions/14137610/remove-repeated-elements-on-an-2d-array-in-fortran-90] is essentially the same as my own example, except that it throws an error when compiled with gfortran. The second [https://stackoverflow.com/questions/12147864/finding-duplicate-records-in-fortran], at over 50 lines of code is never going to be called elegant. It seems to me that my question boils down to "can the do loop in my example be vectorized (in the non SIMD meaning)" – Clinton Winant May 26 '17 at 12:47
  • Note that your own example does more comparisons that is necessary. But I don't think you can write it more succinctly (certainly I can't see what vectorization you might have in mind). There is no intrinsic if you ask. You can write your own function using any approach you want and that the call to that function will be short. – Vladimir F Героям слава May 26 '17 at 12:55
  • What error does the code in the first link throw? You are normally expected to fix small problems yourself, many answers are *not* tested by running the code. Note that Mark's answer in the link does not require an ordered array. – Vladimir F Героям слава May 26 '17 at 12:58
  • FWIW I think that the most elegant way, and a fast one too though perhaps not the fastest, is to sort the array first, then scan through it dropping duplicates as you go. Here's how elegant the usage of this approach looks ... `outarr = dedupe(sort(inarr))`. Unlike the languages used by script-kiddies and code-monkeys noble old Fortran leaves you to write your own `sort` routine but surely you've already done that ? Writing the code to deduplicate a sorted array is, well, not challenging anyhoo. – High Performance Mark May 26 '17 at 13:15
  • @VladimirF The error is: ALLOCATE(index_vector, source=PACK([(ix, ix=1,numcols) ],mask)) 1 Error: Array specification required in ALLOCATE statement at (1) . Judging the size of an error must depend on the eye of the beholder? – Clinton Winant May 26 '17 at 13:50
  • @HighPerformanceMark Your comment about outarr=... reminds of the person who claimed to have a one line CFD code: call CFD. Seriously, the culling task (dedupe?), while perhaps not challenging to you, is one that is part of literally millions of fortran codes. – Clinton Winant May 26 '17 at 13:59
  • I put a comment in the original question about that error. You can easily workaround that, it is a gfortran issue, the code itself is correct. – Vladimir F Героям слава May 26 '17 at 14:02

2 Answers2

4

I just couldn't help myself, so I wrote up an answer you may enjoy. The following code will return an array of unique values in ascending order for an input array of unsorted integers. Note that the output results are the actual values, not just the indices.

program unique_sort
    implicit none
    integer :: i = 0, min_val, max_val
    integer, dimension(10) :: val, unique
    integer, dimension(:), allocatable :: final

    val = [ 3,2,5,7,3,1,4,7,3,3 ]
    min_val = minval(val)-1
    max_val = maxval(val)
    do while (min_val<max_val)
        i = i+1
        min_val = minval(val, mask=val>min_val)
        unique(i) = min_val
    enddo
    allocate(final(i), source=unique(1:i))   !<-- Or, just use unique(1:i) 
    print "(10i5:)", final
end program unique_sort
! output:    1    2    3    4    5    7

See this gist for timing comparisons between (unique_sort) above, your example (unique_indices), and the example at Rosetta Code (remove_dups) as well as a couple of variations. I'd like to test @High Performance Mark's code but haven't yet.

Run program 1,000,000 times, 100 integers 0<=N<=50
- unique_sort     t~2.1 sec  input: unsorted, w/duplicates  output: sorted unique values
- remove_dup      t~1.4      input: unsorted, w/duplicates  output: unsorted unique values
- unique_indices  t~1.0      input: sorted, w/duplicates    output: unsorted indices for unique values
- BONUS!(Python)  t~4.1      input: unsorted, w/duplicates  output: sorted unique values

Bottom line: on my machine (i7 8GB laptop) unique_indices is slightly faster than remove_dups. However, remove_dups does not require the input array to be pre-sorted, and actually returns the values rather than the indices (see the gist for a modified version of unique_indices that returns the values instead, which doesn't seem to slow it down much at all).

On the other hand, unique_sort takes around twice as long, but is designed to handle unsorted input, and also returns the values in sorted order, in 8 LOC (minus the var declarations). So that seems a fair trade-off. Anywho, I'm sure unique_sort can be optimized for greater speed using some sort of masking statement, but that's for another day.


Update The timings shown above were obtained from a test program where each subroutine was placed in a module and executed via a procedure call. However, I found a surprisingly large improvement in performance when unique_sort was placed directly in the main program, completing in only ~0.08 sec for 1 million runs. A speedup of ~25x simply by not using a procedure seems strange to me - ordinarily, I assume that the compiler optimizes the cost of procedure calls away. For example, I found no difference in performance for remove_dup or unique_indices whether they were executed via a procedure or placed directly in the main program.

Matt P
  • 2,287
  • 1
  • 11
  • 26
2

After @VladimirF pointed out that I was overcomparing, I found I could vectorize my original code (remove the do loop do kx....). I have coupled the "unique" function with a mergesort algorithm loosely based on wikipedia. The guts are contained in module SortUnique

Module SortUnique
contains
  Recursive Subroutine MergeSort(temp, Begin, Finish, list)
    ! 1st 3 arguments are input, 4th is output sorted list
    implicit none
    integer(kind=4),intent(inout) :: Begin,list(:),temp(:)
    integer(kind=4),intent(in) :: Finish
    integer(kind=4) :: Middle
    if (Finish-Begin<2) then    !if run size =1
       return                   !it is sorted
    else
       ! split longer runs into halves
       Middle = (Finish+Begin)/2
       ! recursively sort both halves from list into temp
       call MergeSort(list, Begin, Middle, temp)
       call MergeSort(list, Middle, Finish, temp)
       ! merge sorted runs from temp into list
       call Merge(temp, Begin, Middle, Finish, list)
     endif
  End Subroutine MergeSort

  Subroutine Merge(list, Begin, Middle, Finish, temp)
    implicit none
    integer(kind=4),intent(inout) :: list(:),temp(:)
    integer(kind=4),intent(in) ::Begin,Middle,Finish
    integer(kind=4)    :: kx,ky,kz
    ky=Begin
    kz=Middle
    !! While there are elements in the left or right runs...
    do kx=Begin,Finish-1
       !! If left run head exists and is <= existing right run head.
       if (ky.lt.Middle.and.(kz.ge.Finish.or.list(ky).le.list(kz))) then
          temp(kx)=list(ky)
          ky=ky+1
       else
          temp(kx)=list(kz)
          kz = kz + 1
       end if
    end do

  End Subroutine Merge

  Function Unique(list)
    !! usage sortedlist=Unique(list)
    implicit none
    integer(kind=4) :: strt,fin,N
    integer(kind=4), intent(inout) :: list(:)
    integer(kind=4), allocatable  :: unique(:),work(:)
    logical,allocatable :: mask(:)
    ! sort
    work=list;strt=1;N=size(list);fin=N+1
    call MergeSort(work,strt,fin,list) 
    ! cull duplicate indices
    allocate(mask(N));
    mask=.false.
    mask(1:N-1)=list(1:N-1)==list(2:N)
    unique=pack(list,.not.mask)

  End Function Unique

End Module SortUnique

Program TestUnique
  use SortUnique
  implicit none
  !   find "indices", the list of unique numbers in "list"
  integer (kind=4),allocatable :: list(:),newlist(:)
  integer (kind=4)  :: kx,N=100000 !N  even
  real (kind=4) :: start,finish,myrandom

  allocate(list(N))
  do kx=1,N
     call random_number(myrandom)
     list(kx)=ifix(float(N)/2.*myrandom)
  end do

  call cpu_time(start)

  newlist=unique(list)

  call cpu_time(finish)
  print *,"cull duplicates: ",finish-start
  print *,"size(newlist) ",size(newlist)

End Program TestUnique

At @HighPerformanceMark 's suggestion, the function is simply invoked as newlist=unique(list). The above is certainly not concise, but it seems clear, and it is about 200 times faster than either my original or the other solutions proposed.

Clinton Winant
  • 704
  • 10
  • 19
  • I get `forrtl: severe (157): Program Exception - access violation` when I try to run this program, compiled w ifort 17. – Matt P May 29 '17 at 20:00
  • @MattP I get no complaints from gfortran, and the compiled program runs correctly. I guess I will invoke Vladimir Fs rule stated above, that if it works on your compiler, don't sweat it. Seriously, I don't have any version of ifort – Clinton Winant Jun 01 '17 at 15:11
  • Fair enough. Anyway, I might try to get it to compile w ifort when I get the chance. It might be worth it to get the performance you are observing. – Matt P Jun 01 '17 at 15:33
  • @MattP Please communicate if you find the problem ifort is flagging: other posts suggest that that compiler throws that error when unallocated memory is "touched" I don't know what that word means in this context, but I suspect that the reentrant part of the mergesort routine may be at the root of this. I have checked the answers for a number of cases, but have not found any issues. – Clinton Winant Jun 01 '17 at 15:49
  • OK, I've got this working with `ifort`. The primary issue is simply `strt=0`, which should be `strt=1`. Otherwise, `Merge` fails the first time it is called, at the loop `kx=Begin,Finish-1`. This is because `Begin` takes the value of `strt`, and *0* is smaller than the lower-bound of `list` (which is *1*). – Matt P Jun 01 '17 at 20:25
  • @MattP Thanks so much, I have edited the code to fix the error you found – Clinton Winant Jun 02 '17 at 15:01
  • @ClintonWinant when I try to run your code with gfortran I get: At line 39 of file sortunique_mod.f90 Fortran runtime error: Index '100001' of dimension 1 of array 'list' above upper bound of 100000 – geom Mar 22 '23 at 20:28