
I have read sources that say that the time complexities for Selection sort are:

  • Best-case: O(n^2)
  • Average-case: O(n^2)
  • Worst-case: O(n^2)

I was wondering if it is worth it to "optimize" the algorithm by adding a certain line of code to make the algorithm "short-circuit" itself if the remaining part is already sorted.

Here's the code, written in C. I have also added comments indicating which lines are part of the "optimization":

#include <stdio.h>
#include <stdlib.h>

void printList(int* num, int numElements) {
    int i;

    for (i = 0; i < numElements; i++) {
        printf("%d ", *(num + i));
    }
    printf("\n");
}

int main() {
    int numElements = 0, i = 0, j = 0, min = 0, swap = 0, numSorted = 0;

    printf("Enter number of elements: ");
    scanf("%d", &numElements);

    int* num = malloc(sizeof(int) * numElements);

    for (i = 0; i < numElements; i++) {
        printf("Enter number = ");
        scanf(" %d", num + i);
    }

    for (i = 0; i < numElements-1; i++) {
        numSorted = i + 1;  // "optimized"
        min = i;

        for (j = i + 1; j < numElements; j++) {
            numSorted += *(num + j - 1) <= *(num + j);  // "optimized"
            if (*(num + min) > *(num + j))
                min = j;
        }

        if (numSorted == numElements)  // "optimized"
            break;

        if (min != i) {
            swap = *(num + i);
            *(num + i) = *(num + min);
            *(num + min) = swap;
        }

        printList(num, numElements);
    }

    printf("Sorted list:\n");
    printList(num, numElements);

    free(num);

    getchar();  /* getch() is non-standard (conio.h); getchar() is the portable equivalent */
    return 0;

}
Stephen Docy
CudoX

5 Answers


Optimizing selection sort is a little silly. It has awful best-case, average, and worst-case time complexity, so if you want a remotely optimized sort you would (almost?) always pick another sort. Even insertion sort tends to be faster and it's hardly much more complicated to implement.

More to the point, checking whether the list is sorted adds work in the worst case (and, I'm inclined to think, the average case too). Even a mostly sorted list will not necessarily go any faster this way: consider 1,2,3,4,5,6,7,9,8. Even though the list only needs its last two elements swapped, the algorithm never short-circuits, because the array is not fully sorted until the final pass.

Chris
  • I do get your point but the "optimization" (if it is) that I am curious about is intended to "short-circuit" the sorting iteration. Let's consider the example 1, 2, 3, 4, 5. In a regular selection sort, a step by step would do 1, 2, 3, 4, 5 - 1, 2, 3, 4, 5 - 1, 2, 3, 4, 5 - 1, 2, 3, 4, 5. In the code I have presented, it would simply go 1, 2, 3, 4, 5 - break. – CudoX Feb 26 '16 at 04:34
  • 1
    Right, but how often will this actually come up? It's extra overhead on an already very slow algorithm, and only an advantage in the extremely rare case that the list is already sorted. If your use case is somewhere where the list is often already sorted or almost sorted, use a sorting algorithm like insertion sort that takes care of this naturally (and as a bonus, is also good at sorting almost-sorted lists). – Chris Feb 26 '16 at 05:02
  • I'm not sure why you say *even* Insertion sort. Insertion sort is the gold standard of comparison sorts, so yes, it has better best-case asymptotic complexity than selection sort, and it tends to do better in the average case, though it is still O(n^2). – John Bollinger Aug 27 '18 at 21:51

Just because something can be optimized doesn't necessarily mean it should be. Assuming profiling (or "boss-says-so") indicates optimization is warranted, there are a few things you can do.

As with any algorithm involving iteration over memory, anything that reduces the number of iterations can help.

  • keep track of the min AND max values - cuts the number of passes roughly in half
  • keep track of multiple min/max values (4 each will be 1/8th the iterations)
    • at some point temp values will not fit in registers
    • the code will get more complex

It can also help to maximize cache locality.

  • do a backward iteration after the forward iteration
    • the recently accessed memory should still be cached
    • going straight to another forward iteration would cause a cache miss
    • since you are moving backward, the cache predictor may prefetch the rest
    • this could actually be worse on some architectures (RISC-V)
  • operate on a cache line at a time where possible
    • this can allow the next cache line to be prefetched in the mean time
    • you may need to align the data or specially handle the first and last data
      • even with increased alignment, the last few elements may need "padding"

Use SIMD instructions and registers where useful and practical.

  • useful for non-branching rank order sort of temps
  • can hold many data points simultaneously (AVX-512 can do a cache line)
  • avoids memory access (thus less cache misses)

If you use multiple max/min values, optimize sorting the n values of max and min

  • see here for techniques to sort a small fixed number of values.
  • save memory swaps until the end of each iteration and do them once
    • keep temporaries (or pointers) in registers in the mean time

There are quite a few more optimization methods available, but eventually the resemblance to selection sort starts to get foggy. Each of these is going to increase complexity and therefore maintenance cost to the point where a simpler implementation of a more appropriate algorithm may be a better choice.

technosaurus

The only way I see how this can be answered is if you define the purpose of why you are optimizing it.

  • Is it worth it in a professional setting: on the job, for code running "in production" - most likely (even almost certainly) not.
  • Is it worth it as a teaching/learning tool - sometimes yes.

I teach programming to individuals, and sometimes I teach them algorithms and data structures. I consider selection sort one of the easiest to explain and teach - it flows so naturally after explaining the algorithm for finding the minimum and swapping two values (swap()). Then, at the end, I introduce the concept of optimization, where we can implement this counter-based "already sorted" detection.

Admittedly, bubble sort is even better for introducing optimization, because it has at least three substantial, easy-to-explain optimizations.

Mindaugas Bernatavičius

I was wondering if it is worth it to "optimize" the algorithm by adding a certain line of code to make the algorithm "short-circuit" itself if the remaining part is already sorted.

Clearly this change reduces the best-case complexity from O(n^2) to O(n). This will be observed for inputs that are already sorted except for O(1) leading elements. If such inputs are a likely case, then the suggested code change might indeed yield an observable and worthwhile performance improvement.

Note, however, that your change more than doubles the work performed in the innermost loop, and consider that for uniform random inputs, the expected number of outer-loop iterations saved is 1. Consider also that any outer-loop iterations you do manage to trim off will be the ones that otherwise would do the least work. Overall, then, although you do not change the asymptotic complexity, the actual performance in the average and worst cases will be noticeably worse -- runtimes on the order of twice as long.

If you're after better speed then your best bet is to choose a different sorting algorithm. Among the comparison sorts, Insertion Sort will perform about the same as your optimized Selection Sort on the latter's best case, but it has a wider range of best-case scenarios, and will usually outperform (regular) Selection Sort in the average case. How the two compare in the worst case depends on implementation.

If you want better performance still then consider Merge Sort or Quick Sort, both of which are pretty simple to implement. Or if your data are suited to it then Counting Sort is pretty hard to beat.

John Bollinger

We can optimize selection sort so that the best case becomes O(n) instead of O(n^2). Here is my optimized code.

import java.util.Arrays;

public class SelectionSort {
    static void selectionSort(int[]arr){
        for(int i=0; i< arr.length; i++){
            int maxValue=arr[0];
            int maxIndex=0;
            int cnt=1;
            for (int j=1; j< arr.length-i; j++){
                if(maxValue<=arr[j]){
                    maxValue=arr[j];
                    maxIndex=j;
                    cnt++;
                }
            }
            if(cnt == arr.length - i) break;  // remaining prefix already in order; with cnt==arr.length the check could only fire on the first pass
            arr[maxIndex]=arr[arr.length-1-i];
            arr[arr.length-1-i]=maxValue;
        }
    }

    public static void main(String[] args) {
        int[]arr={1,-3, 0, 8, -45};
        selectionSort(arr);
        System.out.println(Arrays.toString(arr));
    }
}
greybeard