0

Do you know a way to use a sorting algorithm that uses vectors intrinsics efficiently ?

I have to use the capability of loading, storing 4 floats at one operation and also other vectors operations.

I found this code for "Quick Sort". Can you help me understand how to implement it with SIMD ?

int partition(float *arr, int low, int high)
{
    float pivot;
    int i, j;

    // pivot (Element to be placed at right position)
    pivot = arr[high];

    i = (low - 1);  // Index of smaller element and indicates the 
                    // right position of pivot found so far

    for (j = low; j <= high - 1; j++) {

        // If current element is smaller than the pivot
        if (arr[j] < pivot) {
            i++;    // increment index of smaller element
            swap(&arr[i], &arr[j]);
            
        }
    }

    swap(&arr[i + 1], &arr[high]);
    return (i + 1);
}

/* low  –> Starting index,  high  –> Ending index */
void quickSort(float *arr, int low, int high) 
{
    int pi;

    if (low < high) {

        /* pi is partitioning index, arr[pi] is now at right place */

        pi = partition(arr, low, high);

        quickSort(arr, low, pi-1);  // Before pi

        quickSort(arr, pi + 1, high); // After pi

    }

}
Zvi Vered
  • 459
  • 2
  • 8
  • 16
  • See [Fast merge of sorted subsets of 4K floating-point numbers in L1/L2](https://stackoverflow.com/a/11600113). Also see [Fast in-register sort of bytes?](https://stackoverflow.com/q/1580686) (some of the answers are actually about sorting floats in a SIMD register, not separate bytes). And [Merge sort on simd register](https://stackoverflow.com/q/49578152) for some links. – Peter Cordes Jun 17 '22 at 12:12
  • [Sorting 64-bit structs using AVX?](https://stackoverflow.com/q/31486942) has some links to Furtak's paper and other related work. (Not an exact duplicate; I was hasty in closing, didn't notice immediately that the first question I picked was just about merging, not full sorting, but one of the answers does link known SIMD-sort stuff. But I think my answer on the 64-bit struct Q&A says enough about sorting just float elements to be useful. I don't know about ARM, whether it has a shuffle that can easily do what x86 shufps does (2 input regs), but it has maxps and minps equivalents.) – Peter Cordes Jun 17 '22 at 12:20
  • Hi Peter. Thank you very much. I looked at the code with the title: "Here is the final code:" in https://stackoverflow.com/questions/31486942/sorting-64-bit-structs-using-avx It is not clear what does 'final code". It gets only one vector register. – Zvi Vered Jun 17 '22 at 18:32
  • The phrase "Here is the final code" is in someone else's answer, not mine. I was talking about links like https://webdocs.cs.ualberta.ca/~amaral/papers/furtak-spaa07.pdf (Furtak's SSE float-sort paper), and http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/xiaochen_rocki_IA3_SC13.pdf Those are the basis for what my answer there proposed; a way to make a SIMD comparator that you can use with those sorting algorithms which sort number+payload based on just comparing the number. – Peter Cordes Jun 17 '22 at 21:11
  • I googled on Furtak Quicksort and found https://arxiv.org/pdf/1704.08579.pdf which shows an AVX-512 version with actual code listings in the paper. Unfortunately Furtak's original paper doesn't have that; I'd assume the code is available somewhere but I didn't find it on the first page of results for `furtak quicksort`, just papers (copies of it or citing it.) – Peter Cordes Jun 18 '22 at 10:56

0 Answers0