It seems strange, but I haven't found anything on the following problem: given a vector x of length n, how can one construct a vector oo of length n such that x[oo[i]] (for i=1,..,n) is sorted? Requirements: only functions from the C standard library may be used, and the code has to be fast. (Note: I'm not a C programmer, but I am experienced in R, which has order() for this task.)

I found the post here but this discusses sorting directly.

Marius Hofert
  • The answer is in the link you posted. – CroCo Mar 01 '15 at 06:34
  • @CroCo: no, the answer is not trivially in the link that was posted. To be able to use `qsort()`, the vector `x` must be available as a global variable that can be accessed from the comparator passed to `qsort()`, which limits the flexibility of any solution. – Jonathan Leffler Mar 01 '15 at 06:48
  • Note that C indexes arrays of size N from 0 to N-1, rather than from 1 to N. Which indexing scheme is needed: 0-based (normal) C arrays or 1-based (unusual) arrays? – Jonathan Leffler Mar 01 '15 at 06:49
  • @JonathanLeffler: Thanks for helping, Jonathan. 0-based is good. – Marius Hofert Mar 01 '15 at 06:54
  • @JonathanLeffler qsort doesn't have any global variable requirements. You pass it pointers to `void` which may point to a struct containing any other info required to perform the sort. – M.M Mar 01 '15 at 07:39
  • @MattMcNabb: in general, `qsort()` does not need global variables, but sometimes, as in this case, the comparator function may need access to data that is not passed to `qsort()`. If you know how to sort an array of integers so that a parallel array of doubles can be accessed in sorted order without a global variable, please show how to do it in your own answer. You can use the `main()` function from my code, and the printing code, as the basis for the code you use to show how it works without either a global variable or the `thunk` used with `qsort_r()`. – Jonathan Leffler Mar 01 '15 at 07:52

3 Answers

The question you link to (C library function to do sort) shows how to use the standard C library function qsort() in general, but your requirement is not one of the usual problems. To be able to sort the oo array, the comparator function must be able to access the x array as well as the data passed to it by qsort() itself.

This code achieves that with reasonable economy of effort:

#include <stdio.h>
#include <stdlib.h>

typedef double VecType;
#define PRIf_VecType "f"

static VecType *base;

static int compare(const void *p1, const void *p2)
{
    const int i1 = *(const int *)p1;
    const int i2 = *(const int *)p2;
    if (base[i1] < base[i2])
        return -1;
    else if (base[i1] > base[i2])
        return +1;
    else
        return 0;
}

static void print_arrays(const char *tag, size_t x_size, VecType *x, int *oo)
{
    printf("%s:\n", tag);
    for (size_t i = 0; i < x_size; i++)
        printf("%zu: oo[%zu] = %d, x[oo[%zu]] = %4.2" PRIf_VecType
               ", x[%zu] = %4.2" PRIf_VecType "\n",
               i, i, oo[i], i, x[oo[i]], i, x[i]);
}

int main(void)
{
    VecType x[] = { 3.45, 1.23, 9.14, 4.67, 2.19, 3.45, 5.92 };
    size_t x_size = sizeof(x) / sizeof(x[0]);
    int oo[x_size];

    for (size_t i = 0; i < x_size; i++)
        oo[i] = (int)i;

    print_arrays("Before", x_size, x, oo);
    base = x;
    qsort(oo, x_size, sizeof(oo[0]), compare);
    print_arrays("After", x_size, x, oo);

    return 0;
}

Sample output:

Before:
0: oo[0] = 0, x[oo[0]] = 3.45, x[0] = 3.45
1: oo[1] = 1, x[oo[1]] = 1.23, x[1] = 1.23
2: oo[2] = 2, x[oo[2]] = 9.14, x[2] = 9.14
3: oo[3] = 3, x[oo[3]] = 4.67, x[3] = 4.67
4: oo[4] = 4, x[oo[4]] = 2.19, x[4] = 2.19
5: oo[5] = 5, x[oo[5]] = 3.45, x[5] = 3.45
6: oo[6] = 6, x[oo[6]] = 5.92, x[6] = 5.92
After:
0: oo[0] = 1, x[oo[0]] = 1.23, x[0] = 3.45
1: oo[1] = 4, x[oo[1]] = 2.19, x[1] = 1.23
2: oo[2] = 5, x[oo[2]] = 3.45, x[2] = 9.14
3: oo[3] = 0, x[oo[3]] = 3.45, x[3] = 4.67
4: oo[4] = 3, x[oo[4]] = 4.67, x[4] = 2.19
5: oo[5] = 6, x[oo[5]] = 5.92, x[5] = 3.45
6: oo[6] = 2, x[oo[6]] = 9.14, x[6] = 5.92

The printing for 'after' assures us that the array x is unchanged, but the array oo has been updated such that x[oo[i]] is in the ith position in sorted order.
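One detail in that output is worth a remark: x[0] and x[5] are both 3.45, and index 5 came out before index 0. The C standard leaves the relative order of elements that compare equal unspecified for qsort(), whereas R's order() is documented to leave unresolved ties in their original order. If that behaviour matters, one option is to break ties on the index itself. The following is a minimal sketch of that idea, not a drop-in replacement for the code above: the names order_base, compare_stable and order_stable are made up for this illustration, the index type is switched to size_t, and the data are assumed to contain no NaN values (NaN would make the comparison inconsistent).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* File-scope pointer to the data being ordered. This is the same
 * global-variable technique as above, so the sketch is neither
 * reentrant nor thread-safe. */
static const double *order_base;

/* Compare two indices by the values they refer to. Equal values are
 * ordered by index, so ties keep their original relative order. */
static int compare_stable(const void *p1, const void *p2)
{
    const size_t i1 = *(const size_t *)p1;
    const size_t i2 = *(const size_t *)p2;

    if (order_base[i1] < order_base[i2])
        return -1;
    if (order_base[i1] > order_base[i2])
        return +1;
    return (i1 > i2) - (i1 < i2);
}

/* Hypothetical helper: fill oo[0..n-1] so that x[oo[i]] is ascending,
 * with ties in original order. x and oo must both hold n elements. */
void order_stable(const double *x, size_t *oo, size_t n)
{
    assert(x != NULL && oo != NULL);

    for (size_t i = 0; i < n; i++)
        oo[i] = i;
    order_base = x;
    qsort(oo, n, sizeof oo[0], compare_stable);
}
```

Called as order_stable(x, oo, x_size) on the sample data (with oo declared as an array of size_t), this should place index 0 before index 5.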

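BSD (and therefore Mac OS X too) provides a non-standard alternative to qsort(), namely qsort_r(). Be aware that glibc has a function with the same name but a different argument order (the extra pointer comes last, both in the call and in the comparator), so what follows applies to the BSD variant only: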

void qsort_r(void *base, size_t nel, size_t width, void *thunk, int (*compar)(void *, const void *, const void *));

The qsort_r() function behaves identically to qsort(), except that it takes an additional argument, thunk, which is passed unchanged as the first argument to the function pointed to by compar. This allows the comparison function to access additional data without using global variables, and thus qsort_r() is suitable for use in functions which must be reentrant.

Writing the code in terms of qsort_r() is a rather trivial set of changes:

#include <stdio.h>
#include <stdlib.h>

typedef double VecType;
#define PRIf_VecType "f"

static int compare(void *thunk, const void *p1, const void *p2)
{
    const VecType *base = thunk;
    const int i1 = *(const int *)p1;
    const int i2 = *(const int *)p2;
    if (base[i1] < base[i2])
        return -1;
    else if (base[i1] > base[i2])
        return +1;
    else
        return 0;
}

static void print_arrays(const char *tag, size_t x_size, VecType *x, int *oo)
{
    printf("%s:\n", tag);
    for (size_t i = 0; i < x_size; i++)
        printf("%zu: oo[%zu] = %d, x[oo[%zu]] = %4.2" PRIf_VecType
               ", x[%zu] = %4.2" PRIf_VecType "\n",
               i, i, oo[i], i, x[oo[i]], i, x[i]);
}

int main(void)
{
    VecType x[] = { 3.45, 1.23, 9.14, 4.67, 2.19, 3.45, 5.92 };
    size_t x_size = sizeof(x) / sizeof(x[0]);
    int oo[x_size];

    for (size_t i = 0; i < x_size; i++)
        oo[i] = (int)i;

    print_arrays("Before", x_size, x, oo);
    qsort_r(oo, x_size, sizeof(oo[0]), x, compare);
    print_arrays("After", x_size, x, oo);

    return 0;
}

With sample output like this (it is the same as the output from the other code):

Before:
0: oo[0] = 0, x[oo[0]] = 3.45, x[0] = 3.45
1: oo[1] = 1, x[oo[1]] = 1.23, x[1] = 1.23
2: oo[2] = 2, x[oo[2]] = 9.14, x[2] = 9.14
3: oo[3] = 3, x[oo[3]] = 4.67, x[3] = 4.67
4: oo[4] = 4, x[oo[4]] = 2.19, x[4] = 2.19
5: oo[5] = 5, x[oo[5]] = 3.45, x[5] = 3.45
6: oo[6] = 6, x[oo[6]] = 5.92, x[6] = 5.92
After:
0: oo[0] = 1, x[oo[0]] = 1.23, x[0] = 3.45
1: oo[1] = 4, x[oo[1]] = 2.19, x[1] = 1.23
2: oo[2] = 5, x[oo[2]] = 3.45, x[2] = 9.14
3: oo[3] = 0, x[oo[3]] = 3.45, x[3] = 4.67
4: oo[4] = 3, x[oo[4]] = 4.67, x[4] = 2.19
5: oo[5] = 6, x[oo[5]] = 5.92, x[5] = 3.45
6: oo[6] = 2, x[oo[6]] = 9.14, x[6] = 5.92
Jonathan Leffler

The standard library's all-purpose sorting function is qsort. It takes a comparison function which takes the two elements to be compared. There is no way to pass additional information to the function.

If you have access to a sorting function that accepts additional data in its comparison function, e.g. qsort_r or g_qsort_with_data, you should use that. Jonathan Leffler has shown you how. (It's a pity that qsort_r isn't part of the standard. The additional data is often useful and lends itself to a clean solution of your problem.)

If you rely on qsort, a simple solution would be to store the additional information in a global variable. That's not a nice solution, because it doesn't encapsulate the data properly and isn't thread-safe. For a small-scale program, it might be good enough, though; see Rob's answer.
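If neither the global variable nor a non-standard qsort_r is acceptable, one further option that stays within standard C is to skip qsort altogether and write a small sort over the index array that receives x as an ordinary parameter. The sketch below uses a top-down merge sort, chosen here purely for illustration and not something proposed in the other answers; order_reentrant, sort_range and merge_runs are hypothetical names, and the data are assumed to be doubles without NaN values. A side effect of merge sort is that equal values keep their original order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs idx[lo..mid) and idx[mid..hi), using tmp as scratch. */
static void merge_runs(const double *x, size_t *idx, size_t *tmp,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;

    while (i < mid && j < hi) {
        /* Take from the right run only when strictly smaller,
         * which keeps equal values in their original order. */
        if (x[idx[j]] < x[idx[i]])
            tmp[k++] = idx[j++];
        else
            tmp[k++] = idx[i++];
    }
    while (i < mid)
        tmp[k++] = idx[i++];
    while (j < hi)
        tmp[k++] = idx[j++];
    memcpy(idx + lo, tmp + lo, (hi - lo) * sizeof *idx);
}

/* Recursively sort idx[lo..hi) by the values x[idx[.]]. */
static void sort_range(const double *x, size_t *idx, size_t *tmp,
                       size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;
    size_t mid = lo + (hi - lo) / 2;
    sort_range(x, idx, tmp, lo, mid);
    sort_range(x, idx, tmp, mid, hi);
    merge_runs(x, idx, tmp, lo, mid, hi);
}

/* Fill oo[0..n-1] so that x[oo[i]] is ascending.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int order_reentrant(const double *x, size_t *oo, size_t n)
{
    size_t *tmp;

    assert(x != NULL && oo != NULL);
    for (size_t i = 0; i < n; i++)
        oo[i] = i;
    if (n < 2)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    sort_range(x, oo, tmp, 0, n);
    free(tmp);
    return 0;
}
```

All state lives in parameters and locals, so several threads can order different arrays at the same time. Whether this is as fast as the library's qsort on a given platform is something to measure rather than assume.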

Another approach, which is often used when sorting two arrays alongside each other, is to combine them into structs and sort according to one of the fields. That's usually a good approach if the data belongs together, but in your case it would mean creating an auxiliary struct array.
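To make the struct idea concrete, here is a minimal sketch under a few assumptions: the data are doubles without NaN values, the caller supplies the index array, and the names struct keyed, compare_keyed and order_by_struct are invented for this example. Each value is paired with its original index, the pairs are sorted with plain qsort, and the indices are then copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical auxiliary record pairing a value with its original index. */
struct keyed {
    double value;
    size_t index;
};

/* Order by value; equal values are ordered by original index. */
static int compare_keyed(const void *a, const void *b)
{
    const struct keyed *ka = a;
    const struct keyed *kb = b;

    if (ka->value < kb->value)
        return -1;
    if (ka->value > kb->value)
        return +1;
    return (ka->index > kb->index) - (ka->index < kb->index);
}

/* Fill oo[0..n-1] so that x[oo[i]] is ascending.
 * Returns 0 on success, -1 if the auxiliary array cannot be allocated. */
int order_by_struct(const double *x, size_t *oo, size_t n)
{
    struct keyed *tmp;

    assert(x != NULL && oo != NULL);
    if (n == 0)
        return 0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        tmp[i].value = x[i];
        tmp[i].index = i;
    }
    qsort(tmp, n, sizeof *tmp, compare_keyed);
    for (size_t i = 0; i < n; i++)
        oo[i] = tmp[i].index;
    free(tmp);
    return 0;
}
```

The comparator needs nothing beyond the two elements it is handed, so no global variable is involved. The price is the temporary array and two extra passes over the data.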

Finally, you could create an array of pointers and sort those with one level of indirection.

Edit: I had first proposed to use this approach to create an array of indices as requested in the question. That solution relied on size_t being of the same size as void *, which isn't necessarily true by the C standard, and also accessed the same memory as both pointers into the array and array indices, thus breaking strict aliasing rules. That solution is still available at the end of the post.

I've now come to the conclusion that the pointer solution is viable, but that the order function should populate an array of pointers into the array. So instead of accessing the element via an index, x[oo[i]], it is now accessed via a pointer *oref[i]. If the index is needed, it can be obtained via pointer arithmetic:

ix = oref[i] - x;

This is a nice C solution. Here's an implementation with example client code:

#include <stdlib.h>
#include <stdio.h>

typedef int Type;

int ptrcmp(const void *a, const void *b)
{
    const Type *const *aa = a;
    const Type *const *bb = b;

    return (**aa > **bb) - (**aa < **bb);
}

void order_ref(Type **ptr, Type *arr, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) ptr[i] = arr + i;    
    qsort(ptr, n, sizeof(*ptr), ptrcmp);    
}

#define countof(x) (sizeof(x) / sizeof(*x))

int main()
{
    Type arr[] = {8, 5, 4, 9, 1, 7, 6, 3, 2, 0};
    Type *ptr[countof(arr)];
    size_t n = countof(arr);
    size_t i;

    order_ref(ptr, arr, n);

    for (i = 0; i < n; i++) {
        int ix = ptr[i] - arr;

        printf("%4d%16d\n", ix, *ptr[i]);
    }

    return 0;
}

The code I originally proposed is below. I said: You can make use of the fact that size_t has the same size as a pointer and convert the pointers to unsigned integers of type size_t using pointer arithmetic. That condition isn't necessarily true on all platforms, but it is at least enforced by an assert.

Using the same array for pointers and indices is a trick that saves allocating an auxiliary array. I'm not sure whether it breaks strict aliasing, because it does the type punning on one element of the array at a time, but it certainly isn't a clean solution.

It still might be useful, so here's the code, but you should really prefer the qsort_r or pointer solutions.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef int Type;    // Source data type

int ptrcmp(const void *a, const void *b)
{
    const Type *const *aa = a;
    const Type *const *bb = b;

    return (**aa > **bb) - (**aa < **bb);
}

size_t *order(const Type *arr, size_t n)
{
    const Type **ptr = malloc(n * sizeof(*ptr));
    size_t *res = (size_t *) ptr;
    size_t i;

    assert(sizeof(size_t) == sizeof(Type *));

    for (i = 0; i < n; i++) ptr[i] = arr + i;    
    qsort(ptr, n, sizeof(*ptr), ptrcmp);    
    for (i = 0; i < n; i++) res[i] = ptr[i] - arr;

    return res;
}

/*
 *      Example client code
 */
int main()
{
    Type arr[] = {8, 5, 4, 9, 1, 7, 6, 3, 2, 0};
    size_t n = sizeof(arr) / sizeof(*arr);

    size_t *ind = order(arr, n);
    size_t i;

    for (i = 0; i < n; i++) {
        printf("%4zu%16d\n", ind[i], arr[ind[i]]);
    }

    free(ind);

    return 0;
}
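If indices rather than pointers are needed, the memory-sharing trick can be avoided by sorting pointers in a temporary array and writing the indices into a separate array supplied by the caller, along the lines suggested in the comments. This is only a sketch: order_into is a hypothetical name, and it spends one extra allocation to keep the two types apart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int Type;    /* Source data type */

/* Compare two pointers-to-elements by the values they point at. */
static int ptrcmp(const void *a, const void *b)
{
    const Type *const *aa = a;
    const Type *const *bb = b;

    return (**aa > **bb) - (**aa < **bb);
}

/* Fill oo[0..n-1] so that arr[oo[i]] is ascending.
 * Returns 0 on success, -1 if the pointer array cannot be allocated. */
int order_into(const Type *arr, size_t *oo, size_t n)
{
    const Type **ptr;

    assert(arr != NULL && oo != NULL);
    if (n == 0)
        return 0;
    ptr = malloc(n * sizeof *ptr);
    if (ptr == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        ptr[i] = arr + i;
    qsort(ptr, n, sizeof *ptr, ptrcmp);
    /* Pointer difference gives the index; it is never negative here. */
    for (size_t i = 0; i < n; i++)
        oo[i] = (size_t)(ptr[i] - arr);
    free(ptr);
    return 0;
}
```

No assumption about sizeof(size_t) is needed, and each block of memory is only ever accessed as one type.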
M Oehm
  • The code can be modified so that instead of `order()` allocating the space, the `main()` function allocates an array `size_t oo[n];` which is passed to `order()` for use/abuse. I suspect it runs afoul of 'strict aliasing' rules, using the same array as containing elements of type `Type *` and `size_t` inside a single function. I'm not certain that the assertion `assert(sizeof(size_t) == sizeof(Type *))` is obliged to be safe, but it will hold on the majority of systems. – Jonathan Leffler Mar 01 '15 at 08:32
  • @JonathanLeffler: Thanks for your comment and editing. I agree that a more C-like approach is to pass the array to be filled. After all, the number of elements is known when calling it and the user can choose whether to allocate it on the stack or heap depending on the situation. I thought that `size_t` and `void *` should have the same size, but [I was mistaken](http://stackoverflow.com/questions/2550774/what-is-size-t-in-c). – M Oehm Mar 01 '15 at 08:42
  • I'm aware of the strict aliasing problem and really should have pointed that out. A cleaner and C-ish approach would be to have `order` fill a sorted array of pointers. I'll revise the answer, but I've got to attend to my Sunday laundry chores first. – M Oehm Mar 01 '15 at 08:45

I won't comment on whether it is fast, but this does it in a manner reasonably economical with code. Note the code is not reentrant nor is it thread safe, due to using a static to pass information between two functions.

This code assumes both the arrays x and oo are of length size.

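#include <stdlib.h>

typedef double WhateverType;   /* placeholder (assumption): substitute the element type of x */

static const WhateverType *array;   /* shared with the comparator, hence not reentrant */

int our_comparison_thing(const void *a, const void *b)
{
     const WhateverType *aval = array + *(const size_t *)a;
     const WhateverType *bval = array + *(const size_t *)b;
     return (*aval == *bval) ? 0 : ((*aval < *bval) ? -1 : 1);
}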

void DoOurThing(const WhateverType *x, size_t *oo, size_t size)
{
      size_t i;
      array = x;
      for (i = 0; i < size; ++i)
          oo[i] = i;
      qsort((void *)oo, size, sizeof(*oo), our_comparison_thing);
}
Rob