What means of this code in C qsort?

Question

void qsort (void *a, size_t n, size_t es, int (*compare)(const void *, const void *)

where a is a start of array address, n is sizeof array, es is sizeof array element.

I read the source code of qsort in C that I can't understand. the code is as follows.

#define SWAPINT(a,es) swaptype = ((char*)a- (char*)0 % sizeof(long) || \
        es % sizeof(long) ? 2: es == sizeof(long)? 0 : 1

I interpret this macro by,

if(((char*)a- (char*)0)% sizeof(long))==1 || es % sizeof(long)==1)
     swaptype = 2;
else if(es== sizeof(long))
     swaptype = 0;
else
     swaptype = 1;

But I don't understand why type conversion is implemented, (char*)a.

And what means of this line?

(char*)a- (char*)0)% sizeof(long)==1

That code seems quite broken. There is at least one syntax error in the macro, a missing parenthesis. Also, (char*) a - (char*) 0 should be a no-op unless there is something I'm missing? As should (char*) 0 % sizeof(long). — jforberg, Oct 03 '16 at 16:27
`(char*)0 % sizeof(long)` doesn't even make sense, because a pointer type is not an arithmetic type. Whatever this is, this is not conforming C. Where did you find this code? And are you sure you copied it correctly? — , Oct 03 '16 at 16:33
`%` beats `-` so `(char*)a- (char*)0 % sizeof(long)` is `(char*)a - ((char*)0 % sizeof(long))`. Certainly `((char*)a- (char*)0) % sizeof(long)` was desired. — chux - Reinstate Monica, Oct 03 '16 at 16:46
The `==1` in your interpretation should be `!=0`. Also there is a missing `)` after `(char*)0` in your `SWAPINT` macro, leading to unbalanced parentheses. The `SWAPINT` macro appears to set `swaptype = 2` if `a` is unaligned (to a `sizeof(long)` byte boundary) or `es` is not a multiple of `sizeof(long)`, set `swaptype = 0` if `a` is aligned and `es` is exactly `sizeof(long)`, or set `swaptype = 1` if `a` aligned and `es` is some integer multiple != 1 of `sizeof(long)`. — Ian Abbott, Oct 03 '16 at 16:54
@IanAbbott Nice, I think that's it. Do you have any clue why they are doing (a - 0), surely a no-op calculation? — jforberg, Oct 03 '16 at 16:59
The `((char*)a - (char*)0)` would convert a to an integer value of type `ptrdiff_t` without an explicit cast. It is not portable, but since it is part of the implementation of the standard library for the platform, it does not have to be portable. — Ian Abbott, Oct 03 '16 at 17:02

score 4 · Accepted Answer · 2016-10-03T17:11:51.367

Wherever you found that code, you probably copied it incorrectly. I found some very similar code in libutil from Canu:

c.swaptype = ((char *)a - (char *)0) % sizeof(long) || \
  es % sizeof(long) ? 2 : es == sizeof(long)? 0 : 1;

This code was likely illegitimally (because the terms of the copyright license are violated) copied from FreeBSD's libc:

//__FBSDID("$FreeBSD: src/lib/libc/stdlib/qsort.c,v 1.12 2002/09/10 02:04:49 wollman Exp $");

So I'm guessing you got it from a *BSD libc implementation. Indeedd FreeBSD's quicksort implementation contains the SWAPINIT macro (not SWAPINT):

#define SWAPINIT(TYPE, a, es) swaptype_ ## TYPE =       \
        ((char *)a - (char *)0) % sizeof(TYPE) ||       \
        es % sizeof(TYPE) ? 2 : es == sizeof(TYPE) ? 0 : 1;

After parsing, you should find that the above code is roughly the same as

condition_one   = ((char *)a - (char *)0) % sizeof(long);
condition_two   = es %  sizeof(long);
condition_three = es == sizeof(long);
c.swaptype = (condition_one || condition_two) ? 2 : condition_three ? 0 : 1;

Note that condition_two, as a condition, is not the same as es % sizeof(long) == 1, but rather es % sizeof(long) != 0. Aside from that, your translation was correct.

The intent of these conditions seems to be as follows:

condition_one is true when a is not long-aligned.
condition_two is true when es is not a multiple of long.
condition_three is true when es is exactly long.

As a result,

swaptype == 2 is when you don't have enough guarantees about the elements to be clever about swapping,
swaptype == 1 is intended for arrays with elements that are aligned along long boundaries (note: but not necessarily aligned as longs!), and
swaptype == 0 is intended for arrays that match the previous description, that also have elements that are also long-sized.

There is explicit type conversion in this case, because a has type void*, for which type arithmetic is undefined. However, also note that ((char *)a - (char *)0) is undefined too:

When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object; the result is the difference of the subscripts of the two array elements.

(C11 draft N1570, section 6.5.6, clause 9 on pages 93 and 94.)

It's not exactly spelled out in C11, but the null pointer is not part of the same array as the object pointed to by a, so the basic rules for pointer arithmetic are violated, so the behaviour is undefined.

score 1 · Answer 2 · answered Oct 03 '16 at 17:09

1

The macros is trying to check for alignment portably in a language, C, which doesn't really allow for such a test. So we subtract the null pointer from our pointer to obtain an integer, then take modulus the size of a long. If the result is zero, the data is long-aligned and we can access as longs. If it is not, we can try some other scheme.

answered Oct 03 '16 at 17:09

Malcolm McLean

6,258
1
17
18

Couldn't C11's `_Alignas` and `_Alignof` be used in this case? – Oct 03 '16 at 17:10
Even if it's shorter, I guess I've understood this answer better. However, what does that integer means ? Is it the beginning adress of the array ? And why does it matter that the array is considered as long ? if it's a byte array for instance ? – Asoub Oct 04 '16 at 06:46
When we subtract two pointers, we get an integer (not "int") which represents the number of places between them. Subtracting NULL from a char * is a no-op, but it tricks the compiler into accepting the pointer as an integer and allowing modulus on it. It's also a bit dicey if the memory space is segmented, but that's another story. – Malcolm McLean Oct 04 '16 at 07:54

score 0 · Answer 3 · answered Oct 03 '16 at 17:42

As remarked in the comments, the macro definition you present does not expand to valid C code because it involves computing (char*)0 % sizeof(long), where the left-hand operand of the % has type char *. That is not an integer type, but both operands of % are required to have integer type.

Additionally, the macro's expansion has unbalanced parentheses. That's not inherently wrong, but it makes that macro tricky to use. Furthermore, even where operator precedence yields a sensible result, usage of parentheses and extra whitespace can aid human interpretation of the code, at no penalty to execution speed, and negligible extra compilation cost.

So, I think the desired macro would be more like this:

#define SWAPINT(a,es) swaptype = (                                  \
    ((((char*)a - (char*)0) % sizeof(long)) || (es % sizeof(long))) \
        ? 2                                                         \
        : ((es == sizeof(long)) ? 0 : 1))                           \
)

I'd consider instead writing the penultimate line as

        : (es != sizeof(long))

to reduce the complexity of the expression at a slight cost to its comprehensibility. In any event, the intent appears to be to set swaptype to:

2 if a is not aligned on an n-byte boundary, where n is the number of bytes in a long, or if es is not an integer multiple of the size of a long; otherwise
1 if es is unequal to the size of a long; otherwise
0

That's similar, but not identical, to your interpretation. Note, however, that even this code has undefined behavior because of (char*)a - (char*)0. Evaluating that difference has defined behavior only if both pointers point into, or just past the end of, the same object, and (char *)0 does not point (in)to or just past the end of any object.

You asked specifically:

But I don't understand why type conversion is implemented, (char*)a.

That is performed because pointer arithmetic is defined in terms of the pointed-to type, so (1), a conforming program cannot perform arithmetic with a void *, and (2) the code wants the result of the subtraction to be in the same units as the result of the sizeof operator (bytes).

And what means of this line?
(char*)a- (char*)0)% sizeof(long)==1

That line does not appear in the macro you presented, and it is not a complete expression because of unbalanced parentheses. It appears to be trying to determine whether a points one past an n-byte boundary, where n is as defined above, but again, evaluating the pointer difference has undefined behavior. Note also that for an integer x, x % sizeof(long) == 1 evaluated in boolean context has different meaning than x % sizeof(long) evaluated in the same context. The latter makes more sense in the context you described.

What means of this code in C qsort?

3 Answers3

Linked