2

Context

I am trying to learn C and came across an example of using qsort to sort an array of strings.

Questions:

I am struggling to understand the following:

  1. Why these two return statements are different?
  2. Why do we need to cast the void pointer to (char **) instead of just (char *).
int CompareWords(const void *a, const void *b) {
    char **str1 = (char **)(a);
    char **str2 = (char **)(b);
    char *c1 = (char *)a;
    char *c2 = (char *)b;
   return strcmp(c1, c2);
//    return strcmp(*str1, *str2);
}
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
obanby
  • 106
  • 1
  • 7
  • Does this answer your question? [qsort in C: compare Strings](https://stackoverflow.com/questions/20113143/qsort-in-c-compare-strings) – new Q Open Wid Jul 07 '23 at 21:09
  • 2
    The comparison function used by `qsort()` receives as its arguments *pointers to* the items to compare, converted to type `void *`. If the items are of type `char *`, then pointers to them are of type `char **`. The `CompareWords()` function presented in the question is wrong for that use, but it would be ok if the commented-out `return` statement were used instead of the one that is actually used. Those are not equivalent. – John Bollinger Jul 07 '23 at 21:13
  • @JohnBollinger Thanks! So the `strcmp(c1, c2);` is effectively comparing the pointers and not the actual strings? and if that's the case isn't dereferencing `*str1` should be the exact address of `c1`? – obanby Jul 07 '23 at 21:15

3 Answers3

5

The qsort routine does not know what you are sorting. It accepts the things to be sorted as an array for which you provide the number of bytes in each element and the number of elements.

Since qsort knows the number of bytes in each element, it can calculate the address of each element. But it does not know what type each element is. So, when it wants your routine to compare two elements, it passes that routine the addresses of two elements, using type const void *. The void is a substitute for the actual type, and the const means the data at those addresses it not intended to be changed, just compared.

Your routine does know the type of the elements. Your array contains pointers to characters. Specifically, the characters have type char, and a pointer to them is char *. So the address qsort passes you is actually the address of a char *, that is: char * *. Except qsort passed you a pointer to const data, so that data is char * const, and a pointer to it is char * const *. Note that char * const * is a pointer to a constant pointer to char, not a pointer to a pointer to constant char.

Recapping:

  • Each element in your array is a char *.
  • qsort passes you the address of an element in your array, so it passes you a char * *.
  • Except qsort added a const, so it passes you a char * const *.
  • But qsort does not know your element type, so it changes the char * part to void and passes you a void const *, which is the same as const void *.
  • You have to convert it to char * const *.

You can convert the addresses qsort passes to the actual type this way:

char * const *str1 = a;
char * const *str2 = b;

When you use const properly this way, you do not need a cast. The compiler will allow the conversion implicitly in the initialization, because implicitly converting from void * to some other pointer to object type is allowed, but implicitly removing const is not allowed. Removing const requires a cast. But using a cast means const can be removed accidentally, so it should be avoided—it is better to use an implicit conversion in initialization instead of using a cast.

Aside from not changing the pointers to the char, this comparison routine is also not going to change the char, so we can add const for them, as a safety feature that helps avoid errors:

const char * const *str1 = a;
const char * const *str2 = b;

Next, we were given pointers to pointers, but it is the latter we need to use. We can get use *str1 and *str2 to get the pointers that str1 and str2 point to:

const char *c1 = *str1;
const char *c2 = *str2;

Now c1 and c2 point to the actual characters to be compared.

Then we can compare with return strcmp(c1, c2);, which makes the whole routine:

#include <string.h>

int CompareWords(const void *a, const void *b)
{
    const char * const *str1 = a;
    const char * const *str2 = b;
    const char *c1 = *str1;
    const char *c2 = *str2;
    return strcmp(c1, c2);
}

Writing out the uses of c1 and c2 was mostly for illustration. We can also write the routine as:

int CompareWords(const void *a, const void *b)
{
    const char * const *str1 = a;
    const char * const *str2 = b;
    return strcmp(*str1, *str2);
}

Why these two return statements are different?

return strcmp(c1, c2); as it appears in the question is wrong because those c1 and c2 are just values converted from the passed addresses. They are the addresses of the pointers we need, not the pointers we need.

Why do we need to cast the void pointer to (char **) instead of just (char *).

The array that was given to qsort to sort was an array of pointers to char, and qsort passes the comparison routine pointers to those pointers. It does not pass the pointers themselves.

(It could not pass the pointers themselves because it does not know what type the elements in the array are. It does not know they are pointers, and there is no way in C to pass the value of an object whose type you do not know. Passing the value of an object requires getting the bytes of the object from memory and interpreting them, and you cannot interpret them without knowing their type.)

Lover of Structure
  • 1,561
  • 3
  • 11
  • 27
Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
1

Why these two return statements are different? Why do we need to cast the void pointer to (char **) instead of just (char *).

You may use either return statement in the comparison function. The correct choice depends on what is sorted.

The declaratopn of the function qsort looks the following way

void qsort(void *base, size_t nmemb, size_t size, 
           int (*compar)(const void *, const void *)); 

The first function parameter is the address to first element of the passed array. The second parameter specifies the number of elements in the passed array. The third parameter specifiers the size of elements of the passed array. And at last the forth parameter specifies the comparison function.

The function qsort passes to the comparison function pointers to elements of the sorted array as void pointers that are used to compare original elements of the array pointed to by the pointers within the comparison function.

For example if you have an integer array like for example

int a[5] = { 5, 3, 4, 2, 1 };

then the function qsort to compare a pair of elements of the array as for example the first and the second elements passes to the comparison function the following expressions ( const void * )&a[0] and ( const void * )&a[1]. So you can compare the elements in the comparison function by casting the pointers to the type const int * and dereferencing the obtained pointers.

Npw let's consider an array of pointers to string literals.

It can look like

char * s1[] = { "one", "two", "three", "four", "five" };

That is each element of the array has the type char *. So the function qsort passes pointers to elements of the array like for example &s1[0] and &s1[1] to the comparison function. These pointers have the type char **.

So in fact you have

const void *a = &s1[0];
const void *b = &s1[1];

To get the elements (that are pointers) of the sorted array you need to write

char **str1 = (char **)(a);
char **str2 = (char **)(b);
return strcmp(*str1, *str2);

That is you need at first to get the original pointer of the type char ** and then to derefernce the pointers to get the original type of elements of tthe array of the type char *.

Now let's consider another array. Instead of pointers to string literals we will declare a two-dimensional array like

char s2[][6] = { "one", "two", "three", "four", "five" };

Again the function qsort passes to the comparison function pointers to its elements.

But now the type of elements of the array is char [6]. A pointer to such an element has as for example in the expression &s1[0] in turn the type char ( * )[6].

A value of the address of a whole array is equal to the value of the address of of its first element.

You can check that using the following code snippet

char s2[][6] = { "one", "two", "three", "four", "five" };

 printf( "&s2[0]    = %p\n", ( void * )&s2[0] );
 printf( "&s2[0][0] = %p\n", ( void * )&s2[0][0] );

The output of this code snippet might look the following way

&s2[0]    = 0x7ffcbf1712d0
&s2[0][0] = 0x7ffcbf1712d0

So logically the call of the cpmparison function can be imagined the following way

( const void * )a = &s2[0];
( const void * )b = &s2[1];

const char *c1 = a;
const char *c2 = b; 

The pointers &s2[0] and &s2[1] and the pointers c1 and c2 have the same values. These values are addresses of the first characters of the passed sub-arrays of the two-dimensional array.

Here is a demonstration program.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int CompareWords1(const void *a, const void *b) {
    char **str1 = (char **)(a);
    char **str2 = (char **)(b);
    return strcmp(*str1, *str2);
}

int CompareWords2(const void *a, const void *b) {
    const char *c1 = ( const char * )a;
    const char *c2 = ( const char * )b;
   return strcmp(c1, c2);
}

int main( void ) 
{
    const char * s1[] = { "one", "two", "three", "four", "five" };
    const size_t N1 = sizeof( s1 ) / sizeof( *s1 );

    for ( size_t i = 0; i < N1; i++ )
    {
        printf( "\"%s\" ", s1[i] );
    } 
    putchar( '\n' );
    
    qsort( s1, N1, sizeof( *s1 ), CompareWords1 );

    for ( size_t i = 0; i < N1; i++ )
    {
        printf( "\"%s\" ", s1[i] );
    } 
    putchar( '\n' );
    
    putchar ( '\n' );

    char s2[][6] = { "one", "two", "three", "four", "five" };
    const size_t N2 = sizeof( s2 ) / sizeof( *s2 );

    for ( size_t i = 0; i < N2; i++ )
    {
        printf( "\"%s\" ", s2[i] );
    } 
    putchar( '\n' );
    
    qsort( s2, N2, sizeof( *s2 ), CompareWords2 );

        for ( size_t i = 0; i < N1; i++ )
    {
        printf( "\"%s\" ", s1[i] );
    } 
    putchar( '\n' );
}

The program output is

"one" "two" "three" "four" "five" 
"five" "four" "one" "three" "two" 

"one" "two" "three" "four" "five" 
"five" "four" "one" "three" "two" 

Of cource you could write the function CompareWords2 the following way

int CompareWords2(const void *a, const void *b) {
    const char ( *c1 )[6] = ( const char ( * )[6] )a;
    const char ( *c2 )[6] = ( const char ( * )[6] )b;
   return strcmp(*c1, *c2);
}

But it would be a bad idea to use the function when you will need to sort another two-dimensional array elements of which have some other type instead of the type char[6] because such a function will only confuse readers of the code when they will see that the actual type of elements of the sorted array differs.

Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
0

Why do we need (char **) to cast strings in comparator function?

The casts themselves are not needed.
The casts serve to lose the const-ness.
Like code can be made without casts.


int CompareWords(const void *a, const void *b) {
  const char *const*str1 = a;
  const char *const*str2 = b;
  const char *c1 = a;
  const char *c2 = b;
  return strcmp(c1, c2);
  // or
  return strcmp(*str1, *str2);
}

Why these two return statements are different?

They serve different goals. The correct one to use depends on how qsort() is used. Likely the return strcmp(*str1, *str2) is correct as the compare function passed to qsort() expects the addresses of the objects to compare.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256