1

The following comes from a qsort function implementation given as a solution to one of the K&R challenge questions. The challenge is to read a list of words and sort them according to the number of occurrences. The struct Word is also shown. Link to the full code: http://clc-wiki.net/wiki/K%26R2_solutions:Chapter_6:Exercise_4

typedef struct WORD
{
  char *Word;
  size_t Count;
  struct WORD *Left;
  struct WORD *Right;
} WORD;

...

CompareCounts(const void *vWord1, const void *vWord2)
{
  int Result = 0;
  WORD * const *Word1 = vWord1;
  WORD * const *Word2 = vWord2;

  assert(NULL != vWord1); 
  assert(NULL != vWord2); 

  /* ensure the result is either 1, 0 or -1 */
  if((*Word1)->Count < (*Word2)->Count)
  {
    Result = 1;
  }
  else if((*Word1)->Count > (*Word2)->Count)
  {
    Result = -1;
  }
   else
  {
    Result = 0;
  }

  return Result;
}

My question is about these lines:

WORD * const *Word1 = vWord1;
WORD * const *Word2 = vWord2;

Is this a declaration of a constant pointer to a constant variable? Or something else? And why does it have to be defined this way for the sort to work?

Corry Chapman
  • 157
  • 1
  • 1
  • 10

2 Answers2

3

The source code you linked to is a small application that reads in a text sample, generates a tree data structure that contains word frequency (how many times each word appears in the text sample), and then prints out the list of words from most frequent to least frequent.

/*

  Chapter 6. Structures

          Write a program that prints out the distinct words in its 
          input sorted into decreasing order of frequency of occurrence.
          Precede each word by its count.

  Author: Bryan Williams

*/

The pattern used in this application has a classical and elegant K&R feel about it. The first step is to process the text sample generating a tree structure in which each node contains a piece of text (a word from the text sample) along with a frequency count of how many times the piece of text is found. The second step is to then sort the tree nodes by the frequency counts. The third step is to print the sorted tree nodes in order of frequency to provide a list of the text pieces found along with how many times the text piece was found in the text sample.

The tree used is a binary tree and the tree nodes have the following structure:

typedef struct WORD
{
  char *Word;          // pointer to the text piece, a word of text
  size_t Count;        // frequency count for this word
  struct WORD *Left;   // pointer to tree node child left
  struct WORD *Right;  // pointer to tree node child right
} WORD;

The tree structure is used in order to be efficient about either determining if a text piece has already been found and just incrementing the count or adding the text piece with a count of one to our data storage if it does not.

However the sorting step uses a different criteria for the order of items in the tree than for the processing of the text sample. The text sample uses the text pieces as the way to order the tree nodes but for the actual output we need an order based on the frequency counts. So we need to sort the nodes of the tree on the frequency counts.

Since this is an in memory tree, the first thought for the program would be to create a list of the tree nodes in an array and then sort the list. However sorting an array usually requires moving array elements around except for the special case of the array already being sorted. This approach would also double the amount of memory used for the tree nodes since a copy of the nodes is being made.

Rather than making a copy of the tree nodes and then sorting that list, instead the program creates a list of pointers to the tree nodes and then sorts the list of pointers by referencing the tree nodes the pointers point to.

A Bit About the qsort() interface

The function definition of CompareCounts(const void *vWord1, const void *vWord2) means that vWord1 is a pointer to a const variable whose type is unknown or could be anything.

If we look at the qsort() function declaration it looks like:

void qsort (void* base, size_t num, size_t size, int (*comparator)(const void*,const void*));

So the comparison function used with qsort() must have a compatible argument list or interface description or a modern C compiler will issue an error.

Traditionally with the comparison function used with qsort() as well as bsearch(), we would have a comparison function that would look like:

CompareCounts(const void *vWord1, const void *vWord2)

In the comparison function we would then take the void * arguments and cast them to a pointer of the the actual type that is to be used in the comparison. After that we then use the local, properly typed variables to do the comparison operation.

What qsort() does is to take two elements of the array that it wants to compare and calls the comparison function with pointers to those two elements. The use of void * is to work around the type checking of the C compiler.

The interface specified that the void * pointer is pointing to something that is const because qsort() doesn't want you to change the data that it is providing. It is asking you to test the two data items provided and indicate which is greater or lesser or equal in the collating sequence you are using to sort the data.

The reason for the void * pointer is because the qsort() function does not know what the data is or how the data is structured. qsort() only knows the number of bytes, the size, of each data element so that it can iterate through the array of items, element by element. This allows the array to be any size of struct or other type of data.

The Specific Example Comparison Function

The interface, how the arguments were casted, for CompareCounts() looked strange to me until I reviewed the source code you linked to. That program generates a tree structure then generates an array of pointers which point to the actual nodes in the tree. It is this array of pointers to nodes that is passed to qsort() to sort.

So the array of data provided to qsort() is an array each element of which points to a tree node which is a WORD element stored in the tree data structure. The array of pointers are sorted based on the data the pointers point to.

In order to access a particular node by using the array passed to the qsort() function we have to take the array element and dereference it to get the actual tree node.

Since qsort() passes a pointer to the array element, the void *vWord1 is a pointer to an array element and we have to dereference that pointer to get the actual array element, a pointer to a tree element. However it is not the pointer values we want to use as the sorting criteria but rather what the pointers point to. This requires us to dereference the pointer of the array element in order to access the data of the WORD element in the tree we want to compare.

WORD * const *Word1 = vWord1; does a cast of the void * pointer vWord1 to be a pointer to a const pointer to a WORD. What this means is that Word1 is a pointer, which qsort() uses to point to the item to be sorted, that is a pointer which is const (the array element itself which qsort() does not want you to change) and the const pointer that Word1 points to, points to a WORD (the pointer to the tree node which is the data that the array contains).

What each node of the tree contains is a text word along with a count as to how many times that word is found in a sample of text. The qsort() function is being used to sort the nodes of the tree which results from examining the text sample input from most frequent to least frequent. The list of node pointers is what is provided to qsort().

So the sort is not sorting the array based on the array values but rather sorting based on what the array values, tree node pointers, point to.

By the way When I try a sample compile, I am seeing a warning warning C4090: 'initializing': different 'const' qualifiers with Visual Studio 2015 for the lines:

WORD * const *Word1 = vWord1;
WORD * const *Word2 = vWord2;

However with the following change the compiler warning goes away:

const WORD * const * Word1 = vWord1;
const WORD * const * Word2 = vWord2;

This change is actually in line with what qsort() is asking, that none of the data should be changed whether the pointer from the array element or the data that the pointer from the array element points to.

Anyway, the expression (*Word1)->Count is taking the pointer to the array element provided by qsort(), dereferencing it to get the pointer to the tree node, and then dereferencing the pointer to the tree node in order to get the member of the WORD struct we want to sort against. The sort uses the frequency count of the words, stored in Count as the sorting criteria.

Addendum topic: Playing with const

A question raised is that with this kind of complex definition, const WORD * const * Word1 = vWord1; what are various ways to generate compiler errors by breaking const and seeing when using const can provide an additional safeguard against inadvertently modifying something that should not be modified.

If we use this definition without the const modifier we would have WORD * * Word1 = vWord1; which means that we have a pointer variable Word1 that has a meaning of something like:

Word1 -> ptr -> WORD

where Word1 is our pointer variable which points to some unknown pointer which in turn points to a variable of type WORD.

Lets look at several different variations of definitions.

WORD * * Worda = vWord1;         // no const
WORD * * const Wordb = vWord1;   // Wordb is const pointer which points to a non-const pointer which points to a non-const WORD
WORD * const * Wordc = vWord1;   // WordC is a non-const pointer which points to a const pointer which points to a non-const WORD
const WORD * const * Wordd = vWord1;  // Wordd is a non-const pointer which points to a const pointer which points to a const WORD
const WORD * const * const Worde = vWord1;  // Worde is a const pointer which points to a const pointer which points to a const WORD

In the source code of the question with a definition of WORD * const * Word1 = vWord1; then Word1 is a non-const pointer which points to a const pointer which points to a non-const WORD. So lets look at several different kinds of assignments:

Word1 = vWord2;   // replace the value of the pointer Word1, allowed since non-const
(*Word1)++;       // Error: increment value of pointer pointed to by Word1, not allowed since pointer pointed to is const so compiler error
(*Word1)->Count++;  // increment value of variable pointed to by the pointer pointed to by Word1, allowed since non-const
Richard Chambers
  • 16,643
  • 4
  • 81
  • 106
  • I am able to change the CompareCounts function and remove the const modifier from the two variables: WORD ** Word1 = vWord1; WORD ** Word2 = vWord2; – Corry Chapman Mar 18 '18 at 22:11
  • ... And qsort still works. I'm also able, using the original code, to increment the const pointer without an error. So I'm not sure his declaration of a constant pointer is necessary. – Corry Chapman Mar 18 '18 at 22:21
  • @CorryChapman one of the nice yet terrible things about C is that you can cast anything to anything and C will let you do it. Neither `qsort()` nor the comparison function depend on the use of `const`. It is being used to indicate to a programmer that the values are not to be changed and to the compiler so that it will flag any expression where the values are changed. However if you want to be dangerous and remove that safety rail, C is quite happy to let you do it. – Richard Chambers Mar 18 '18 at 22:23
  • Thanks Richard. I was surprised I could increment the const pointer. I couldn't change the const variable however, which was good :) So should all implementations of qsort functions declare const internal variables when doing casts and comparisons? – Corry Chapman Mar 18 '18 at 22:32
  • @CorryChapman I would used `const` as an indicator to any programmers who may be reading the source code as well as to allow the C compiler to help me and others should I inadvertently change the variables that `qsort()` is providing to the comparison function. In most cases that I have seen the comparison function is only a few lines of code and not very complex. On the other hand, I have seen simple functions that mess things up. I have even written a few of those myself. Are you sure the pointer was `const` as in `int * const p;` which is different from `const int *p;` or `int const *p;`. – Richard Chambers Mar 18 '18 at 22:42
  • Thanks. Yes, if you leave the original code and just add a line such as: ++Word1 it will compile and run, at least in xcode. – Corry Chapman Mar 18 '18 at 22:44
  • @CorryChapman the original code is `WORD * const *Word1 = vWord1;` which means the pointer `Word1` is not `const` however if you change it to `WORD * const * const Word1 = vWord1;` then you should get an error. However since the variable `Word1` is a local copy of what `qsort()` provided, changing it does not effect `qsort()`. However changing what it points to does. – Richard Chambers Mar 18 '18 at 22:47
  • Thanks. Very helpful. Would WORD ** const Word1 = vWord1 be just as effective? It seems to run fine. – Corry Chapman Mar 18 '18 at 23:03
  • @CorryChapman that will work fine however it is declaring that `Word1` is a pointer which is `const` so not to be changed but says nothing about the whether what is pointed to is also subject to change or not. `const` is a modifier that affects what is immediately around it but the effect does not propagate through multiple levels. – Richard Chambers Mar 18 '18 at 23:50
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167089/discussion-between-corry-chapman-and-richard-chambers). – Corry Chapman Mar 19 '18 at 12:53
  • C assumes you know what you're doing, it's perfectly happy to let you make stupid mistakes. Unlike Pascal or Java or Basic, variable typing is mostly a suggestion, you can work around it. Pointer arithmetic for example, often necessary but usually dangerous until you're sure of what you're doing. – Alan Corey Oct 28 '18 at 01:46
  • @AlanCorey in C variable typing is more than a suggestion. Being able to work around it does not mean that variable typing in C is not rock solid and dependable. Pointer arithmetic is not "usually dangerous" so long as you are working with C the way the standards of the language and coding recommend. Its tricks and work arounds that create dangerous conditions. Being able to use tricks and work arounds is a feature of the language not a defect. – Richard Chambers Oct 28 '18 at 16:55
  • Pointer arithmetic was the first example that came to mind, casting is another. My point is just because you can stop the compiler from complaining doesn't mean it's right. I like C, it's been my preferred language for 10 years. But I've recently started looking at the assembly language output to see just what it's doing. Dereferencing and double pointers in compare functions for qsort have cost me days and they're very hit or miss still. – Alan Corey Oct 28 '18 at 18:25
1

Is this a declaration of a constant pointer to a constant variable?

No, it's just a pointer to a constant variable, not a constant pointer.

To make it clear, you are sorting an array of WORD*, for this specific problem, this WORD* pointer is a whole:

typedef WORD *wordptr;

Then the following declaration is more clear:

wordptr const *Word1 = vWord1;

And why does it have to be defined this way for the sort to work?

This is to make sure that a comparator won't modify the content of the pointer.

llllllllll
  • 16,169
  • 4
  • 31
  • 54
  • Thanks. That clarifies it. – Corry Chapman Mar 16 '18 at 17:27
  • 1
    You will want to review: [Is it a good idea to **typedef** pointers?](http://stackoverflow.com/questions/750178/is-it-a-good-idea-to-typedef-pointers). – David C. Rankin Mar 16 '18 at 19:34
  • @DavidC.Rankin The purpose of `typedef` here is only to clarify the confusion for the pointer declaration in this question. Personally I don't like `typedef` of pointer, but I won't say it's absolutely bad. – llllllllll Mar 16 '18 at 19:43