
Context:

I have written a Red-Black tree implementation in C. To let it handle arbitrary element types, it only deals with const void * elements, and a tree must be initialised with a comparison function with the signature int (*comp)(const void *, const void *). So far, so good, but I am now trying to use that C code to build an extension module for Python. It looks simple at first sight, because Python always passes references to objects, which C routines receive as pointers.

Problem:

Python objects come with rich comparison operators. That means that from a C extension module, comparing 2 arbitrary objects is trivial: just a matter of using int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid).

But the comparison may return -1 to indicate that the objects are not comparable. In Python or C++ it would be simple enough to throw an exception to signal an abnormal condition. Unfortunately C has no notion of exceptions, and I could not find a way to do it with setjmp/longjmp because:

  • the environment buffer has to be known to both the enclosing function and the inner one
  • I should free any allocated memory at longjmp time, when the internal function does not know what has been allocated

First idea:

A simple solution is to give the comparison function a third parameter through which it can signal an abnormal condition. But when the library is used in a plain C environment, that third parameter just does not make sense. I then remembered that in the '80s I had learned that in C, parameters were passed on the stack in reverse order and unstacked by the caller, to allow functions with a variable number of parameters. That means that, provided the first 2 parameters are correct, passing a third parameter to a function expecting 2 should be harmless.

Demo code:

#include <stdio.h>

// declares a type for the comparison functions
typedef int (*func)();

// A simple function for comparing integers - only 2 params
int f1(int a, int b) {
    return a - b;
}

/* Inserts a value into an increasing array
* By convention 0 denotes the end of the array
* No size control implemented for brevity
* The comp function receives a pointer to an int
* to be able to signal abnormal conditions
* */
int insert(int* arr, int val, func comp) {
    int err = 0;
    while ((0 != *arr) && (comp(*arr, val, &err) < 0)) { // 1
        if (err) return 0;
        ++arr;
    }
    do {
        int tmp = *arr;
        *arr = val;
        val = tmp;
    } while (0 != *arr++);
    return 1;
}
int main() {
    func f = &f1;
    // a simple test with 3 parameters
    int cr = f(3, 1, 5);  // 2
    printf("%d\n", cr);

    // demo usage of the insert function
    int arr[10] = {0};
    int data[] = { 1,5,3,2,4 };
    for (int i = 0; i < sizeof(data) / sizeof(*data); i++) {
        insert(arr, data[i], f1);
    }
    for (int i = 0; i < sizeof(data) / sizeof(*data); i++) {
        printf("%d ", arr[i]);
    }
    return 0;
}

At (1) and (2) the 2 parameter function is called with 3 parameters. Of course, this code compiles without even a warning in Clang or MSVC, and runs fine giving the expected result.

Question:

While this code works fine on common implementations, I wonder whether passing a third parameter to a function expecting only two is actually legitimate, or whether it invokes Undefined Behaviour.

Current research

  • Is it safe to invoke a C function with more parameters than it expects? : the accepted answer suggests that it should be safe when the C calling convention is used (which is my use case) while other answers show that the MSVC stdcall calling convention would not allow it
  • 6.7.6.3 Function declarators (including prototypes) and 6.5.2.2 Function calls in draft n1570 for C11, but as English is not my first language, I could not work out whether it is allowed or not

Remark:

The originality of this question is that it uses function pointer conversions.

Serge Ballesta
  • I'm not sure I understand your idea correctly. You wrote "A simple solution is to give a third parameter to the comparison function for it to signal an abnormal condition. But when the library is used in a plain C environment, that third parameter just does not make sense." Your example shows only how you call a function expecting 2 args with more args. It does not make clear to me how you would actually use the 3rd argument in certain cases. You could use a function that has 3 formal args but ignores the 3rd, or pass NULL or another special value to indicate that the caller does not need the 3rd. – Bodo Aug 10 '22 at 07:58
  • @Bodo: In a Python context, the comparison function would use the third parameter to signal that the 2 Python objects passed are not comparable. The insertion function would in turn return an error code telling that the new object could not be inserted, and in the end the extension module would raise a Python exception. – Serge Ballesta Aug 10 '22 at 08:09
  • Dup of e.g. [passing more than required arguments to c function](https://stackoverflow.com/questions/17104787/passing-more-than-required-arguments-to-c-function) and also billions of similar questions – Language Lawyer Aug 10 '22 at 08:54
  • _The originality of this question is that it uses function pointers_ And how does this make the question original??? In C, `f(...)` always uses function pointers, because [function designator `f` in such a context is converted to a function pointer](http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p4). So, in the dup target, `printf("%d\n",x);` uses function pointer. – Language Lawyer Aug 10 '22 at 12:43
  • @LanguageLawyer: I should have said that it explicitly uses pointer-to-function conversions. Because of that, 6.3.2.3p8 (conversion of pointers to function) is relevant here, and it specifically explains why this code invokes UB (I have just found it today...). Adding it to the proposed duplicate would not make sense because no pointers are involved in that question, yet I think it can be useful for future readers. I sincerely apologize if that point was already covered in another SO question, but I could not find it, which is why I asked this question. – Serge Ballesta Aug 10 '22 at 13:53
  • Maybe I'm not understanding, but you don't need C to support exceptions to raise an exception in *Python*. See https://stackoverflow.com/questions/20232965/how-do-i-properly-use-pythons-c-api-and-exceptions. – chepner Aug 10 '22 at 14:02
  • @SergeBallesta In your comment you more or less rephrased what you already wrote in your question. It does not clarify how a C function would use the 3rd argument in a Python context or how it should detect if a 3rd argument was passed or not. You could implement a C function that requires 3 args and call it with a special value for the 3rd arg, e.g. NULL, to indicate that it is used in a context where the 3rd arg does not make sense or from a caller that does not need what the function would do with the 3rd arg. You could use a function with variable arguments with the resulting difficulties. – Bodo Aug 10 '22 at 14:16
  • @chepner: You are absolutely right! I have just realized that it is possible to set the exception indicator with `PyErr_SetString` in an inner function and then test it with `PyErr_Occurred` in an outer one. I have learned more about function parameters in C, but I must acknowledge that this is close to an X-Y problem... – Serge Ballesta Aug 17 '22 at 05:35

2 Answers


I think it invokes Undefined Behavior.

From 6.5.2.2p6:

If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined.

The proper solution is to redesign the Red-Black tree implementation so that a context can be passed as a third parameter.

int (*comp)(const void *, const void *, void *);

It is highly recommended to add a context argument to any function pointer type, so that closures can be emulated.
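A minimal sketch of that idea, applied to the question's array demo rather than to the real tree code. All names here (`cmp_ctx_fn`, `cmp_error`, `cmp_int_checked`, `insert_ctx`) are hypothetical, and the rule that negative values are "not comparable" is only an assumption made to have an error case to show. For simplicity the insertion routine knows the concrete context type; a fully generic library would keep it as `void *` and let the caller inspect it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical comparator type: the third argument is an opaque context. */
typedef int (*cmp_ctx_fn)(const void *a, const void *b, void *ctx);

/* Hypothetical context used in this sketch to report a failed comparison. */
struct cmp_error {
    int failed;
};

/* Compares two ints. Negative values are treated as "not comparable",
 * an assumption made only to demonstrate the error path. */
static int cmp_int_checked(const void *a, const void *b, void *ctx) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    struct cmp_error *e = ctx;
    if (x < 0 || y < 0) {
        if (e != NULL) e->failed = 1;
        return 0;
    }
    return (x > y) - (x < y);
}

/* Inserts val into a 0-terminated increasing array.
 * Returns 1 on success, 0 if the comparator reported an error.
 * As in the question's demo, no capacity check is done. */
static int insert_ctx(int *arr, int val, cmp_ctx_fn comp, struct cmp_error *e) {
    assert(arr != NULL && comp != NULL && e != NULL);
    e->failed = 0;
    while (*arr != 0) {
        int c = comp(arr, &val, e);
        if (e->failed) return 0;  /* check the flag before trusting c */
        if (c >= 0) break;
        ++arr;
    }
    do {  /* shift the tail one slot to the right, terminator included */
        int tmp = *arr;
        *arr = val;
        val = tmp;
    } while (*arr++ != 0);
    return 1;
}
```

A plain C caller that never fails can use a comparator that simply ignores the third argument; the call is still well defined because the argument count matches the parameter count.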

As a workaround, you could use a global variable.

static int err;

int f1(int a, int b) {
    err = 0;
    return a - b;
}

int insert(int* arr, int val, int comp(int,int)) {
    err = 0;
    while ((0 != *arr) && (comp(*arr, val) < 0)) { // 1
        if (err) return 0;
        ++arr;
    }
    ...
}

It is not the best solution because it is not re-entrant. Only a single instance of insert()/f1() can run at a time.

tstanisl
  • I think you are right. But as I really could not understand the next sentence at all (my understanding would be that calling any function using an ellipsis would lead to UB), I just ignored the whole paragraph. – Serge Ballesta Aug 10 '22 at 08:13
  • @SergeBallesta, `...` is used for functions that accept a variable number of arguments. The `printf()` function is a good example of it: `int printf(const char* fmt, ...);`. This function has a prototype, which means a function declaration with specified types of parameters. – tstanisl Aug 10 '22 at 08:17
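To illustrate the comment above: a function whose prototype ends with `...` may legitimately be called with a varying number of arguments, as long as the callee is told how many to read. The helper `sum_ints` below is a hypothetical example, not part of the question's code.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums the `count` int arguments that follow `count`. */
static int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);  /* each extra argument is read as an int */
    }
    va_end(ap);
    return total;
}
```

This differs from the question's `f1`, which is defined with exactly two parameters and no ellipsis, so calling it with three arguments is not covered by the variadic rules.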

This is a complement to the accepted answer. The shown code uses function pointers to solve the compilation errors that would arise when calling a prototyped function with a wrong number of parameters.

But the draft n1570 for C11 says at 6.3.2.3 [Language/Conversions/Other operands/] Pointers §8:

... If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.

And it fully applies here because the referenced type is a function taking 2 parameters and the converted pointer type is a function taking 3 parameters. Per the accepted answer and 6.5.2.2p6, those two function types are not compatible, hence the code does invoke UB.


After finding that, I decided to give up on that approach and chose instead to use wrapper functions that call the function passed to the library with its expected number of arguments, to avoid UB.
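One way such a wrapper could look, as a sketch only: the library stores either a plain two-argument comparator or an error-reporting three-argument one, and a single internal function calls whichever is set through its own declared type, so no function pointer conversion is needed. The names `cmp2_fn`, `cmp3_fn`, `struct comparator`, `compare`, `cmp_int_plain` and `cmp_int_reporting` are hypothetical, and the rule that negative values are not comparable is an assumption made for the demo.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cmp2_fn)(const void *, const void *);            /* plain C comparator */
typedef int (*cmp3_fn)(const void *, const void *, int *err);  /* error-reporting comparator */

/* Exactly one of the two members is expected to be non-NULL. */
struct comparator {
    cmp2_fn plain;
    cmp3_fn reporting;
};

/* Wrapper used inside the library: each stored pointer is called with
 * the number of arguments its own type declares. */
static int compare(const struct comparator *c, const void *a, const void *b, int *err) {
    assert(c != NULL && err != NULL);
    *err = 0;
    if (c->reporting != NULL) {
        return c->reporting(a, b, err);
    }
    assert(c->plain != NULL);
    return c->plain(a, b);  /* a plain comparator cannot fail */
}

/* Plain comparator for ints. */
static int cmp_int_plain(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Error-reporting comparator: negative values are treated as
 * "not comparable" (an assumption for this demo). */
static int cmp_int_reporting(const void *a, const void *b, int *err) {
    int x = *(const int *)a, y = *(const int *)b;
    if (x < 0 || y < 0) {
        *err = 1;
        return 0;
    }
    return (x > y) - (x < y);
}
```

In a Python extension, the error-reporting comparator would be the one wrapping `PyObject_RichCompareBool`, while plain C users keep passing ordinary two-argument functions.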

Serge Ballesta
  • _And it fully applies here because the referenced type is a function taking 2 parameters_ For a pointer type `int (*)()`, the referenced type is `int()`. – Language Lawyer Aug 10 '22 at 14:54