My question is, is there an advantage in having a function/struct for each data type?
Yes. It adds compile-time type checking.
The code used to implement the same operation for different types does differ, and not necessarily by just the element type used. (For matrix operations, for example, the optimum caching strategy may differ between integer and floating-point types of the same size, especially if the hardware supports vectorization.) This means each element type requires its own version of each operation.
It is possible to use some templating techniques to generate element type specific versions of operations that only differ by the type, but usually the end result is more complicated (and thus harder to maintain) than just maintaining the slightly differing implementations separately.
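As a minimal sketch of such a templating technique, a preprocessor macro can stamp out one function per element type. The names DEFINE_DOT, dot_int, and dot_double are hypothetical and chosen only for illustration; nothing here is part of GSL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "template": expands to a dot-product function for one
 * element type. Every generated version differs only by TYPE. */
#define DEFINE_DOT(NAME, TYPE)                                      \
    static TYPE NAME(const TYPE *a, const TYPE *b, size_t n)        \
    {                                                               \
        TYPE sum = 0;                                               \
        assert(n == 0 || (a != NULL && b != NULL));                 \
        for (size_t i = 0; i < n; i++)                              \
            sum += a[i] * b[i];                                     \
        return sum;                                                 \
    }

/* Instantiate the template once per element type. */
DEFINE_DOT(dot_int, int)
DEFINE_DOT(dot_double, double)
```

Calling dot_int(a, b, n) or dot_double(a, b, n) then works like any hand-written function. The cost is that compiler diagnostics and debuggers point at the macro expansion rather than at readable source, and a type that needs a different algorithm (not just a different TYPE) no longer fits the template.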
It is quite possible to add an additional layer (no modifications, just an additional header file included after GSL) using the preprocessor and either GCC extensions (__typeof__) or C11 _Generic to present a single "function" for each matrix operation that chooses the function called at compile time based on the type of the parameter(s).
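A minimal sketch of such a layer using C11 _Generic follows. The struct names and the matrix_int_add / matrix_double_add functions are hypothetical stand-ins for a library's type-specific API, not GSL's actual declarations; with GSL, the macro would list the library's own types and functions instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type matrix structs, standing in for a library's types. */
struct matrix_int    { size_t rows, cols; int    *data; };
struct matrix_double { size_t rows, cols; double *data; };

/* Hypothetical type-specific implementations: a += b, element-wise.
 * Return 0 on success, -1 if the dimensions differ. */
static int matrix_int_add(struct matrix_int *a, const struct matrix_int *b)
{
    assert(a != NULL && b != NULL);
    if (a->rows != b->rows || a->cols != b->cols)
        return -1;
    for (size_t i = 0; i < a->rows * a->cols; i++)
        a->data[i] += b->data[i];
    return 0;
}

static int matrix_double_add(struct matrix_double *a, const struct matrix_double *b)
{
    assert(a != NULL && b != NULL);
    if (a->rows != b->rows || a->cols != b->cols)
        return -1;
    for (size_t i = 0; i < a->rows * a->cols; i++)
        a->data[i] += b->data[i];
    return 0;
}

/* Single "function": the implementation is selected at compile time
 * from the static type of the first argument. */
#define matrix_add(a, b) _Generic((a),                      \
        struct matrix_int *:    matrix_int_add,             \
        struct matrix_double *: matrix_double_add)((a), (b))
```

Because the selection has no default association, calling matrix_add() with a string literal or any other unsupported pointer type is rejected at compile time, and no run-time dispatch code is generated.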
Why not just use a void pointer instead to only have one set of those structs/functions?
Because you would lose the compile-time type checking (the user could supply, say, a string literal, and the compiler would not warn about it no matter which warnings are enabled), and it would also add run-time overhead.
Instead of choosing the proper function (implementation) to call at compile time, the data type field would have to be examined and the correct function called at run time. The generic matrix multiply function, for example, might look like
    status_code_type matrix_multiply(void *dest, void *left, void *right)
    {
        const element_type tleft = ((struct generic_matrix_type *)left)->type;
        const element_type tright = ((struct generic_matrix_type *)right)->type;
        if (tleft != tright)
            return ERROR_TYPES_MISMATCH;
        switch (tleft) {
        case ELEMENT_TYPE_INT:
            return matrix_mul_int_int(dest, left, right);
        case ELEMENT_TYPE_FLOAT:
            return matrix_mul_float_float(dest, left, right);
        case ELEMENT_TYPE_DOUBLE:
            return matrix_mul_double_double(dest, left, right);
        case ELEMENT_TYPE_COMPLEX_FLOAT:
            return matrix_mul_cfloat_cfloat(dest, left, right);
        case ELEMENT_TYPE_COMPLEX_DOUBLE:
            return matrix_mul_cdouble_cdouble(dest, left, right);
        default:
            return ERROR_UNSUPPORTED_TYPE;
        }
    }
All of the above code is pure overhead, with the sole purpose of making it "slightly easier" on the programmer. The GSL developers, for example, didn't find it necessary or useful.
Quite a lot of C code, including most C libraries' FILE implementation, does utilize a related approach, however: the data structure itself contains function pointers for each operation the data type supports, in an object-oriented fashion.
For example, you could have
    struct matrix {
        long    rows;
        long    cols;
        long    rowstep;    /* Number of bytes to next row */
        long    colstep;    /* Number of bytes to next element */
        size_t  size;       /* Size of each element */
        int     type;       /* Type of each element */
        char   *data;       /* Logically void*, but allows pointer arithmetic */
        int   (*supports)(int, int);
        int   (*get)(struct matrix *, long, long, int, void *);
        int   (*set)(struct matrix *, long, long, int, const void *);
        int   (*mul)(struct matrix *, long, long, int, const void *);
        int   (*div)(struct matrix *, long, long, int, const void *);
        int   (*add)(struct matrix *, long, long, int, const void *);
        int   (*sub)(struct matrix *, long, long, int, const void *);
    };
where the
    int supports(int source_type, int target_type);
is used to find out whether the other callbacks support the necessary operations between the two types, and the rest of the member functions,
    int get(struct matrix *m, long row, long col, int to_type, void *to);
    int set(struct matrix *m, long row, long col, int from_type, const void *from);
    int mul(struct matrix *m, long row, long col, int by_type, const void *by);
    int div(struct matrix *m, long row, long col, int by_type, const void *by);
    int add(struct matrix *m, long row, long col, int by_type, const void *by);
    int sub(struct matrix *m, long row, long col, int by_type, const void *by);
operate on a single element of a given matrix. Note how we need to pass a reference to the matrix itself: if we call e.g. some->get(...), the function that the get function pointer points to does not automatically get a pointer to the structure via which it was called.
Also note how the value read from the matrix (get), or otherwise used in the operation, is provided via a pointer, and the type of the data that pointer refers to is provided separately. This is needed if you want a function that, say, initializes a matrix to identity to work without the user implementing every single matrix operation function for their custom type themselves.
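To illustrate, here is a minimal sketch of a generic identity initializer built only on the set callback. It uses a trimmed-down version of the struct matrix above; the TYPE_INT / TYPE_DOUBLE codes and the double_get / double_set / matrix_set_identity names are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TYPE_INT = 1, TYPE_DOUBLE = 2 };    /* Hypothetical type codes */

/* Trimmed-down struct matrix: only the get and set callbacks. */
struct matrix {
    long   rows, cols;
    long   rowstep, colstep;               /* In bytes */
    char  *data;
    int  (*get)(struct matrix *, long, long, int, void *);
    int  (*set)(struct matrix *, long, long, int, const void *);
};

/* Address of element (row, col), or NULL if out of bounds. */
static char *element(struct matrix *m, long row, long col)
{
    if (row < 0 || row >= m->rows || col < 0 || col >= m->cols)
        return NULL;
    return m->data + row * m->rowstep + col * m->colstep;
}

/* set callback for a matrix of doubles; converts from int or double. */
static int double_set(struct matrix *m, long row, long col,
                      int from_type, const void *from)
{
    char *at = element(m, row, col);
    double value;
    if (!at)
        return -1;
    switch (from_type) {
    case TYPE_INT:    value = (double)*(const int *)from; break;
    case TYPE_DOUBLE: value = *(const double *)from;      break;
    default:          return -1;
    }
    memcpy(at, &value, sizeof value);
    return 0;
}

/* get callback for a matrix of doubles; converts to int or double. */
static int double_get(struct matrix *m, long row, long col,
                      int to_type, void *to)
{
    char *at = element(m, row, col);
    double value;
    if (!at)
        return -1;
    memcpy(&value, at, sizeof value);
    switch (to_type) {
    case TYPE_INT:    *(int *)to = (int)value; break;
    case TYPE_DOUBLE: *(double *)to = value;   break;
    default:          return -1;
    }
    return 0;
}

/* Generic: works for any matrix whose set callback accepts TYPE_INT.
 * Note that the matrix is passed explicitly to its own callback. */
static int matrix_set_identity(struct matrix *m)
{
    const int zero = 0, one = 1;
    assert(m != NULL && m->set != NULL);
    for (long r = 0; r < m->rows; r++)
        for (long c = 0; c < m->cols; c++) {
            int status = m->set(m, r, c, TYPE_INT, (r == c) ? &one : &zero);
            if (status)
                return status;
        }
    return 0;
}
```

A custom element type only has to supply callbacks that understand TYPE_INT for matrix_set_identity() to work on it unchanged. Every element written still goes through one indirect call.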
Because access to an element involves an indirect call, the overhead of the function pointers is quite significant, especially if you consider how simple and fast the single-element operations actually are. (For example, a 5 clock cycle indirect call overhead on an operation that itself only takes 10 clock cycles adds 50% overhead!)