4

I am trying to reduce code duplication in my C program, where all the statements in each branch of an if/else block are identical, except for a function name and its arguments. The idea is that the user specifies either x, y, or z, and the program measures how long it takes to run either func_x, func_y, or func_z 1000 times.

More specifically, here is the high level design of the C code:

// struct definitions
struct dat_x {...};
struct dat_y {...};
struct dat_z {...};

// reading structs from a text file
struct dat_x read_dat_x_from_file(char *path);
struct dat_y read_dat_y_from_file(char *path);
struct dat_z read_dat_z_from_file(char *path);

// functions
int func_x(struct dat_x);
int func_y(struct dat_y);
int func_z(struct dat_z);

// runner computing runtime of func_x, func_y, or func_z
int main(int argc, char** argv) {
    char *func_name = argv[1];
    char *path = argv[2];

    int a;
    clock_t t;

    if (strcmp(func_name, "x") == 0) {
        struct dat_x args = read_dat_x_from_file(path);

        t = clock();
        for (int i = 0; i < 1000; i++) {
            a += func_x(args);
        }
        t = clock() - t;

    } else if (strcmp(func_name, "y") == 0) {
        struct dat_y args = read_dat_y_from_file(path);

        t = clock();
        for (int i = 0; i < 1000; i++) {
            a += func_y(args);
        }
        t = clock() - t;

    } else if (strcmp(func_name, "z") == 0) {
        struct dat_z args = read_dat_z_from_file(path);

        t = clock();
        for (int i = 0; i < 1000; i++) {
            a += func_z(args);
        }
        t = clock() - t;

    }

    // report runtime
    double e = ((double)t) / CLOCKS_PER_SEC;
    printf("%s: %f %d\n", func_name, e, a);
}

As you can see, in the main function all the statements in each branch of the if-else block are identical; the only difference is that either func_x, func_y, or func_z.

In a functional language, this pattern can be abstract by having a function run_timing_benchmark which takes the func_* and dat_* arguments and hten runs the loop (possibly using polymorphism to define the signature of g). While I can use function pointers in C, I can't write a polymorphic type signature.

What are suggestions for how to reduce the duplication in this program so that the timing code is only defined once? In practice I may have dozens of functions (not just x/y/z) to benchmark using the same code, and the timing code may be more complex.

jII
  • 1,134
  • 2
  • 17
  • 29
  • 1
    Is it an option to change the functions or are you stuck with them? – Lundin Sep 13 '19 at 11:35
  • Welp, your funcs all have different signatures, can't really do much without changing those. – Marco Bonelli Sep 13 '19 at 11:41
  • How do the functions have different signatures? The OP appears correct in their assertion that they differ only in *name*. – John Bollinger Sep 13 '19 at 11:43
  • @JohnBollinger seems pretty clear to me: `int func_x(struct dat_x);` vs `int func_y(struct dat_y);` vs `int func_z(struct dat_z);` – Marco Bonelli Sep 13 '19 at 11:44
  • 1
    Possible duplicate of [How do function pointers in C work?](https://stackoverflow.com/questions/840501/how-do-function-pointers-in-c-work) – klutt Sep 13 '19 at 11:44
  • Ah, I see what you mean, @MarcoBonelli. But that's not an insurmountable problem. – John Bollinger Sep 13 '19 at 11:45
  • @JohnBollinger it really is. You cannot "generalize" that piece of code without changing the signature of those functions. – Marco Bonelli Sep 13 '19 at 11:46
  • You can, @MarcoBonelli. You can declare the functions to `main()` without prototypes. Or going a totally different direction, you can reduce code duplication by using a macro. – John Bollinger Sep 13 '19 at 11:48
  • @JohnBollinger you should answer the question then. I don't see how you could correctly call a function without knowing its prototype. – Marco Bonelli Sep 13 '19 at 11:54
  • OP, you should specify which C standard you expect to comply with. Seems like a pretty important thing. – Marco Bonelli Sep 13 '19 at 11:59
  • 1
    If you have fully working code, then codereview.stackexchange.com is a better place. – klutt Sep 13 '19 at 12:09
  • 1
    @klutt *then codereview.stackexchange.com is a better place* I tend to agree, but it's not off-topic here given that it's ["a practical, answerable problem that is unique to software development"](https://stackoverflow.com/help/on-topic). And I suspect there's a lot more visibility here than on codereview.stackexchange.com – Andrew Henle Sep 13 '19 at 12:21
  • Can you reduce your comparison to a single `char`? Given `strcmp(func_name, "x")`, if you can do that, you can replace the `if ... else if ... else if ...` with a `switch` statement and appropriate `case` labels. Perhaps more than one - one for obtaining the data, and one in a loop. – Andrew Henle Sep 13 '19 at 12:25

5 Answers5

3

One idea could be to make a union to abstract the differences in function signatures and return values. Then build a table of functions and invoke the right one according to the name passed. Something like this: (warning: not tested!)

// struct definitions
struct dat_x {...};
struct dat_y {...};
struct dat_z {...};

// Build a union that contains the structs
union uargs
{
    struct dat_x;
    struct dat_y;
    struct dat_z;
};

// reading structs from a text file (but packaged into the union type)
union uargs read_dat_x_from_file(char *path);
union uargs read_dat_y_from_file(char *path);
union uargs read_dat_z_from_file(char *path);

// functions
int func_x(union uargs dat);
int func_y(union uargs dat);
int func_z(union uargs dat);

struct table_t
{
    char *name;
    union uargs (*read_dat_fp);
    int (*fp)(union uargs dat);
};

// Table of function pointers
struct table_t func_table[]
{
    { "x", read_dat_x_from_file, func_x},
    { "y", read_dat_y_from_file, func_y},
    { "z", read_dat_x_from_file, func_z}
};

// runner computing runtime of func_x, func_y, or func_z
int main(int argc, char** argv) {
    char *func_name = argv[1];
    char *path = argv[2];

    int a;
    clock_t t;

    for(int i = 0; i < sizeof(func_table) / sizeof(table_t); i++)
    {
        if(strcmp(func_name, func_table[i].name) == 0)
        {
            union uargs args = func_table[i].read_dat_fp(path);
            t = clock();
            for (int i = 0; i < 1000; i++) {
                a += func_table[i].fp(args);
            }
            t = clock() - t;
            break;
        }
    }

    // report runtime
    double e = ((double)t) / CLOCKS_PER_SEC;
    printf("%s: %f %d\n", func_name, e, a);
}

This gets rid of the code duplication and is also somewhat easily scalable. Another option could be to use some macro magic as in this answer from @Barmar.

Edit: Of course, instead of the union, you could simply use void* and typecasting to pass along pointers to the structs, re-casting them as needed inside the functions. But then you completely throw away all type checking.

th33lf
  • 2,177
  • 11
  • 15
  • "Variants" are very ugly. Suppose `dat_x` is huge but `dat_y` is small? You'd kill everything that is performance. Seems to me this creates a bunch of new problems not present in the original code. – Lundin Sep 13 '19 at 12:00
  • @Lundin Further optimizations are possible that fall beyond the scope of this answer. Such as passing pointers along instead of huge structures. I obviously wasn't addressing such concerns in this answer. – th33lf Sep 13 '19 at 12:03
3

You can use a macro to generate the code. Since all the structures and functions follow a common naming scheme, you can use token pasting to generate them.

#define PROCESS(suffix, func_name_var, path_var, sum_var, time_var) \
time_var = time();
if(strcmp(func_name_var, #suffix) == 0) { \
    struct dat_##suffix args = read_dat_##suffix##_from_file(path_var); \
    for (int i = 0; i < 1000; i++) { \
        sum_var += func_##suffix(args); \
    } \
time_var = time() - time_var;

then you can use it like this:

        PROCESS(x, func_name, path, a, t)
        else PROCESS(y, func_name, path, a, t)
        else PROCESS(z, func_name, path, a, t)
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • Thanks for this great answer. Why did you prefer to use `PROCESS(x, ...)` as opposed to `PROCESS("x", ...)`, i.e., isn't it easier to pass in the suffix as a string? – jII Sep 13 '19 at 18:13
  • 2
    Because it needs to use token pasting to create names like `dat_x` and `read_dat_x_from_file`. That doesn't work with a string. – Barmar Sep 13 '19 at 18:31
3

@MarcoBonelli observed correctly in comments that your functions are not quite as similar as they may seem. They have different argument and return types, which is an important distinction in strongly-typed languages such as C. Those functions are not interchangeable in a C-language sense; given that they have different return types, there's not even any function-pointer type that would be compatible with pointers to all the functions.

If you can change the functions then it would be possible to do so in a way that overcomes that limitation. For instance, you could accept the structures to populate as out parameters of type void *:

void read_dat_y_from_file(const char *path, void *args) {
    struct dat_y *y_args = (struct dat_y *) args;
    // ...
}

    // ...
    struct dat_y args;
    read_dat_y_from_file(path, &args);

You could write a function-pointer-based solution around that.


But a simpler way forward that does not require modifying any functions would be to move the repeated code to a macro:

#define read_and_time(tag) do { \
    struct dat_ ## tag args = read_dat_## tag ## _from_file(path); \
    t = clock(); \
    for (int i = 0; i < 1000; i++) { \
        a += func_ ## tag(args); \
    } \
    t = clock() - t; \
while (0)

With that, you would reduce the if / else chain to

if (strcmp(func_name, "x") == 0) {
    read_and_time(x);
} else if (strcmp(func_name, "y") == 0) {
    read_and_time(y);
} else if (strcmp(func_name, "z") == 0) {
    read_and_time(z);
}

You could even pull a bit more of that into the macro, but I think this form serves clarity best.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
1

The best way would have been to give the functions the same interface. Then you could have created an array of function pointers and gotten quite pretty code.

However, if you are stuck with the functions as they are, the least evil way to reduce code repetition is to use function-like macros. For example by using C11 _Generic:

#define read_dat_from_file(result, path) (result) =  \
  _Generic((result),                                 \
    struct dat_x: read_dat_x_from_file,              \
    struct dat_y: read_dat_y_from_file,              \
    struct dat_z: read_dat_z_from_file ) (path);

Where result is a variable of the struct type that you want to store the results inside. Full example:

#include <stdio.h>

struct dat_x { int x; };
struct dat_y { int y; };
struct dat_z { int z; };

struct dat_x read_dat_x_from_file(char *path) 
{ 
  puts(__func__); 
  return (struct dat_x){1};
}

struct dat_y read_dat_y_from_file(char *path)
{ 
  puts(__func__); 
  return (struct dat_y){2};
}

struct dat_z read_dat_z_from_file(char *path)
{ 
  puts(__func__); 
  return (struct dat_z){3};
}


#define read_dat_from_file(result, path) (result) =  \
  _Generic((result),                                 \
    struct dat_x: read_dat_x_from_file,              \
    struct dat_y: read_dat_y_from_file,              \
    struct dat_z: read_dat_z_from_file ) (path);

int main (void)
{
  struct dat_x x;
  struct dat_y y;
  struct dat_z z;

  read_dat_from_file(x, "");
  read_dat_from_file(y, "");
  read_dat_from_file(z, "");

  printf("%d\n", x.x);
  printf("%d\n", y.y);
  printf("%d\n", z.z);
}

Output:

read_dat_x_from_file
read_dat_y_from_file
read_dat_z_from_file
1
2
3
Lundin
  • 195,001
  • 40
  • 254
  • 396
  • This doesn't address the OP's original concern - the `if-else` with a lot of code duplication. – th33lf Sep 13 '19 at 12:08
-1

A not super elegant way, would be to use a function pointer like this:

int (*funcPtr)(void *arg)

And in the implementation of the function you cast the void * to the actual functions argument like struct dat_x *arg = (struct dat_x *)arg

A.Franzen
  • 725
  • 4
  • 10
  • All cool, except every function needs to get the data first using that other family of functions all returning different values. – Marco Bonelli Sep 13 '19 at 11:48
  • same thing, use void * as you carrier of information. Working untyped is not nice, but it works. – A.Franzen Sep 13 '19 at 11:52