2

I'm learning C by myself using real-world examples. Given my background (physics) I'm implementing a 4th order Runge-Kutta method. I have a function which performs a single RK4 step with arbitrary input ODE system and timestep. When calculating intermediate steps for RK4, I have two approaches, but I can't decide which one is more efficient.

In the 1st approach, I'm allocating an array 'k' on the heap, passing it to the RK4 step function, and doing calculations. You can see the related functions below:

void rk4Step(double t, double dt, double* p, double* y, double* k, int n_eqns, ODE* f) {
    double y_orig_val;
    for (int i = 0; i < n_eqns; i++) {
        y_orig_val = y[i];
        k[0] = dt * (*f)(p, t, y, i);

        y[i] += 0.5 * k[0];
        k[1] = dt * (*f)(p, t + 0.5 * dt, y, i);
        y[i] = y_orig_val;

        y[i] += 0.5 * k[1];
        k[2] = dt * (*f)(p, t + 0.5 * dt, y, i);
        y[i] = y_orig_val;

        y[i] += k[2];
        k[3] = dt * (*f)(p, t + dt, y, i);
        y[i] = y_orig_val;

        y[i] += 1.0 / 3.0 * (1.0 / 2.0 * (k[0] + k[3]) + (k[1] + k[2]));
    }
}

void solveODE(double t, double dt, int n_eqns, int n_params, int n_steps, ODE* f) {

    double* y = malloc(n_eqns * sizeof(double));
    double* p = malloc(n_params * sizeof(double));
    double* k = malloc(4 * sizeof(double));
    init(p, y, n_eqns);

    FILE* sol = fopen("sol.dat", "w");

    for (int i = 0; i <= n_steps; i++) {
        fprintf(sol, "%f\t", t);
        for (int j = 0; j < n_eqns-1; j++) {
            fprintf(sol, "%f\t", y[j]);
        }
        fprintf(sol, "%f\n", y[n_eqns-1]);

        rk4Step(t, dt, p, y, k, n_eqns, f);
        t += dt;
    }

    free(y);
    free(p);
    free(k);
    fclose(sol);
}

In the other approach the variable 'k' is not allocated on the heap, but on the stack in the body of the rk4Step function. As you can see, the rk4Step function is used in a loop. This was the variable k would be allocated on the stack again and again in each iteration.

I know that working with variables on the stack is much faster, but is it true in this case too?

Martin
  • 121
  • 1
  • 9
  • 3
    Have you tried benchmarking your code to find out? – UnholySheep Jul 30 '22 at 19:33
  • 5
    Generally speaking, with the exception of variable-length arrays, "allocating" variables with automatic storage durations (allocated on the stack on most architectures) has no runtime cost. On most architectures it's simply stack space that gets reserved at the beginning of the function – UnholySheep Jul 30 '22 at 19:35
  • What prevents you from declaring it on the stack before the loop? – infinitezero Jul 30 '22 at 19:38
  • 1
    In reality, the `rk4Step` function is going to be inlined. "Allocating" and "deallocating" on the stack consists only of subtracting and adding to the stack pointer register, and when this means that you add a value and immediately subtract it again, the compiler will optimize that out. So there won't really be a per-iteration cost to defining `k` within the function or some smaller block. You are just comparing the cost of calling `malloc` once, versus the cost of subtracting the stack pointer once. Both are completely negligible in the big scheme of things. – Nate Eldredge Jul 30 '22 at 19:39
  • Thank you for all the answers! I haven't tried benchmarking it, but in the future, I will start doing so. I was interested in the background of the answer, not just in the fact if it is more efficient or not. If you post these as answers, I'll accept them! @infinitezero: I might be wrong (just started learning /revising actually/ a few days ago), but I think would not be able to access it from the ```rk4Step``` function if it was declared on the stack in the ```solveODE``` function. – Martin Jul 30 '22 at 19:45
  • 1
    @Martin you can pass the address of the stack array with the address operator `&` – infinitezero Jul 30 '22 at 19:49
  • @infinitezero Oh. Good to know! Thanks! I thought that only works with stuff on the heap. – Martin Jul 30 '22 at 19:50
  • The question becomes relevant again, likely with a different answer, when `f` is vector valued, that is, you want to solve systems of coupled equations. – Lutz Lehmann Jul 30 '22 at 20:32
  • @UnholySheep Why variable-length arrays are exceptions? Substracting/adding runtime value shouldn't significantly differ from substracting/adding a constant.Or do you mean compiler can't evaluate all frame size at compile time and optimize all allocations into single instruction? – dimich Jul 31 '22 at 02:14
  • @LutzLehmann. Yes, the `f` function can be vector valued. By this I mean that in the current construction it returns one component of the system of equations. So evaluating all components would require `n_eqns` number of calls of the function `f` with different `i` argument. See example below: ```double damped_oscillator(double* p, double t, double* y, int n) { switch (n) { case 0: return y[1]; case 1: return -p[0] / p[1] * y[0] - p[2]*y[1]; } }``` – Martin Jul 31 '22 at 08:09
  • 1
    Yes, that is a suitable if ineffective interface. The implementation of the RK method is then of course not even consistent, it would be a surprise if you get reasonable results, similarly to the question in https://stackoverflow.com/questions/35805811/pendulum-integration-overflow. For a question with a similar interface and loop structure see https://stackoverflow.com/questions/46654283/runge-kutta-for-coupled-odes – Lutz Lehmann Jul 31 '22 at 12:06
  • @LutzLehmann I think I get what you are saying, thanks for pointing out! But just to be sure: The issue with this is that for example if I update one coordinate, lets say position in the upper case, then the next coordinate (momentum) would be calculated with the already updated position value, right? Given these it is indeed a surprise that the results seemed fine (I haven't validated it with analytical calculations, just checked them visually) – Martin Aug 15 '22 at 13:27
  • Yes, do some checks. It is still an order 1 method, but then with a step size that is some multiple (4 times?) of `dt`. So if `dt` is small enough, you follow an approximate solution curve, but not with the correct time parameter. – Lutz Lehmann Aug 15 '22 at 13:34

1 Answers1

1

No need to allocate on the heap:

void solveODE(double t, double dt, int n_eqns, int n_params, int n_steps, ODE* f) {

    double y[n_eqns];
    double p[n_params];
    double k[4];
    init(p, y, n_eqns);

    FILE* sol = fopen("sol.dat", "w");

    for (int i = 0; i <= n_steps; i++) {
        fprintf(sol, "%f\t", t);
        for (int j = 0; j < n_eqns-1; j++) {
            fprintf(sol, "%f\t", y[j]);
        }
        fprintf(sol, "%f\n", y[n_eqns-1]);

        rk4Step(t, dt, &p, &y, &k, n_eqns, f);
        t += dt;
    }

 
    fclose(sol);
}
infinitezero
  • 1,610
  • 3
  • 14
  • 29