Gravity computations resulting in NaN. No clear reason

Question

I have problem with my C code. It's force solution on N-bodies problem in 2D. Sometimes I get NaN as struct instance params value.

My guess is that something is wrong with division. I have analyzed a lot of cases, but still could not find an occuring pattern that results in values being NaN.

Here is my code:

#include <stdio.h>
#include <math.h>
#include <time.h>
#include <stdlib.h>

double G;  
int N;
int T;
int COUNT;

typedef struct 
{
    double rx, ry;
    double vx, vy;
    double fx, fy;
    double mass;
} Body;

void updateBody (Body* bodyInstance, int timestamp) {
    bodyInstance->vx += timestamp * bodyInstance->fx / bodyInstance->mass;
    bodyInstance->vy += timestamp * bodyInstance->fy / bodyInstance->mass;
    bodyInstance->rx += timestamp * bodyInstance->vx;
    bodyInstance->ry += timestamp * bodyInstance->vy;
};

void updateBodyForces (Body* bodyA, Body* bodyB) {
    double dx, dy, dist, force;
    dx = bodyB->rx - bodyA->rx;
    dy = bodyB->rx - bodyA->rx;
    // collision/same place in spacetime hack
    if (bodyB->rx == bodyA->rx && bodyB->ry == bodyA->ry) {
        dist = 1;
    } else {
        dist = sqrt(pow(dx, 2) + pow(dy, 2));
    }
    force = (G * bodyA->mass * bodyB->mass) / (pow(dist, 2) + 100);
    bodyA->fx += force * dx / dist;
    bodyA->fy += force * dy / dist;
}

void resetBodyForces (Body* bodyInstance) {
    bodyInstance->fx = 0;
    bodyInstance->fy = 0;
}

void getRandomBody (Body* bI) {
    bI->rx = rand() % 10;
    bI->ry = rand() % 10;
    bI->vx = rand() % 10;
    bI->vy = rand() % 10;
    bI->fx = 0;
    bI->fy = 0;
    bI->mass = 20;
}

int main( int argc, char *argv[] )  {
    G = argc >= 2 ? atof(argv[1]) : 0.01;  
    N = argc >= 3 ? atoi(argv[2]) : 3;
    T = argc >= 4 ? atoi(argv[3]) : 1;
    COUNT = argc >= 5 ? atoi(argv[4]) : 10;
    srand(time(NULL));
    Body bodies[N];
    for (int i=0; i<N; i++) {
        getRandomBody(&bodies[i]);
    }
    for (int i = 0; i < COUNT; i++) {
        for (int j = 0; j < N; j++) {
            resetBodyForces(&bodies[j]);
            for (int k = 0; k < N; k++) {
                if (j != k) {
                    updateBodyForces(&bodies[j], &bodies[k]);
                }
            }
        }
        for (int j = 0; j < N; j++) {
            updateBody(&bodies[j], T);
        }
    }
}

Do you always get `-nan` for the same input? If the problem is consistent the it should be easy [to debug](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/). — Some programmer dude, Jan 02 '18 at 13:40
[you should avoid `atoi`](https://stackoverflow.com/q/17710018/995714) — phuclv, Jan 02 '18 at 13:45
`pow(dx, 2)` will be a problem when `dx` is negative with implementations that calculate the power as `exp(log(dx) * 2)`. Prefer `dx*dx` to square numbers. — M Oehm, Jan 02 '18 at 13:47
Instead of using `srand(time)`, try a fixed seed so that you can debug again with the same input values. But I would presume it's zero `dist`, or negative `dx`/`dy` as @MOehm wrote. — vgru, Jan 02 '18 at 13:56
@Bathsheba, if `dx` is negative then `log(dx)` causes a domain error. The fact that 2 is a positive integer is irrelevant if the implementation of `pow()` is as MOehm describes. — John Bollinger, Jan 02 '18 at 14:34
@JohnBollinger: Yes, looking at http://en.cppreference.com/w/c/numeric/math/pow, it appears there is no special provision for a positive integer exponent. — Bathsheba, Jan 02 '18 at 14:39
`dy = bodyB->rx - bodyA->rx;` you probably mean `dy = bodyB->ry - bodyA->ry;` — Mathieu Borderé, Jan 02 '18 at 16:14
@MOehm A simplistic `pow(dx, x)` as `exp(log(dx) * x)` is non-compliant as it does not compute x raised to the power y for negative x. — chux - Reinstate Monica, Jan 02 '18 at 16:32
@chux: That may well be, but that doesn't mean that there aren't any [implementations of such code](https://stackoverflow.com/questions/44635067/program-to-armstrong-number-of-n-digits-is-giving-wrong-output-for-only-153) out there. (You can get run over by a car in a pedestrian zone.) — M Oehm, Jan 02 '18 at 16:39
the posted code does not cleanly compile, When compiling, always enable the warnings, then fix those warnings. (for `gcc`, at a minimum use: `-Wall -Wextra -Wconversion -pedantic -std=gnu11 ) — user3629249, Jan 04 '18 at 17:56
on (about) line 24 there is this statement: `};` which is not valid syntax. — user3629249, Jan 04 '18 at 17:58
regarding the `typedef struct...` in the posted code. 1) it is much better (for flexability and other reasons) to separate the definition of the struxt from the typedef for the struct. 2) always use a 'tag name' in the struct definition. For several reasons, including that most debuggers cannot 'see' the individual fields without a tag name. 3) follow the axiom: *only one statement per line and (at most) one variable declaration per statement.* — user3629249, Jan 04 '18 at 18:02
to avoid one of the several implicit conversions, this statement: `srand(time(NULL));` should be: `srand( (unsigned int)time(NULL) );` — user3629249, Jan 04 '18 at 18:05

Jongware · Answer 1 · 2018-01-03T00:13:09.190

In updateBodyForces you test two floating point values for equality. They may differ by as little as the very last bit, about 1/10,000,000.

Right after this you take the square root of their difference squared, and so the result may be 0 (really really zero, 0.0000000...), which is not a problem, but then you divide by that number. That is the source of the NaN.

Replace this part

// collision/same place in spacetime hack
if (bodyB->rx == bodyA->rx && bodyB->ry == bodyA->ry) {
    dist = 1;
}

with a more explicit test based on FLT_EPSILON. See Floating point equality and tolerances for a longer explanation.

After some testing: the epsilon value is difficult to guess. Since you are okay with a dist = 1 for corner cases, add this below the test, above the force line, to be sure:

if (dist < 1)
    dist = 1;

so you won't get any NaNs for sure. That leads to this simpler function:

void updateBodyForces (Body* bodyA, Body* bodyB) {
    double dx, dy, dist, force;
    dx = bodyB->rx - bodyA->rx;
    dy = bodyB->ry - bodyA->ry;
    dist = sqrt(dx*dx + dy*dy);
    // collision/same place in spacetime hack
    if (dist < 1)
        dist = 1;
    force = (G * bodyA->mass * bodyB->mass) / (pow(dist, 2) + 100);
    bodyA->fx += force * dx / dist;
    bodyA->fy += force * dy / dist;
}

You can make the wibbly-wobbly space-time hack a bit less obvious by replacing the 1 with a smaller value as well.

Nice answer but I *really* don't like the use of `pow` for squaring. I also don't like allowing `sqrt` to return NaN. Not sure that's strictly portable. — Bathsheba, Jan 02 '18 at 14:19
@Bathsheba: agree. Is this better? I don't think it's possible to return NaN anymore. — Jongware, Jan 02 '18 at 14:25
For completeness, `dy = y2 - y1` instead of `x2 - x1` (OP's code has the same typo). — vgru, Jan 02 '18 at 23:54

Bathsheba · Accepted Answer · 2018-01-02T14:50:39.600

3

A common (and by far the most likely explanation here) production of the -NaN result is a negative argument to sqrt, due to either (i) the base parameter to pow being negative, or (ii) an accumulation of joke digits in your floating point variables: bodyInstance->vx += &c. will accumulate rounding errors.

Check that case prior to calling sqrt.

You will also get a NaN with an expression like 0.0 / 0.0, but I've never seen a platform yield a negative NaN in that instance.

So the moral here is to prefer dx * dx to pow(dx, 2): the former is more accurate, not vulnerable to unexpected results with negative dx, and certainly not slower. Better still, use hypot from the C standard library.

Reference: http://en.cppreference.com/w/c/numeric/math/hypot

edited Jan 02 '18 at 14:50

answered Jan 02 '18 at 13:40

Bathsheba

231,907
34
361
483

Thanks for the tip. The problem was calculating square root of 0. And a follow up question. Why you refer to my digits as funny? – Moonjsit Jan 02 '18 at 13:56
2

@Moonjsit: "Joke digits" is a common term for the error component of a floating point type caused by repeated rounding. – Bathsheba Jan 02 '18 at 13:57
@Bathsheba: It is a bad term, because the error is neither decimal digits nor a joke. “Accumulated error” might be a better term. – Eric Postpischil Jan 02 '18 at 14:24

Gravity computations resulting in NaN. No clear reason

2 Answers2