Why does this simple part of my program makes me struck for more than two hours?

Question

I am pasting a part of my program below, why does it take large amount of time for execution?

for(i=0;i<N;i++)
     {
     for(j=0;j<N;j++)
        {
         if(j<i) {
           pot[i]+=-(x[i]-x[j])*1./pow((pow(x[i]-x[j],2.)+ blah blah,1.5);
          }
         if(j>i) {
           pot[i]+=(x[i]-x[j])*1./pow((pow(x[i]-x[j],2.)+blah blah,1.5);
          }
        }
     }

It has taken almost two hours to run, if i make any one 1.5 as 1.4 then everything will be fine.

Where as the one below works extremely fine

for(i=0;i<N;i++)
{
 pot[i]=0.0;
 for(j=0;j<N;j++)
    {
     if(j<i) 
      {
       pot[i]+=-1.*(x[i]-x[j])/pow(pow(x[i]-x[j],3.)+blah blah,1.);
      }
     if(j>i) 
      {
       pot[i]+=1.*(x[i]-x[j])/pow(pow(x[i]-x[j],1.)+blah blah,3.);
      }
    }
 }

/* i badly need the former one in my program and not later one*/

The above is a part of this block

for(i=0;i<N;i++)
    {
    for(j=0;j<N;j++)
        {
            if(j<i)
                {
                pot[i]+=-(x[i]-x[j])*1./pow(fabs(pow(x[i]-x[j],2.)+ g*g*pow(x[i+N]-x[j+N],2.)+ h*h*pow(x[i+2*N]-x[j+2*N],2.)),1.5) ;
                }
            if(j>i)
                {
                pot[i]+=(x[i]-x[j])*1./pow(fabs(pow(x[i]-x[j],2.) + g*g*pow(x[i+N]-x[j+N],2.) + h*h*pow(x[i+2*N]-x[j+2*N],2.)),1.5) ;
                }
        }
    }

Actually, all the data values would be helpful, given that this is a data value sensitive question. — david.pfx, Feb 03 '14 at 13:58
`pow(pow(x, z), y)` is equivalent to `pow(x, y * z)` which is equivalent to `pow(x*x, z)` for `y==2`, which would reduce the pain a bit. You could also remove the `if` tests, sparing a few instructions. Anyway, if you want to optimize this, you should tell us what `N` roughly is, and what you're trying to accomplish here. Is it calculating a potential for some physics simulation? And how are you compiling and executing the code? I'd expect the `-O3` optimization level to help a bit. — amon, Feb 03 '14 at 13:58
@user36397 That makes no sense – this is max 10000 calculations which should complete well within a second. Could you post a *complete* program that demonstrates this behavior (with a `main` and everything), plus your compiler setup? — amon, Feb 03 '14 at 14:09
I cannot paste the complete program as it is for research purpose and i dont want to be apprehended by project advisor — user36397, Feb 03 '14 at 14:11
@user36397 Yes, but can you *reduce* your real program to the minimal program that still shows this problem and is still runnable? — amon, Feb 03 '14 at 14:12
Please use [*this method to show what it's doing*](http://stackoverflow.com/a/378024/23771). I can guess what's going on, but the trouble with guesses is you can't depend on them. It's better to see for yourself what's happening. — Mike Dunlavey, Feb 03 '14 at 14:13
Is it equivalent if you write it as `pow[i] += (x[i] - x[j]) * pow(x[i] - x[j], -3)` ? (mind the ifs) — jlhonora, Feb 03 '14 at 14:14
@amon you can compare the older one with the one i have newly edited — user36397, Feb 03 '14 at 14:16
We can not only reduce `pow(pow(x, y), z)` to `pow(x, y*z)` but also reduce `x/pow(x, y*z)` to `pow(x, 1-y*z)` which in this case with `y*z=3` will also be equivalent to `1/(x*x)` which could be even faster. If you aren't intentionally trying to create errors caused by overflow and rounding of intermediate results, why are you writing this simple formula in such a weird way? — , Feb 03 '14 at 14:18
There are some other parts in the denominator and in numerator which i dont want to mention @WumpusQ.Wumbley — user36397, Feb 03 '14 at 14:20
@user36397 I doubt that anyone will debug/analyze your code for you. This is meant as public forum, for helping each other and leaving the traces as a reference for the people that will have similar problems. I am sorry but this question is not good. You have not conformed to the expected input laid out when you ask the 1st question, you even refuse to provide it when people politely ask them in comments for the needed info. You need to help people help you, and you refuse to do it. Providing info about your setup and an sscce should be the minimal courtesy. — luk32, Feb 03 '14 at 14:20
@user36397 Both versions should produce the same output (except for floating-point inaccuracies). I assume there could be a bug in your implementation of `pow`, but there is no way to investigate without you posting a minimal runnable program that demonstrates the problem, and your compiler setup. *More likely*, you made some other error like a typo that you've accidentally fixed when changing the code. — amon, Feb 03 '14 at 14:22
The code should not run for hours, I suspect the data structures of X, Y and pot — , Feb 03 '14 at 15:20

Mike Dunlavey · Accepted Answer · 2014-02-03T14:06:06.117

5

Well, if N is 1000, it will perform pow two million times. pow, in this case, performs log, followed by multiplication, followed by exp. This is heavy code.

I just want to point out, in case you were wondering, that compiler optimization would be no help on this code, because the program counter spends almost all of its time in log and exp.

Also, as @amon pointed out, you could get rid of one pow call, giving a factor of two speedup.

edited Feb 03 '14 at 14:06

answered Feb 03 '14 at 13:58

Mike Dunlavey

40,059
14
91
135

N goes from 3 to max of 100 – user36397 Feb 03 '14 at 14:17
@user36397: OK, then 20,000 calls to `pow` at, say, 10 microseconds each (to pick a number out of the air), would be 2 seconds. So trying to psyche this one out is not working. Please do this: start it under a debugger, like GDB. If you don't know how, it is a very good skill to learn. While it is being slow, interrupt it with Ctrl-C or a pause button. Then display the stack. That will at least show you precisely what it is doing in all that time. This is far better than pulling guesses out of the air, even from experts. – Mike Dunlavey Feb 03 '14 at 14:24
The codes which i have pasted is a part of a huge program that is running in the Cluster that belongs to institution, I can have no access to whatsoever it is doing. – user36397 Feb 03 '14 at 14:28
3

OK, now is when you turn into a real programmer, which is when you puff up some skirts, throw over the table, and *get* access to the code. Either that or reproduce the problem in a separate program. "They won't let me" does not make guessing work any better. – Mike Dunlavey Feb 03 '14 at 14:31
3

@user36397: that's a really, really bad excuse. You could obviously paste that code above into a new program, add the missing variable declarations and initialization around it and run that in a debugger or profiler. Should not take you more that 10 minutes. – Doc Brown Feb 03 '14 at 15:41

score 3 · Answer 2 · answered Feb 03 '14 at 14:36

As it stands, your are going to generate 2 N ** 2 calls to pow(). pow() is expensive, involving a call to log() and exp() for each call to pow(). For N=1000, that is, as the other guy observed, 2 million calls of log() and exp(), apiece.

log() and exp() are themselves expensive, being floating-point power series expansions.

It is POSSIBLE that your compiler knows that numbers like 2. and 3. are in fact small integers, and knows how to reduce pow(x, N), where N is a small integer, to the appropriate series of multiplications and squares. This would make for a dramatic reduction in runtime on your second example.

Dijkstra once observed (in a grad student recruiting talk) that the object of the exercise is "Don't make a mess of it." Observing that pow(x,2) == x*x and pow(x,1.5) == x * sqrt(x), you can get rid of a LOT of transcendental numbercrunching calls by defining two (inline) functions, and rewriting the critical line as:

    pot[i]+=(x[i]-x[j])*1./xsqrtx(square(x[i]-x[j])+blahblah);

Also, if you are actually doing this in C/C++, it would probably be worthwhile to cache the point difference.

    delta = x(i]-x[j];        
    pot[i]+=(delta)*1./xsqrtx(square(delta)+blahblah);

C/C++ is not allowed to assume that x didn't change while you weren't looking, and make this particular common subexpression optimization. (FORTRAN is and does.)

Finally, are you aware that you can divide floating-point numbers?

    delta = x(i]-x[j];        
    pot[i]+=(delta)/(xsqrtx(square(delta)+blahblah));

Why does this simple part of my program makes me struck for more than two hours?

2 Answers2