
I am working with a Pin tool that simulates a processor and am running into a very strange problem. In the code snippet below, Router::Evaluate() is called repeatedly. After several million calls, strange behavior occurs intermittently: "_cycles != 0" evaluates to true in the first if statement and to false in the immediately following if statement, so execution falls into the else block.

void Router::Evaluate( )
{     
  //---------debug print code---------
  if (_cycles != 0) {
    cout << "not a zero" << endl;

    if (_cycles != 0) cout << "---not a zero" << endl;
    else cout << "---zero" << endl;

  }        
  //----------------------------------

  _cycles += _speedup;
  while ( _cycles >= 1.0 ) {
    _Step();
    _cycles -= 1.0;
  }
}

//class definition
class Router : public TimedModule {
  protected:
    double _speedup;  //initialized to 1.0
    double _cycles;  //initialized to 0.0
  ...
};

Below is the output of the code, where "not a zero" followed by "---zero" is printed from time to time, seemingly at random.

not a zero
---zero
(...some other output...)
not a zero
---zero
(...some other output...)

How could this possibly happen? This is not a multi-threaded program, so synchronization is not an issue. The program is compiled with gcc 4.2.4 and executed on 32-bit CentOS. Does anybody have a clue? Thanks.

--added---

I should have mentioned this, too. I did try printing the value of _cycles each time, and it is always 0.0, which should not be possible... I also used the following g++ options: "-MM -MG -march=i686 -g -ggdb -g1 -finline-functions -O3 -fPIC"
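For reference, the accumulate-and-drain loop described above can be isolated into a standalone sketch. The names `drain_cycles` and `leftover_ever_nonzero` are hypothetical and not part of the simulator. The sketch only illustrates that with `_speedup` equal to 1.0 every intermediate value is exactly representable, so `_cycles` would be expected to return to exactly 0.0 after each call:

```cpp
#include <cstddef>

// Hypothetical stand-in for Router::Evaluate(): adds `speedup` to `cycles`,
// then counts one step per whole cycle. Returns the leftover fraction.
double drain_cycles(double cycles, double speedup, std::size_t& steps)
{
    cycles += speedup;
    while (cycles >= 1.0) {
        ++steps;          // stands in for _Step()
        cycles -= 1.0;
    }
    return cycles;
}

// Runs the loop `calls` times and reports whether the leftover
// was ever nonzero after a call.
bool leftover_ever_nonzero(std::size_t calls, double speedup)
{
    double cycles = 0.0;
    std::size_t steps = 0;
    for (std::size_t i = 0; i < calls; ++i) {
        cycles = drain_cycles(cycles, speedup, steps);
        if (cycles != 0.0) return true;
    }
    return false;
}
```

Calling `leftover_ever_nonzero(1000000, 1.0)` should report no nonzero leftover, whereas a fractional speedup such as 0.3 leaves a remainder after the first call.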

Robert Harvey
ray
  • Did you try printing the value of `_cycles` in each case? It might give you a clue. – Useless Aug 06 '12 at 18:11
  • Are you passing any special options to the compiler? – DanielKO Aug 06 '12 at 18:12
  • What's the value of `_speedup`? Is it an integer? – Kerrek SB Aug 06 '12 at 18:12
  • The value of `_speedup` is always 1.0, and its type is double. – ray Aug 06 '12 at 18:18
  • Try compiling with `-ffloat-store`. It could be the first compares to an extended value in the FPU and the next with a truncated-to-double value. (But of course, if all you're doing with the value is add or subtract `1.0`, that's not it, that can only happen with other calculations.) – Daniel Fischer Aug 06 '12 at 18:21
  • @DanielFischer For `_cycles`, I am not doing anything other than add or subtract.. Thanks for your comments anyway :) – ray Aug 06 '12 at 18:34
  • And there's really no code between the two tests except the `cout` line? That's mighty weird then. – Daniel Fischer Aug 06 '12 at 18:37
  • @DanielFischer yeah.. there's nothing between the two tests except `cout`. – ray Aug 06 '12 at 18:40
  • I suspect that `Router::Evaluate()` is being called with an invalid or corrupt object. Have you tried setting a debugger breakpoint on the `cout << "---zero"` (you may need to dig around in the assembly code to do this accurately with code compiled using `-O3`)? Take a look at the state of the `Router` object at that point - does `_cycles` make sense? Does the object look corrupt in any way? Is it an object that's still valid (maybe it's on freed heap memory or is a temporary that's done and gone)? Spitting out something in the `Router` destructor might also help with diagnosis. – Michael Burr Aug 06 '12 at 22:14
  • Related question with more information on this problem: http://stackoverflow.com/questions/11860703/floating-point-instruction-anomaly-fldz-malfunctioning – Michael Burr Aug 08 '12 at 10:53

1 Answer


Unless you have a horrible compiler bug, I would guess something like this is happening:

_cycles has some small fraction remaining after the subtractions. As long as the compiler knows nothing else is changing its contents, it keeps the value in a higher-precision floating-point register. When it reaches the I/O operation, it cannot be certain that the value of _cycles is not needed elsewhere, so it stores the contents back to the double-precision memory location, rounding off the extra bits that were in the register. The next check pessimistically assumes the value might have changed during the I/O operation and loads it back from memory, now without the extra bits that made it nonzero in the previous test.

As Daniel Fischer mentioned in a comment, using -ffloat-store inhibits the use of high-precision registers. If the problem goes away when using this option then the scenario I described is very likely. Check the assembly output of Router::Evaluate to be sure.
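To make the scenario concrete, here is a minimal sketch of one way a value can be nonzero in a wide register format yet zero once stored as a double. It uses underflow rather than lost mantissa bits, and it assumes `long double` has a wider exponent range than `double` (true for the x87 80-bit format, not for every platform). The function names are hypothetical, and this is an illustration of the mechanism, not a reproduction of the problem in the question:

```cpp
#include <limits>

// Builds a value that is too small for a double but may still be nonzero
// in a wider long double (assumption: x87-style 80-bit long double).
long double below_double_range()
{
    const long double smallest = std::numeric_limits<double>::denorm_min();
    return smallest / 4.0L;
}

// True if the value is nonzero while held at long double precision.
bool nonzero_when_wide()
{
    volatile long double wide = below_double_range();
    return wide != 0.0L;
}

// True if the same value compares equal to zero once stored into a double.
bool zero_when_stored()
{
    volatile double stored = static_cast<double>(below_double_range());
    return stored == 0.0;
}
```

On a platform where `long double` is no wider than `double`, `nonzero_when_wide()` is expected to return false, so the two tests agree and the discrepancy cannot arise this way.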

DanielKO
  • I tried `-ffloat-store` option, but unfortunately, it didn't help. As for compiler bug, I tried two versions - 4.1.2 and 4.2.4 and the same thing happened, so probably not a compiler bug I guess..? – ray Aug 06 '12 at 18:52
  • Can you post a minimalistic, self-contained example? Because I can't reproduce it from the code you posted. – DanielKO Aug 06 '12 at 19:16
  • I tried to reproduce the problem with minimal code, but failed. This is actually part of a quite large simulator, and the problem seems to occur only when it is run with the Pin tool. If you are still willing to try, I can give you the URL for downloading the code and instructions to reproduce the problem. But I am afraid that would be too much to ask. Thanks :) – ray Aug 06 '12 at 19:25
  • Then Michael Burr's hypothesis of memory corruption seems more likely. Try running it under Valgrind. – DanielKO Aug 06 '12 at 22:37
  • It's not a compiler bug. It's just your code is broken. See, for example, [this GCC bug report](http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21809) or [this one](http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53095). (If you can replicate the problem with non-broken floating point code, then I'd move on to the memory corruption hypothesis.) – David Schwartz Aug 08 '12 at 08:49
  • @David `x==a` should evaluate to the same value no matter how many times you call it – David Heffernan Aug 08 '12 at 08:52
  • @DavidHeffernan: Nope. The result will depend on the precision to which `x` is kept, which can depend upon where it's stored. If the first comparison has extra precision and the second doesn't, the results will be different. (Read all of [this bug report](http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21809).) – David Schwartz Aug 08 '12 at 09:11
  • @David I guess I don't understand you. You seem to be saying that `b1 = x==a; b2 = x==a;` could result in `b1` and `b2` having different values. I believe that cannot happen. – David Heffernan Aug 08 '12 at 09:14
  • @DavidHeffernan: That's correct, that's what I'm saying. Yes, it can happen. If `x` and `a` differ by less than the guaranteed precision of the type, then they may compare equal or unequal, depending on the precision of the comparison used. The compiler is free to choose the fastest comparison in each situation, and they can be different. (For example, maybe `b1` fits in a register but `b2` is a write to memory. So the cases may differ. In the OP's case, the write to `cout` likely flushes some things to memory, losing precision.) – David Schwartz Aug 08 '12 at 09:16
  • @David But that's not actually what is happening. The compiler is generating the same code for each comparison. Or am I wrong with that too? – David Heffernan Aug 08 '12 at 09:18
  • @DavidHeffernan: There are intervening function calls which likely flush values to and from memory, causing a precision loss. – David Schwartz Aug 08 '12 at 09:19
  • @DavidSchwartz I doubt that. What's actually happening is that we have a FPU register stack overflow. – David Heffernan Aug 08 '12 at 09:22
  • Does it really happen randomly, or is the effect always there in, say, iteration #56789? If so, place a breakpoint on the first if and investigate how the value of _cycles changes. BTW, strictly speaking, isn't _cycles a disallowed variable name in C++? Names starting with an underscore used to be reserved in C++, although no one ever seemed to care... – Axel Aug 08 '12 at 10:19
  • @Axel: The identifier `_cycles` is not reserved - an identifier that starts with a single underscore would be reserved if it had an uppercase letter after the underscore. Also, if it were at 'file scope' it would be reserved, but it's a class member. See this answer for more details: http://stackoverflow.com/a/228797/12711 – Michael Burr Aug 08 '12 at 10:36
  • @Michael Ok. I adopted the rule of generally avoiding identifiers starting with underscores a long time ago, so I seem to have forgotten the restriction to file scope. – Axel Aug 08 '12 at 11:27
  • @David Schwartz: I've read most of the comments on the GCC bug report, and even David Monniaux's paper. I don't see why this code is in any way broken. From the linked sources, the issue clearly is on compiler's and hardware's courts. GCC can be excused in this case because the problem with x87 is un-fixable without software fp (like in the pre-x87 days.) – DanielKO Aug 09 '12 at 19:00
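One way to check whether a given build is exposed to the extended-precision behavior debated in these comments is the standard `FLT_EVAL_METHOD` macro from `<cfloat>`. The following is a minimal sketch; the function name `describe_float_evaluation` is hypothetical. The macro only reports the evaluation model the compiler declares, so it neither proves nor disproves the explanation for this particular program:

```cpp
#include <cfloat>
#include <string>

// Returns a human-readable description of the floating-point
// evaluation model reported by the compiler.
std::string describe_float_evaluation()
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
        case 0:  return "operations are evaluated in the range and precision of their own type";
        case 1:  return "float and double operations are evaluated as double";
        case 2:  return "all operations are evaluated as long double";
        default: return "evaluation method is indeterminable or implementation-defined";
    }
#else
    return "FLT_EVAL_METHOD is not defined by this implementation";
#endif
}
```

A value of 2 is what one would expect from a 32-bit x87 build such as the one in the question, while builds that use SSE arithmetic typically report 0.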