
I have a large C++ project under Visual Studio 2010 (v10.0.40219.1 SP1Rel), and a bug recently started showing up in one of our regression tests. When I checked the bug on a dev machine, I couldn't get it to happen, so I copied the exes from the test machine to the dev machine and the bug appeared. I then deleted the source tree including projects from the test machine, copied them from the dev machine, cleaned and rebuilt them on the test machine, and the bug was still there. So basically, the executables built from the same project and configuration on my dev PC, with the same compiler version and installed hotfixes, differ from those built on the test PC. The only difference is that the dev PC is running 64-bit Windows 7, and the test PC is running XP. I've also checked that all linked LIBs and DLLs are the same on both build platforms.

If the same executables produced different results on different PCs, I'd guess it was a platform specific bug of some kind in my code, but the same executables behave consistently on different PCs, just that those compiled on XP behave differently to those compiled on W7 64.

Any ideas why this should be?

Edit: Rebuilt the code in a debug configuration on the test machine, and the bug goes away. Currently copying the source trees and tools onto blank XP and W7 PCs, to see if the issue does follow the build platform, or is specific to one of the PCs currently being used.

Edit 2: Copied the source tree onto two new PCs, one XP, one Windows 7, and rebuilt. The exe exhibits the bug only when built on XP in release. It doesn't occur in the XP debug build, only in the optimised release build. The Win 7 build runs fine on XP and Win 7; the XP build shows the bug on XP and Win 7. Going to spend a bit of time trying to better isolate the bug to figure out what exactly is happening, but there certainly seem to be compilation differences based on the build platform.

Edit 3: The problem was indeed an uninitialised variable, or more precisely an uninitialised struct in a template, something close to:

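template<class TYPE> class MyTemplateClass
{
public:
  // Copies t into x; nothing has initialised t, so for a plain struct TYPE this reads garbage.
  void assign(TYPE &x) { x = t; }

  TYPE t; // no constructor initialises this member
};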

Warning level 4 doesn't seem to pick this up.
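
One way to close this kind of hole is to value-initialise the member in the constructor's initialiser list. What follows is a minimal sketch rather than the actual project code: it assumes TYPE is a plain struct, and the Point type and demo function are made up purely for illustration.

```cpp
#include <cassert>

// Hypothetical plain struct standing in for the real TYPE.
struct Point
{
    double x;
    double y;
};

template <class TYPE>
class MyTemplateClass
{
public:
    // t() value-initialises the member: plain structs and built-in types
    // are zeroed, class types get their default constructor.
    MyTemplateClass() : t() {}

    void assign(TYPE &x) const { x = t; }

    TYPE t;
};

// Hypothetical usage: the value handed out is now well defined.
void demo()
{
    MyTemplateClass<Point> holder;
    Point p;
    holder.assign(p);
    assert(p.x == 0.0 && p.y == 0.0);
}
```

With the member initialised this way, the value copied out by assign no longer depends on whatever happened to be in memory, so debug and release builds should agree on it.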

SmacL
  • Yes statically linking MFC, and also Stingray GUI libs. There are some 3rd party libs and DLLs but they're the same versions on both platforms. I'm off to investigate the issue on the test PC, to see if I can figure what exactly is happening. It is a problem as we usually build the release based on the test PC build. – SmacL Dec 20 '13 at 17:11
  • Sorry, I yanked that question because I reread the question and saw it already answered. – OmnipotentEntity Dec 20 '13 at 17:11
  • There's a few possibilities, I don't think it's something as obvious as a 32-bit vs 64-bit issue, especially because you're probably compiling a 32-bit executable on W7 64 for use on the XP box. It could be differing versions of MFC, or even something more weird. Do you have a minimal test case? – OmnipotentEntity Dec 20 '13 at 17:13
  • Unfortunately, no minimal test case as yet. I'm going to build a debug version on the XP test PC in the hope of generating one. – SmacL Dec 20 '13 at 17:34
  • Did you build on the dev machine and then test on the test machine? How do you know it is not something specific to the test machine such as a different dll version? Have you looked through the loaded module list and compared the versions on dev & test machines? – Pete Dec 20 '13 at 18:04
  • @Pete, the dev PC build works fine on both the test PC and the dev PC. The test PC build exhibits the same bug on both the test PC and the dev PC. – SmacL Dec 20 '13 at 18:13
  • (pssst, what's the bug?) – IdeaHat Dec 20 '13 at 18:30
  • @MadScienceDreams, extra point appearing in a graph representing a section of ground. In code, templates buried in templates, buried in more templates, derived from other templates. Beer time, methinks. Will look at it again tomorrow. – SmacL Dec 20 '13 at 20:02
  • Sounds like an uninitialized variable, given that the code works as expected in debug mode. The difference between XP and Windows 7 could also be explained by this as well. – ChrisF Dec 21 '13 at 19:16
  • @ChrisF, you could well be right, but binaries generated by the two platforms are different. Bug only happens on XP generated binary. The compiled code for a bug should be the same, regardless of whether you compile / link on XP or Win7. – SmacL Dec 22 '13 at 08:54

1 Answer


Have you compared the binary executables? This can be done on the command line with fc /b file1 file2. This would at least verify whether you're producing the same program, rather than experiencing something odd outside of the executable produced.
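
If you'd rather script the comparison, the same idea can be sketched in a few lines of C++. This is only a rough stand-in for fc /b, and the function name first_difference and its return codes are my own invention. Keep in mind that two builds of the same source will rarely match byte for byte anyway, since the PE header normally carries a link timestamp, so the interesting part is where and how much they differ.

```cpp
#include <fstream>
#include <string>

// Returns the offset of the first byte at which the two files differ,
// -1 if they are identical, or -2 if either file cannot be opened.
// If one file is a strict prefix of the other, the offset returned is
// the length of the shorter file.
long long first_difference(const std::string &path_a, const std::string &path_b)
{
    std::ifstream a(path_a.c_str(), std::ios::binary);
    std::ifstream b(path_b.c_str(), std::ios::binary);
    if (!a || !b)
        return -2;

    const int end_of_file = std::ifstream::traits_type::eof();
    long long offset = 0;
    for (;;)
    {
        const int byte_a = a.get();
        const int byte_b = b.get();
        if (byte_a != byte_b)
            return offset;      // also covers one file ending before the other
        if (byte_a == end_of_file)
            return -1;          // both files ended together with no mismatch
        ++offset;
    }
}
```

A call such as first_difference("dev.exe", "test.exe") (file names assumed) gives the first offset worth inspecting in a hex viewer.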

There are probably some static libraries being compiled in, and there could be other .h files that aren't part of your code. It's always possible that these have bugs. Make sure your libraries and SDKs are the same versions, and not just the compilers. If possible, try updating them.

If bugs are going away with different compiler flags, then your program working/failing might be the result of some undefined behavior (e.g. uninitialized variable). Make sure all of the compiler's warning flags are enabled to help find these. Unfortunately, Visual Studio isn't the best about error and warning messages, but they still help.

jbo5112
  • Binaries are different, on a 15mb exe, the binary generated on XP is about 500 bytes bigger, and FC spews out lots of differences from offset x128 onwards. – SmacL Dec 22 '13 at 08:52
  • All compiler settings should be identical, as the same project file is being used. Headers should also be the same as I did a full install of the entire toolchain on a blank XP and WIN7 boxes yesterday and got the same result. I'll do a dependency check on the exes and see what that shows. – SmacL Dec 22 '13 at 08:58
  • Static dependency walk of both versions gives identical results, but says 'At least one delay-load dependency module was not found' and 'At least one module has an unresolved import due to a missing export function in a delay-load dependent module.' I'll try a profiled dynamic walk. – SmacL Dec 22 '13 at 09:07
  • And the bug disappears when profiling the release build under dependency walker. Rebuilding all with -w4 shows no uninitialized variables, but there could easily be some hidden mallocs providing uninitialized memory. More work required. – SmacL Dec 22 '13 at 10:04
  • @ShaneMacLaughlin There are some code/program analysis tools out there that could be of use. I have only tried valgrind for finding memory leaks, which Oracle DB libraries made completely unusable. I'm sure you can find others with Google, and there is a thread on it here http://stackoverflow.com/questions/93260/a-free-tool-to-check-c-c-source-code-against-a-set-of-coding-standards . Is there a computer where you can try VS 2012, 2013 or possibly another compiler? – jbo5112 Dec 22 '13 at 23:40
  • @ShaneMacLaughlin I personally usually try some debug output around some of your looping/recursion code to track down errors, as I usually have a really good grasp of my code. – jbo5112 Dec 22 '13 at 23:41
  • Use ApplicationVerifier on Windows – paulm Dec 23 '13 at 13:21