
I don't have much experience with Linux. I have some well-tested code that compiles with no errors or warnings under MSVC and ICC, and runs flawlessly on Windows.

I then copied this code to my newly installed Linux system (Ubuntu 13.10 with GCC 4.8.1), installed the latest version of Eclipse (3.8.1 or so) with CDT, and configured it to use the system's GCC compiler.

Eclipse CDT/GCC works fine with all the trivial test code I wrote, so I then asked GCC to compile the large piece of well-tested code from Windows.

GCC compiled it with no errors and no warnings, and the program runs fine with trivial workloads. However, as soon as I give it a real-world payload, it basically freezes and takes forever to complete (on Windows, the same program finishes that payload in a matter of seconds).

Can anyone tell me what I should look for to fix this, or is GCC just that slow in debug mode (I mean, at least 2-3 orders of magnitude slower than ICC/MSVC's debug mode)? Thanks.

The optimization level is set to O3 for release mode and to the default (no optimization, or at least that's what I believe) for debug mode.

The problem is, my impression is that the ICC/MSVC debug-mode (no optimization) binary is much faster (I mean, something like 1000 times faster) than GCC's.

UPDATE (at the moment it seems I cannot comment on Stack Overflow, so I have to put replies here, sorry):

ams: Well, after waiting a few minutes and seeing the program still running, I just abort it, so I don't know whether it can finish or not. However, as long as the payload is very small, it finishes normally.

As for the bottleneck of the code, I think it's memory-bound.

The code spends the majority of its time (80+%) doing a radix sort on a large input array. The optimized radix sort I wrote can sort 200-300 million 32-bit floating-point values per second on Windows, but on the same hardware under Linux it seems it would take hours, if not forever, to sort an array of some 10 million elements.

UPDATE:

Thanks for everyone's help. I figured out the problem lay in a macro I messed up on Linux; now everything works fine.

user0002128
  • What kind of optimization flags are you using with ICC and GCC? What role does Eclipse play in this? What system calls does your program use? – iveqy Nov 18 '13 at 12:25
  • What is the bottleneck, CPU, disk I/O or what? – Klas Lindbäck Nov 18 '13 at 12:33
  • Are you saying it *never* completes, or are you saying it does complete, but takes a long time? – ams Nov 18 '13 at 12:34
  • Use a profiler. By the way, does your program access the network for any reason? – n. m. could be an AI Nov 18 '13 at 12:35
  • Run the program in a debugger like `gdb` and interrupt it from time to time to see where it spends most of its time. An even better method would be to use a profiler like `valgrind`. – scai Nov 18 '13 at 12:35
  • (1) I would make sure that the program really is just slow and can actually finish the task. In other words, it is not in an infinite loop due to some undefined behavior that does not manifest itself on Windows. (2) I would use the `perf` profiler to track down the bottleneck. – Ali Nov 18 '13 at 12:52
  • @ams Actually I don't know; you could say the program hangs there. I just wait a few minutes, still see no results, and terminate it. – user0002128 Nov 18 '13 at 12:52
  • Add logging to your program or enable it to see where it "hangs" – Aaron Digulla Nov 18 '13 at 12:53
  • @Ali, I doubt that. If there were undefined behavior involved, (1) GCC should at least give some warnings, (2) it would likely hang even with a small payload, (3) ICC/MSVC should complain. – user0002128 Nov 18 '13 at 12:55
  • @user0002128 In any case, I would make sure that the program actually finishes and really is just slow. – Ali Nov 18 '13 at 12:56
  • @user0002128 No, your three assumptions are not true. Read about *undefined behavior* in C/C++. GCC doesn't have to give a warning (it might not care at all, or you haven't enabled additional flags like `-Wall` or `-Wextra` which would trigger these warnings), small payloads may lead to different results (because the behavior is *undefined*), and likewise ICC / MSVC don't have to complain and may work without problems (because the behavior is *undefined*). You cannot make any assumptions about undefined behavior. – scai Nov 18 '13 at 13:10

4 Answers


You have three options:

  • Enable logging to see what the code does. If you don't have logging, add it.
  • Use a profiler to collect information about where the time is spent. Have a look at valgrind. I'm a bit worried that your app never terminates; I'm not sure whether valgrind can handle that.
  • Use a debugger and interrupt your program after a while to see what it does. Repeat until you see a pattern.
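For the profiler option, a typical callgrind session might look like this (`myprog` stands in for the actual binary; build with `-g` so the profile maps back to source lines):

```
$ g++ -g -O0 myprog.cpp -o myprog
$ valgrind --tool=callgrind ./myprog      # very slow, but records per-function costs
$ callgrind_annotate callgrind.out.<pid>  # summarize the profile it wrote
```

Because callgrind slows the program down a lot, it's worth running it on a payload that already shows the slowdown but still finishes.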
Aaron Digulla
  • Yeah, most profilers don't do anything useful if the program never exits. I'd go the debugger route. – ams Nov 18 '13 at 13:27

I agree with those who say run it under a debugger like `gdb` and interrupt it with Ctrl-C (in the program's output window). Then in gdb type `thread 1` to get into the active thread. Then type `bt` to get a stack trace. Then examine every line of code by typing `up` and `down` (and data, if necessary, by typing `p variablename`) on the stack, so you understand exactly why it was doing what it was doing at the time you interrupted it.
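As a session sketch (`myprog` and `some_variable` are placeholder names):

```
$ gdb ./myprog
(gdb) run
^C                       <- interrupt once it appears to hang
(gdb) thread 1
(gdb) bt                 <- stack trace: where is it right now?
(gdb) up
(gdb) p some_variable
(gdb) continue           <- then Ctrl-C again; repeat until a pattern emerges
```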

Since you know it takes far longer than it should, the chance that you caught it in the act of misbehaving is extremely close to certain.

(Note: there is no need to 1. treat this as a measurement problem, or 2. try to distinguish between an infinite loop and a merely long-running one.)

That's the random-pausing technique.

Mike Dunlavey
  • The difference between an infinite loop (or block) and a long-running job is the difference between a logic error and an optimization issue. It also determines whether a profiler is interesting or not. I'd go the random pausing route first though. – ams Nov 18 '13 at 15:24
  • @ams: Whether a program that should take 1 second runs for only a year or forever doesn't seem to me like a huge difference. But that's just me :) – Mike Dunlavey Nov 18 '13 at 19:09
  • LOL, by long-running I was thinking more like 5 minutes. A year would be spectacularly poor optimization, if it can achieve that without actually being broken. – ams Nov 19 '13 at 10:24

In addition to all the other advice, I'd like to mention that GCC offers a new compiler flag, `-Og`, that's supposed to improve the performance of debug builds.

pentadecagon

Did you use `top` or another command/tool to monitor the runtime distribution of system resources, to get more insight into memory consumption and CPU load during processing?

By using a debugger such as `gdb` (which works fine on Linux) you can add and remove conditional and unconditional breakpoints and step through the code (after building a debug executable). I use `gdb` as integrated in Eclipse, which provides a number of views of the running process (such as buttons and a console window). You can also add test print statements, whose output is by default forwarded to your console window.

Using the debugger, you can determine where, and in which part of the program, so much time and so many resources are consumed. Once you have determined that, you can focus on solving that particular issue/irregularity/bug/resource-eater.

Did you also run your executable in non-debug mode ("run" instead of "debug")? If so, you can compare the relative performance.

SoHerman