0

I am using gprof to optimize my C++ code, and I am obtaining the following results:

Flat profile:

Each sample counts as 0.01 seconds.
 %    cumulative  self               self     total
time   seconds   seconds    calls    s/call   s/call  name    
21.49      2.31     2.31                              _fu2___ZSt4cout
12.93      3.70     1.39   1560037     0.00     0.00  __gnu_cxx::new_allocator<DataINSPVAS>::construct(DataINSPVAS*, DataINSPVAS const&)
 8.56      4.62     0.92  30267700     0.00     0.00  __gnu_cxx::new_allocator<AntennaData>::construct(AntennaData*, AntennaData const&)
 6.14      5.28     0.66 261159927     0.00     0.00  __gnu_cxx::__normal_iterator<char*, std::string>::__normal_iterator(char* const&)
 5.40      5.86     0.58 149234244     0.00     0.00  bool __gnu_cxx::operator!=<char*, std::string>(__gnu_cxx::__normal_iterator<char*, std::string> const&, __gnu_cxx::__normal_iterator<char*, std::string> const&) ...

According to this flat profile, the function fu2__ZSt4cout is using 21.49% of the running time. Does anybody knows what fu2__ZSt4cout stands for?

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
Jaime Ivan Cervantes
  • 3,579
  • 1
  • 40
  • 38
  • 2
    Probably a mangled `cout` – Jesus Ramos Jan 28 '14 at 22:28
  • My first thought was that it probably has to do with "couts", but there are very few of these in my code and are used only at the beginning and the end. – Jaime Ivan Cervantes Jan 28 '14 at 22:32
  • @JesusRamos: given that `std::cout` is an object rather than a function this seems to be odd... However, `__ZSt4cout` does demangle to `std::cout` according to `c++filt`. What the `fu2` add I don't know (with it, `c++filt` doesn't demangle the symbol). – Dietmar Kühl Jan 28 '14 at 22:32
  • @DietmarKühl That's true. Sort of made sense though since usually console output is kind of slow and can account for high CPU if you use it too much. – Jesus Ramos Jan 28 '14 at 22:34
  • @DietmarKühl Yeah I'm not sure about the f2 but the zst4 part did seem familiar. – Jesus Ramos Jan 28 '14 at 22:35
  • 2
    @JaimeCervantes: how long does the program run? If it is very short I could imagine that it is related to construction/destruction of the I/O subsystem. – Dietmar Kühl Jan 28 '14 at 22:35
  • @DietmarKühl: it runs for about 20 minutes – Jaime Ivan Cervantes Jan 28 '14 at 22:37
  • If your program is I/O intensive this could be possible (since very little time is spent on the CPU other than processing some data) so `cout` would be eating quite a bit of cpu time with all the data copies going on. – Jesus Ramos Jan 28 '14 at 22:47
  • @JesusRamos: My code has multithreading and my hypothesis was that write() was going to take most of the time because the program is constantly writing binary data. Is write() somehow related to cout? – Jaime Ivan Cervantes Jan 28 '14 at 22:51
  • Are you using C library `write` or the C++ equivalent? – Jesus Ramos Jan 28 '14 at 22:56
  • Can't find this name in the titanium ABI. Which compiler/OS are you using? – ysdx Jan 28 '14 at 22:57
  • @JesusRamos: I'm using the c++ std write() function for binary data – Jaime Ivan Cervantes Jan 28 '14 at 22:59
  • @ysdx: I am using MingW 32 bits g++ – Jaime Ivan Cervantes Jan 28 '14 at 23:00
  • 1
    Well writes usually happen in in memory buffers, not sure if there is an optimization to wrap it in some form of cout in C++. Did you try commenting out the lines that do call `cout` and see if it changes? – Jesus Ramos Jan 28 '14 at 23:02
  • 1
    @JaimeCervantes: when you say "std write()" do you mean `std::ostream::write()`? If so, it wouldn't really be related to `std::cout` but to the IOStream subsystem. You might want to make sure that you are using the `C` locale. Make sure you _don't_ keep flushing the stream by accident, e.g., using `std::unitbuf`, `std::flush`, etc. – Dietmar Kühl Jan 28 '14 at 23:03
  • 1
    I believe `_fu2___ZSt4cout` is including `main`. Possibly you're missing debug information? – Collin Dauphinee Jan 28 '14 at 23:10

1 Answers1

1

(Quick point: There as so many questions on SO like this.)

First, gprof is a "CPU profiler". That means during IO or any other blocking syscall it is Shut Off. Your program could run for 100 seconds, spending 99 seconds on IO, and gprof will act as though it only spent 1 second.

Second, you're looking at self time. Self time is useless, except for functions that crunch a lot without calling subfunctions. So if you've got a bubble sort of an integer array, and you're spending a large fraction of overall time in it, gprof will show it as a bottleneck. Change it to a sort on strings, where comparison requires calling a function, and gprof will show a big % in strcmp, which is not where the problem is at all.

From looking at your output, my guess is that your program mostly does IO, so it doesn't surprise me if, of the small amount of time it spends actually running, a large part of that is going into and coming out of library IO routines. You're also showing a lot of self time in new and an iterator. Not surprising at all.

If you're looking for a profiler, you want one that samples the entire call stack, on wall-clock time (not CPU time), and reports the percent of time each line of code appears on those stacks. One such profiler is Zoom. (BTW, don't fall for the line that you need high-frequency samples, to get "accuracy". If you get 1000 samples, that's way more than enough, to see what's taking time.)

When I'm tuning performance, I use this method.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135