
I'm sure everyone who knows Go is familiar with that blog post.

Reading it again, I wondered if using gccgo instead of go build would increase the speed a bit more. In my typical use case (scientific computations), a gccgo-generated binary is always faster than a go build-generated one.

So, just grab this file: havlak6.go and compile it:

go build -o havlak6_go havlak6.go
gccgo -o havlak6_gccgo -march=native -Ofast havlak6.go

Surprise!

$/usr/bin/time ./havlak6_go
5.45user 0.06system 0:05.54elapsed 99%CPU

$/usr/bin/time ./havlak6_gccgo
11.38user 0.16system 0:11.74elapsed 98%CPU

I'm curious and want to know why an "optimizing" compiler produces slower code.

I tried to use gprof on the gccgo-generated binary:

gccgo -pg -march=native -Ofast havlak6.go
./a.out
gprof a.out gmon.out

with no luck:

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

As you can see, the code was not actually profiled.

Of course, I read this, but as you can see, the program takes 10+ seconds to execute... The number of samples should be > 1000.

I also tried:

rm a.out gmon.out
LDFLAGS='-g -pg' gccgo -g -pg -march=native -Ofast havlak6.go
./a.out
gprof

No success either.

Do you know what's wrong? Do you have an idea of why gccgo, with all its optimization passes, fails to be faster than gc in this case?

go version: 1.0.2 gcc version: 4.7.2

EDIT:

Oh, I completely forgot to mention... I obviously tried pprof on the gccgo-generated binary... Here is a top10:

Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 1143 samples
    1143 100.0% 100.0%     1143 100.0% 0x00007fbfb04cf1f4
       0   0.0% 100.0%      890  77.9% 0x00007fbfaf81101e
       0   0.0% 100.0%        4   0.3% 0x00007fbfaf8deb64
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2faf
       0   0.0% 100.0%        3   0.3% 0x00007fbfaf8f2fc5
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fc9
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fd6
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f2fdf
       0   0.0% 100.0%        2   0.2% 0x00007fbfaf8f4a2f
       0   0.0% 100.0%        1   0.1% 0x00007fbfaf8f4a33

And that's why I'm looking for something else.

EDIT2:

Since it seems that someone wants my question to be closed, I did not try to use gprof out of the blue: https://groups.google.com/d/msg/golang-nuts/1xESoT5Xcd0/bpMvxQeJguMJ

  • People still [*believe in gprof as the canonical profiler.*](http://stackoverflow.com/a/1779343/23771). Some points: 1) gprof is only useful for CPU-bound programs with shallow call stacks, without recursion, for which it has all symbols. 2) Compiler optimization only makes a difference in tight inner loops or routines called a lot, in your code, that don't themselves call functions (like memory allocation, etc). Compiler optimization doesn't just make everything go faster. – Mike Dunlavey Mar 04 '13 at 17:16
  • Yes, I got it for gprof. And I do agree with you about compiler optimizations. However, I would not expect worse performance from an optimization-capable compiler either. Performance should be equal or better. If not, there is room for improvement and I'd like to understand why :) –  Mar 04 '13 at 23:53
  • The only timing I ever do is end-to-end, possibly repeated 10^n times and divided by that, and I don't look for more than 3 digits accuracy. There's noise and I don't care. Then I use random pausing to look for ways to make it faster. Unless it's already been squeezed like a sponge, I will find ways, and then I can do it all over again. When after several cycles I hit diminishing returns, and the pc is most often in my generated instructions, then I turn on the optimizer, which makes it maybe 10% faster. Whoopee. – Mike Dunlavey Mar 05 '13 at 02:00

2 Answers


Running the gccgo-generated binary under Valgrind seems to indicate that gccgo has an inefficient memory allocator. This may be one of the reasons why gccgo 4.7.2 is slower than go 1.0.2. It is impossible to run a binary generated by go 1.0.2 under Valgrind, so it is hard to confirm whether memory allocation is gccgo's primary performance problem in this case.

  • Thanks for mentioning `Valgrind`. This is the very first time I've dug into profiling, and I thought gprof was *the* profiler... I was wrong :) However, it seems that `Valgrind` is a C-only profiler/profiling framework. It complains about uninitialised values and does not seem to "get" Go at all... Could you elaborate a bit? –  Feb 26 '13 at 12:49
  • I used `valgrind --tool=callgrind` and KCacheGrind to examine the behavior of the gccgo-generated code. Valgrind's callgrind is also able to run many non-C codes, but unfortunately it is making assumptions which are violated by go1.0.2-generated binaries. https://code.google.com/p/go/issues/detail?id=782 –  Feb 26 '13 at 16:00

Remember that `go build` also defaults to static linking, so for an apples-to-apples comparison you should give gccgo the `-static` or `-static-libgo` option.
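Concretely, that would make the comparison something like the following (same flags as in the question, plus the static-linking option; whether `-static-libgo` or full `-static` is appropriate depends on your toolchain):

```
go build -o havlak6_go havlak6.go
gccgo -o havlak6_gccgo -static-libgo -march=native -Ofast havlak6.go
```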