3

I have a Go binary that I'm trying to profile, and I'm getting surprising results. The code has the following (truncated) in main.go, and the rest of the code is in the package monte:

package main

import (
  "monte"
  "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
  flag.Parse()
  if *cpuprofile != "" {
    f, err := os.Create(*cpuprofile)
    if err != nil {
      log.Fatal(err)
    }
    pprof.StartCPUProfile(f)
  }

  monte.ExpensiveOperation()

  pprof.StopCPUProfile()
}

I build my executable with go build src/main.go, then run it with ./main -cpuprofile=monte.prof. When I examine the output with go tool pprof main monte.prof, I get the following output:

(pprof) top10 --cum
Total: 346 samples
     280  80.9%  80.9%      343  99.1% time.Time.Format
       0   0.0%  80.9%      341  98.6% runtime.interhash
       0   0.0%  80.9%      341  98.6% runtime.main
       0   0.0%  80.9%      251  72.5% strconv.Unquote
      13   3.8%  84.7%       31   9.0% strconv.roundShortest
      11   3.2%  87.9%       18   5.2% strconv.fmtE
       9   2.6%  90.5%        9   2.6% runtime.markallocated
       1   0.3%  90.8%        8   2.3% math/rand.Float64
       2   0.6%  91.3%        8   2.3% runtime.FixAlloc_Free
       7   2.0%  93.4%        8   2.3% time.nextStdChunk

The function that has the largest cumulative time is time.Time.Format, which seems wrong to me (shouldn't it be main?) and there's no mention of monte at all, despite the fact that the "expensive operation" takes about 10 seconds to complete, plenty of time for the sampler to see it. If I pass the --focus=monte flag to go tool pprof, no samples are shown at all. I assume I'm missing some flag somewhere; does anyone have any ideas? Thanks!

Haldean Brown
  • 12,411
  • 5
  • 43
  • 58

2 Answers2

2

Your main package doesn't compile and you don't provide source code for your monte package. Therefore, we can't reproduce your results. which is the first step in debugging.

Here's some working source code and results.

package monte

func ExpensiveOperation() {
    var t float64
    for i := int64(0); i < 1e10; i++ {
        t += 1
    }
}

.

package main

import (
    "flag"
    "log"
    "monte"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")

func main() {
    flag.Parse()
    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }
        pprof.StartCPUProfile(f)
    }

    monte.ExpensiveOperation()

    pprof.StopCPUProfile()
}

.

(pprof) top10 --cum
Total: 1166 samples
       0   0.0%   0.0%     1166 100.0% main.main
    1166 100.0% 100.0%     1166 100.0% monte.ExpensiveOperation
       0   0.0% 100.0%     1166 100.0% runtime.main
       0   0.0% 100.0%     1166 100.0% schedunlock

UPDATE:

The code at github.com/haldean/monte doesn't provide meaningful results. It only takes time real 0m0.506s and only takes 48 samples.

(pprof) top10 --cum
Total: 48 samples
       0   0.0%   0.0%       45  93.8% main.main
       0   0.0%   0.0%       45  93.8% monte.(*Scene).Evaluate
       0   0.0%   0.0%       45  93.8% monte.(*Scene).Render
       0   0.0%   0.0%       45  93.8% monte.(*Scene).SetColor
       0   0.0%   0.0%       45  93.8% runtime.main
       0   0.0%   0.0%       45  93.8% schedunlock
       0   0.0%   0.0%       44  91.7% monte.(*Scene).RayCast
       4   8.3%   8.3%       31  64.6% runtime.new
      13  27.1%  35.4%       27  56.2% runtime.mallocgc
       3   6.2%  41.7%       26  54.2% monte.(*Scene).DirectionAt
peterSO
  • 158,998
  • 31
  • 281
  • 276
  • That's interesting, on my machine it takes around 6 seconds of wall time and I don't get anything about monte in the output. Maybe my environment is configured strangely somehow; maybe I should just try on a different machine. Thanks so much for your help! – Haldean Brown Oct 09 '12 at 03:21
2

It looks like a CPU-only profiler, so if your ExpensiveOperation takes time by doing I/O or sleep or something like that, it will be invisible. (That's the problem with "cpu" profilers.)

In terms of what the numbers mean, it looks like there were 346 total samples. It's not too surprising that the numbers are a bit squishy, depending on how the profiler works, but if it were a true stack sampler the numbers would mean this:

341/346 samples had main and interhash on the stack. You would expect all samples to have main on the stack, but that's the squishy part.

343/346 samples had Format on the stack. (Why there's more of those than main, who knows?)

251/346 samples had Unquote on the stack. So of those 251 samples, they also probably had main, interhash, and Format on the stack.

With this kind of detective work, you can slowly begin to piece together what the samples are telling you.

Of course, if you could actually see the stack samples, you wouldn't have to see very many of them before you would know exactly what's going on.

More about all that.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
  • It's very much a CPU-expensive operation; it's a raytracer, so it's spending most of its time doing (really unoptimized) geometry on stuff in memory. The strange thing is that I don't call many of the things on that sample list. I never call Time.Format, for example. – Haldean Brown Oct 08 '12 at 22:49
  • 1
    @Will: Then rather than play 20 questions with the profiler, I would just do what works. Run it under GDB, hit Ctrl-C, and do `bt` to show the stack trace. (You may have to switch to the worker thread first.) Repeat 10 times. That will show you what's going on, pure and simple. – Mike Dunlavey Oct 09 '12 at 00:09