17

I've been working on optimizing code that analyzes social graph data (with lots of help from https://blog.golang.org/profiling-go-programs) and I've successfully reworked a lot of slow code.

All data is loaded into memory from db first, and the data analysis from there appears CPU bound (max memory consumption < 10MB, CPU1 @ 100%)

But now most of my program's time seems to be in runtime.osyield and runtime.usleep. What's the way to prevent that?

I've set GOMAXPROCS=1 and the code does not spawn any goroutines (other than what the golang libraries may call).

This is my top10 output from pprof

(pprof) top10
62550ms of 72360ms total (86.44%)
Dropped 208 nodes (cum <= 361.80ms)
Showing top 10 nodes out of 77 (cum >= 1040ms)
      flat  flat%   sum%        cum   cum%
   20760ms 28.69% 28.69%    20850ms 28.81%  runtime.osyield
   14070ms 19.44% 48.13%    14080ms 19.46%  runtime.usleep
   11740ms 16.22% 64.36%    23100ms 31.92%  _/C_/code/sc_proto/cloudgraph.(*Graph).LeafProb
    6170ms  8.53% 72.89%     6170ms  8.53%  runtime.memmove
    4740ms  6.55% 79.44%    10660ms 14.73%  runtime.typedslicecopy
    2040ms  2.82% 82.26%     2040ms  2.82%  _/C_/code/sc_proto.mAvg
     890ms  1.23% 83.49%     1590ms  2.20%  runtime.scanobject
     770ms  1.06% 84.55%     1420ms  1.96%  runtime.mallocgc
     760ms  1.05% 85.60%      760ms  1.05%  runtime.heapBitsForObject
     610ms  0.84% 86.44%     1040ms  1.44%  _/C_/code/sc_proto/cloudgraph.(*Node).DeepestChildren
(pprof)

The _ /C_/code/sc_proto/* functions are my code.

And the output from web: output from web

(better, SVG version of graph here: https://goo.gl/Tyc6X4)

Aaron
  • 666
  • 1
  • 7
  • 23
  • 1
    Your graphs are illegible. You don't tell us what your program is doing. [How to create a Minimal, Complete, and Verifiable example.](http://stackoverflow.com/help/mcve). – peterSO Feb 02 '16 at 14:09
  • @JimB not using cgo; everything pure Go – Aaron Feb 02 '16 at 14:12
  • 4
    I would guess that your program is blocked, and that why it yields the CPU so frequently. You could run a blocking profile, but without actual code and data there's much we can do here. – JimB Feb 02 '16 at 14:18
  • @JimB definitely blocked (which is pretty clear by calling yield & sleep) but difficult to reconsile with the 100% utilization on main cpu. Seems like multiple goroutines are running then? – Aaron Feb 02 '16 at 18:01
  • 1
    @Aaron: there are *always* multiple goroutines running (and multiple OS threads). GOMAXPROCS only controls how many OS threads are available to run user level code. – JimB Feb 02 '16 at 18:04

1 Answers1

12

Found the answer myself, so I'm posting this here for anyone else who is having a similar problem. And special thanks to @JimB for sending me down the right path.

As can be seen from the graph, the paths which lead to osyield and usleep are garbage collection routines. This program was using a linked list which generated a lot of pointers, which created a lot of work for the gc, which occasionally blocked execution of my code while it cleaned up my mess.

Ultimately the solution to this problem came from https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs (which was an awesome resource btw). I followed the instructions about the memory profiler there; and the recommendation to replace collections of pointers with slices cleared up my garbage collection issues, and my code is much faster now!

Aaron
  • 666
  • 1
  • 7
  • 23
  • Which SQL driver? (BTW, I also find that the cumulative output is helpful, since that shows the time in function calls more like the call stack) – JimB Feb 02 '16 at 18:11
  • The driver is the Go-SQL-driver at: https://github.com/go-sql-driver/mysql I came to the conclusion that my database driver was the thread which was yielding because under the output at http://localhost:6060/debug/pprof/goroutine?debug=1 I see `1 @ 0x43079c 0x43086b 0x40487d 0x4043bb 0x5451ec 0x45f0e1 # 0x5451ec database/sql.(*DB).connectionOpener+0x4c C:/Go/src/database/sql/sql.go:727` along with my goroutine (next comment) – Aaron Feb 02 '16 at 19:54
  • `1 @ 0x441d10 0x53447a 0x48ba79 0x48d7f7 0x48cfe5 0x49b7da 0x48dc02 0x49a8db 0x480e9f 0x45f0e1 # 0x53447a _/C_/code/sc_proto/cloudgraph.(*Node).AddChild+0x8a C:/code/sc_proto/cloudgraph/cloudgraph.go:92 # 0x49b7da _/C_/code/sc_proto.main.func3+0x7a C:/code/sc_proto/main.go:887 # 0x48dc02 _/C_/code/sc_proto.main+0x1d2 C:/code/sc_proto/main.go:908 # 0x49a8db _/C_/code/sc_proto.Test_main+0x1b C:/code/sc_proto/main_test.go:1002 # 0x480e9f testing.tRunner+0x9f C:/Go/src/testing/testing.go:473` – Aaron Feb 02 '16 at 19:55
  • I'm not sure what you're inferring from that, but the existence of a goroutine doesn't mean it's consuming any CPU. – JimB Feb 02 '16 at 20:11
  • 2
    looks like that blog post link no longer works – user5359531 Jun 14 '22 at 17:05
  • 1
    looks like that blog post can be viewed here https://web.archive.org/web/20140801000350/https://software.intel.com/en-us/blogs/2014/05/10/debugging-performance-issues-in-go-programs – user5359531 Jan 29 '23 at 00:35