Why doesn't panic show all running goroutines?

Question

Page 253 of The Go Programming Language states:

... if instead of returning from main in the event of cancellation, we execute a call to panic, then the runtime will dump the stack of every goroutine in the program.

This code deliberately leaks a goroutine by waiting on a channel that never has anything to receive:

package main

import (
    "fmt"
    "time"
)

func main() {
    never := make(chan struct{})
    go func() {
        defer fmt.Println("End of child")
        <-never
    }()
    time.Sleep(10 * time.Second)
    panic("End of main")
}

However, the runtime only lists the main goroutine when panic is called:

panic: End of main

goroutine 1 [running]:
main.main()
    /home/simon/panic/main.go:15 +0x7f
exit status 2

If I press Ctrl-\ to send SIGQUIT during the ten seconds before main panics, I do see the child goroutine listed in the output:

goroutine 1 [sleep]:
time.Sleep(0x2540be400)
    /usr/lib/go-1.17/src/runtime/time.go:193 +0x12e
main.main()
    /home/simon/panic/main.go:14 +0x6c

goroutine 18 [chan receive]:
main.main.func1()
    /home/simon/panic/main.go:12 +0x76
created by main.main
    /home/simon/panic/main.go:10 +0x5d

I thought maybe the channel was getting closed as panic runs (which still wouldn't guarantee the deferred fmt.Println had time to execute), but I get the same behaviour if the child goroutine does a time.Sleep instead of waiting on a channel.

I know there are ways to dump goroutine stacktraces myself, but my question is why doesn't panic behave as described in the book? The language spec only says that a panic will terminate the program, so is the book simply describing implementation-dependent behaviour?

Read about the `GOTRACEBACK` environment variable at https://pkg.go.dev/runtime#hdr-Environment_Variables. Basically, the idea is that a real-world Go service might have hundreds of thousands of active goroutines, and dumping them all is usually pointless for a panic, which typically happens due to things like an NPE or an indexing error. Those are _usually_ localized to the goroutine that happened to hit the error. — kostix, Mar 05 '22 at 11:42
Dumping the stacks of all running goroutines is useful when you discover that the service is experiencing a subtle deadlock (not the kind detected by the Go runtime itself). For such cases it's useful to have a way to trigger the dump: make the service expose the routes from `net/http/pprof` so that `/debug/pprof/goroutine` is available, or make it possible to send `SIGQUIT` to the service and collect what it writes to its stderr, or make it possible to crash the service with a core dump (see above). — kostix, Mar 05 '22 at 11:50

score 0 · Answer 1 · answered Mar 05 '22 at 21:12

Thanks to kostix for pointing me to the GOTRACEBACK runtime environment variable. Setting this to all instead of leaving it at the default of single restores the behaviour described in TGPL. Note that this variable is significant to the runtime, but you can't manipulate it with go env.
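
For completeness, the same level can also be requested from inside the program with `debug.SetTraceback`, which accepts the same values as GOTRACEBACK. The following is only a sketch of the question's program adapted that way; `panicWithAllStacks` is a hypothetical helper name, and the sleep is shortened to one second.

```go
package main

import (
	"runtime/debug"
	"time"
)

// panicWithAllStacks is a hypothetical helper (not from the book or the
// question). It raises the traceback level to "all" and then panics, so an
// unrecovered panic lists every goroutine rather than only the panicking one.
func panicWithAllStacks(msg string) {
	debug.SetTraceback("all") // accepts the same values as GOTRACEBACK
	panic(msg)
}

func main() {
	never := make(chan struct{})
	go func() {
		<-never // leaked goroutine, as in the question
	}()
	time.Sleep(time.Second)
	panicWithAllStacks("End of main")
}
```

Alternatively, leave the program unchanged and set the variable for a single run, for example `GOTRACEBACK=all go run main.go`.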

Listing only the panicking goroutine by default is a change made in Go 1.6. My edition of the book is copyrighted 2016 and gives Go 1.5 as the prerequisite for its example code, so it must predate the change. Reading the change discussion, it's interesting that there was concern about hiding useful information (as the recipient of many an incomplete error report, I can sympathise with this), but nobody called out the issue of scaling to large production systems that kostix mentioned.
