
Look at the following code snippet.

package main

import (
    "errors"
    "fmt"
    "math/rand"
    "runtime"
    "sync"
    "time"
)

func random(min, max int) int {
    rand.Seed(time.Now().Unix())
    return rand.Intn(max-min) + min
}

func err1(rand int, chErr chan error, wg *sync.WaitGroup) {
    if rand == 1 {
        chErr <- errors.New("Error 1")
    }

    wg.Done()

}

func err2(rand int, chErr chan error, wg *sync.WaitGroup) {
    if rand == 2 {
        chErr <- errors.New("Error 2")
    }
    wg.Done()
}

func err3(rand int, chErr chan error, wg *sync.WaitGroup) {
    if rand == 3 {
        chErr <- errors.New("Error 3")
    }
    wg.Done()
}

func err4(rand int, chErr chan error, wg *sync.WaitGroup) {
    if rand == 3 {
        chErr <- errors.New("Error 4")
    }
    wg.Done()
}

func err5(rand int, chErr chan error, wg *sync.WaitGroup) {
    if rand == 4 {
        chErr <- errors.New("Error 5")
    }
    wg.Done()
}

func main() {

    runtime.GOMAXPROCS(runtime.NumCPU())

    chErr := make(chan error, 1)
    wg := new(sync.WaitGroup)

    //n := random(1, 8)
    n := 3
    fmt.Println(n)

    wg.Add(5)
    go err1(n, chErr, wg)
    go err2(n, chErr, wg)
    go err3(n, chErr, wg)
    go err4(n, chErr, wg)
    go err5(n, chErr, wg)

    fmt.Println("Wait")
    wg.Wait()
    select {
    case err := <-chErr:
        fmt.Println(err)
        close(chErr)
    default:
        fmt.Println("NO error, job done")
    }
}

How can I avoid deadlock here? I could assign a buffer length of 2, but maybe there is a more elegant way to solve the problem.

I used rand == 3 in both err3 and err4 intentionally.

softshipper

3 Answers


Your program is deadlocking because your channel is full.

Your channel buffer size is one, and you're calling wg.Wait(), which waits for all 5 functions to call wg.Done(). Once err3 runs with rand == 3, an error is placed on your channel.

At this point, your channel is full.

err4 is also called with the value 3, so it too wants to put an error on your channel. It blocks, because your channel is full and nothing has been received from it, and it therefore never reaches wg.Done().

So your main goroutine blocks forever: the other four goroutines finish, but the waitgroup count never reaches zero.

The fix is indeed to make your channel buffer larger. That way, when the errors are attempting to be placed on the channel - it won't block, and your waitgroup has a chance to have all of its items ticked off.

Simon Whitehead
  • **This is fundamentally wrong**. Adding buffering may work in certain cases but is a risky strategy. See my answer. – Rick-777 Jan 22 '15 at 08:21
  • @Rick-777 With such a small example that the OP clearly threw together, I was unsure how I might explain the sorts of things you mention in your answer. So, I opted to answer the question as directly asked. I honestly find it difficult to apply larger patterns to such simple examples - feel free to provide one. I will certainly upvote you. – Simon Whitehead Jan 22 '15 at 08:50

Generally, do not fall into the trap of thinking that larger buffers fix deadlocks. This approach might work in certain specific cases but just isn't generally true.

Deadlock is best addressed by understanding how goroutines depend on each other. Essentially, you must eliminate cycles in the communication graph wherever there is a mutual dependency. The non-blocking send idea (see @icza's answer) is one helpful trick, but not the only one.

There is a considerable body of knowledge on how to avoid deadlock/livelock. Much of it is from the days when Occam was popular in the '80s and '90s. There are a few special gems from people such as Jeremy Martin (Design Strategy for Deadlock-Free Concurrent Systems), Peter Welch (Higher Level Paradigms) and others.

  1. The client-server strategy is simple: describe your Go-routine network as a set of communicating servers and their clients; ensure that there are no loops in the network graph => deadlock is eliminated.

  2. I/O-PAR is a way to form rings and toruses of Go-routines such that there will not be a deadlock within the structure; this is a particular case where loops are allowed but behave in a provably deadlock-free way.

So, my strategy is to reduce the buffer sizes first, think about what's happening, fix the deadlocks. Then later, re-introduce buffers to improve performance, based on benchmarks. Deadlocks are caused by loops in the communication graph. Break the loops.

Related answer

Rick-777

Since you wrote that you intentionally used rand == 3 in both err3() and err4(), there are 2 solutions:

1. Increase the buffer size of the channel

Increase the buffer size of the chErr channel to at least 2, because in your program using n = 3 might result in 2 goroutines sending a value on the channel.

2. Use non-blocking send

Use a non-blocking channel send preferably in all of your errX() functions (but at least in err3() and err4() because they send on the same condition) with select:

select {
case chErr <- errors.New("Error 3"):
default:
}

This will try to send an error on the channel but if it is not ready (if it's full because another goroutine has already sent a value), the default case will be selected which does nothing.

Try it out on Go Playground.

Note: this will "lose" one of the errors because the channel can only hold one error, but you read (receive) only one value from it anyway.

You can read more about the non-blocking send in the Go Concurrency Patterns: Timing out, moving on blog article.

icza