4

I'm practicing a challenge to calculate factorials by splitting calculations into 100 groups concurrently, I solved lots of issue on WaitGroups, but still in the calculateFactorial function I got the deadlock on range over channel part. Wish someone could point the issue here, thank you.

package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    wg.Add(2)
    in := make (chan int)
    out := make (chan float64)



    out = calculateFactorial(genConcurrentGroup(in, &wg), &wg)

    go func() {
        in <- 10
        close(in)
    }()

    fmt.Println(<-out)

    wg.Wait()


}

//split input number into groups
//the result should be a map of [start number, number in group]
//this is not heavy task so run in one go routine
func genConcurrentGroup(c chan int, wg *sync.WaitGroup) chan map[int]int{
    out := make(chan map[int]int)

    go func() {
        //100 groups
        total:= <- c
        wg.Done()
        //element number in group
        elemNumber := total / 100
        extra := total % 100
        result := make(map[int]int)
        if elemNumber>0{
            //certain 100 groups
            for i:=1 ;i<=99;i++{
                result[(i-1) * elemNumber + 1] = elemNumber
            }
            result[100] = extra + elemNumber
        }else{
            //less than 100
            for i:=1;i<=total;i++{
                result[i] = 1
            }
        }

        out <- result
        close(out)
    }()
    return out
}

//takes in all numbers to calculate multiply result
//this could be heavy so can do it 100 groups together
func calculateFactorial(nums chan map[int]int, wg *sync.WaitGroup) chan float64{
    out := make(chan float64)


    go func() {
        total:= <- nums
        wg.Done()
        fmt.Println(total)

        oneResult := make(chan float64)

        var wg2 sync.WaitGroup
        wg2.Add(len(total))

        for k,v := range total{
            fmt.Printf("%d %d \n",k,v)
            go func(k int, v int) {
                t := 1.0
                for i:=0;i<v;i++{
                    t = t * (float64(k) + float64(i))
                }
                fmt.Println(t)
                oneResult <- t
                wg2.Done()
            }(k,v)
        }

        wg2.Wait()
        close(oneResult)

        result := 1.0
        for n := range oneResult{  //DEADLOCK HERE! Why?
            result *= n
        }


        fmt.Printf("Result: %f\n",result)

        out <- result

    }()
    return out
}

Update:

Thanks to Jessé Catrinck's answer which fixed the issue in the above code by simply change the oneResult to a buffered channel. However in https://stackoverflow.com/a/15144455/921082 there's a quote

You should never add buffering merely to fix a deadlock. If your program deadlocks, it's far easier to fix by starting with zero buffering and think through the dependencies. Then add buffering when you know it won't deadlock.

So could anyone please help me figure out how to not to use buffered channel for this? Is it possible?

Furthermore, I did some research on what exactly causes a deadlock.

Some quote like from https://stackoverflow.com/a/18660709/921082,

If the channel is unbuffered, the sender blocks until the receiver has received the value. If the channel has a buffer, the sender blocks only until the value has been copied to the buffer; if the buffer is full, this means waiting until some receiver has retrieved a value.

Said otherwise :

  1. when a channel is full, the sender waits for another goroutine to make some room by receiving

  2. you can see an unbuffered channel as an always full one : there must be another goroutine to take what the sender sends.

So in my original situation, what is probably causing the deadlock is maybe :

  1. the range over channel is not receiving ?

  2. the range over channel is not receiving on a separated go routine. ?

  3. the oneResult is not properly closed, so range over channel doesn't know where's the end?

for number 3, I don't know if there's anything wrong about closing the oneResult before range over, since this pattern appears on many examples on the internet. If it is number 3, could it be something wrong in the wait group?

I got another article very similar to my situation https://robertbasic.com/blog/buffered-vs-unbuffered-channels-in-golang/, in its second lesson learned, he uses a for { select {} } infinite loop as an alternative to range over, it seems solved his problem.

 go func() {
        for{
            select {
            case p := <-pch:
                findcp(p)
            }
        }
    }()

Lesson number 2 — an unbuffered channel can’t hold on to values (yah, it’s right there in the name “unbuffered”), so whatever is sent to that channel, it must be received by some other code right away. That receiving code must be in a different goroutine because one goroutine can’t do two things at the same time: it can’t send and receive; it must be one or the other.

Thanks

tomriddle_1234
  • 3,145
  • 6
  • 41
  • 71

2 Answers2

3

The deadlock isn't on the range-over-channel loop. If you run the code on playground you'll see at the top of the stacktrace that the error is caused by wg2.Wait (line 88 on playground and pointed to by the stacktrace). Also in the stacktrace you can see all the goroutines that haven't finished because of the deadlock, this is because oneResult<-t never completes, so none of the goroutines started in the loop ever finish.

So the main problem is here:

wg2.Wait()
close(oneResult)

// ...

for n := range oneResult{
// ...

Also looping over a closed channel is not what you want, I assume. However even if you didn't close the channel, that loop would never start because wg2.Wait() will wait until its done.

oneResult <- t
wg2.Done()

But it will never be done because it relies on the loop to be already running. The line oneResult <- t will not complete unless there's someone on the other side receiving from that channel, which is your loop, however that range-over-channel loop is still waiting for wg2.Wait() to complete.

So essentially you have a "circular dependency" between the channel's sender and receiver.

To fix the issue you need to allow the loop to start receiving from the channel while still making sure that channel's closed when done. You can do thing by wrapping the two wait-and-close lines into their own goroutine.

https://play.golang.com/p/rwwCFVszZ6Q

mkopriva
  • 35,176
  • 4
  • 57
  • 71
  • Thanks so much, your answer makes me much clearer. It's a bit complex, but good to know things on this problem. Is there any book covered such "circular dependency" issue caused by wait group? – tomriddle_1234 Feb 26 '19 at 16:52
  • https://blog.golang.org/pipelines, the Fan-out Fan-in part mentioned a closer with WaitGroup, this is the main issue for my problem I think – tomriddle_1234 Feb 26 '19 at 17:08
  • I'm not sure if "circular dependency" is the right term to be honest, and also I don't know of any books/articles that address deadlocks specifically under such circumstances. But it seems to me that there are two main reasons why a deadlock could occur, either you're sending to a channel without receiving from it, or you're receiving from a channel without sending to it. This can happen if you forgot to write the code to receive/send, or you didn't forget writing it but that piece of code is not being executed because it's blocked by something somewhere, like in your case by `wg2.Wait`. – mkopriva Feb 26 '19 at 17:12
  • @JesséCatrinck Please correct me if I'm wrong, but the buffered channel only allows you to fill it up before it's ever received from. It doesn't make the program parallel nor concurrent. And in your example you're still just waiting with `wg2` until you fill up the buffer, keeping the `range oneResult` loop idle unnecessarily. And even if you would move `wg2` into its own goroutine to activate the loop then you would just get the same effect as with the unbuffered channel making the buffered one pointless. – mkopriva Feb 26 '19 at 20:06
  • @JesséCatrinck [This](https://play.golang.com/p/YHeQ3JH8YCx) feels more concurrent than [this](https://play.golang.com/p/92N_JLO8Hcz). Am I wrong? – mkopriva Feb 26 '19 at 20:06
1

You need to add a buffer to the variable oneResult

enter image description here

Explanation: https://www.rapidloop.com/blog/golang-channels-tips-tricks.html

Jessé Catrinck
  • 2,227
  • 19
  • 20
  • could you plz tell me why must buffered channel ? my original intention seems doesn't necessarily have to use buffered channel. like, as long as there is a `oneResult` value, just calculate a `result *=n` in range over channel. – tomriddle_1234 Feb 26 '19 at 14:39
  • You need the buffer to add *multiple* values to the channel `oneResult` then when you loop through it you will get all these values. See here: http://devs.cloudimmunity.com/gotchas-and-common-mistakes-in-go-golang/#anameunbuf_ch_send_doneasendingtoanunbufferedchannelreturnsassoonasthetargetreceiverisready – Jessé Catrinck Feb 26 '19 at 19:14