13

Below is a struct of type Stuff. It has three ints. A Number, its Double and its Power. Let's pretend that calculating the double and power of a given list of ints is an expensive computation.

type Stuff struct {
    Number int
    Double int
    Power  int
}

func main() {
    nums := []int{2, 3, 4} // given numbers
    stuff := []Stuff{}     // struct of stuff with transformed ints

    double := make(chan int)
    power := make(chan int)

    for _, i := range nums {
        go doubleNumber(i, double)
        go powerNumber(i, power)
    }

    // How do I get the values back in the right order?

    fmt.Println(stuff)
}

func doubleNumber(i int, c chan int) {
    c <- i + i
}

func powerNumber(i int, c chan int) {
    c <- i * i
}

The result of fmt.Println(stuff) should be the same as if stuff was initialized like:

stuff := []Stuff{
    {Number: 2, Double: 4, Power: 4}
    {Number: 3, Double: 6, Power: 9}
    {Number: 4, Double: 8, Power: 16}
}

I know I can use <- double and <- power to collect values from the channels, but I don't know what double / powers belong to what numbers.

icza
  • 389,944
  • 63
  • 907
  • 827
Jorge Bucaran
  • 5,588
  • 2
  • 30
  • 48

2 Answers2

15

Goroutines run concurrently, independently, so without explicit synchronization you can't predict execution and completion order. So as it is, you can't pair returned numbers with the input numbers.

You can either return more data (e.g. the input number and the output, wrapped in a struct for example), or pass pointers to the worker functions (launched as new goroutines), e.g. *Stuff and have the goroutines fill the calculated data in the Stuff itself.

Returning more data

I will use a channel type chan Pair where Pair is:

type Pair struct{ Number, Result int }

So calculation will look like this:

func doubleNumber(i int, c chan Pair) { c <- Pair{i, i + i} }

func powerNumber(i int, c chan Pair) { c <- Pair{i, i * i} }

And I will use a map[int]*Stuff because collectable data comes from multiple channels (double and power), and I want to find the appropriate Stuff easily and fast (pointer is required so I can also modify it "in the map").

So the main function:

nums := []int{2, 3, 4} // given numbers
stuffs := map[int]*Stuff{}

double := make(chan Pair)
power := make(chan Pair)

for _, i := range nums {
    go doubleNumber(i, double)
    go powerNumber(i, power)
}

// How do I get the values back in the right order?
for i := 0; i < len(nums)*2; i++ {
    getStuff := func(number int) *Stuff {
        s := stuffs[number]
        if s == nil {
            s = &Stuff{Number: number}
            stuffs[number] = s
        }
        return s
    }

    select {
    case p := <-double:
        getStuff(p.Number).Double = p.Result
    case p := <-power:
        getStuff(p.Number).Power = p.Result
    }
}

for _, v := range nums {
    fmt.Printf("%+v\n", stuffs[v])
}

Output (try it on the Go Playground):

&{Number:2 Double:4 Power:4}
&{Number:3 Double:6 Power:9}
&{Number:4 Double:8 Power:16}

Using pointers

Since now we're passing *Stuff values, we can "pre-fill" the input number in the Stuff itself.

But care must be taken, you can only read/write values with proper synchronization. Easiest is to wait for all "worker" goroutines to finish their jobs.

var wg = &sync.WaitGroup{}

func main() {
    nums := []int{2, 3, 4} // given numbers

    stuffs := make([]Stuff, len(nums))
    for i, n := range nums {
        stuffs[i].Number = n
        wg.Add(2)
        go doubleNumber(&stuffs[i])
        go powerNumber(&stuffs[i])
    }
    wg.Wait()
    fmt.Printf("%+v", stuffs)
}

func doubleNumber(s *Stuff) {
    defer wg.Done()
    s.Double = s.Number + s.Number
}

func powerNumber(s *Stuff) {
    defer wg.Done()
    s.Power = s.Number * s.Number
}

Output (try it on the Go Playground):

[{Number:2 Double:4 Power:4} {Number:3 Double:6 Power:9} {Number:4 Double:8 Power:16}]

Writing different slice elements concurrently

Also note that since you can write different array or slice elements concurrently (for details see Can I concurrently write different slice elements), you can write the results directly in a slice without channels. See Refactor code to use a single channel in an idiomatic way how this can be done.

icza
  • 389,944
  • 63
  • 907
  • 827
  • 1
    This seems like a great answer! +1. Does this solution (the one using channels) scale well for N number of subprocesses? – Jorge Bucaran Jun 16 '16 at 13:53
  • 2
    @JorgeBucaran Yes, but use buffered channels. And for large number of processes / jobs, I would use a _producer-consumer_ pattern where the number of worker goroutines is fixed and well quarantined. For an example see my answer at [Bruteforce MD5 Password cracker](http://codereview.stackexchange.com/questions/114376/bruteforce-md5-password-cracker/116560#116560). – icza Jun 16 '16 at 13:59
  • The problem I have with this solution is that it requires creating a `Pair` structure. I don't have enough experience to say properly, but my "gut feeling" suggests @Vatine's idea is the way to go here. – Jorge Bucaran Jun 16 '16 at 14:05
  • @JorgeBucaran The reason why I chose `Pair` instead of `Stuff` is to make it obvious that a received value is not a complete `Stuff`. It might be misleading that a received `Stuff` is just a part of a real `Stuff` as Double and Power are computed by different goroutines and are received from different channels. So you have to put back a "real" stuff from multiple incomplete stuffs. – icza Jun 16 '16 at 14:10
  • What if there is also Triple, Log, Cube, etc. Each transformation can be derived from the same number, which is known before any goroutine is created, and should execute independently from the others. – Jorge Bucaran Jun 16 '16 at 14:13
  • @icza That's actually exactly what I propose. You spawn goroutines to compute a complete Stuff. If you then need to spawn more goroutines to compute the component parts, you do that. Otherwise you don't. Classic scatter-gather pattern. – Vatine Jun 16 '16 at 14:15
  • @Vatine Yes, I myself wouldn't break computing double and power into 2 goroutines, but do both in the same. The question did that, so I assumed that's what the asker wanted. – icza Jun 16 '16 at 14:21
  • @icza "...but do both in the same" Do you mean executing each transformation concurrently inside a goroutine or sequentially inside a goroutine? – Jorge Bucaran Jun 16 '16 at 14:24
  • @JorgeBucaran If I would have to choose, I would calculate everything that is required for one `Stuff` in one place, not like you doing `Double` in one and `Power` in another. – icza Jun 16 '16 at 14:32
  • Even if each process is isolated and computationally expensive? or slow like an http.Get. – Jorge Bucaran Jun 16 '16 at 14:34
  • @JorgeBucaran If you have only 3 numbers and calculating `Double` and `Power` are both very expensive, then yes, this may be better. But if you have many numbers, then you would see no difference (or it may be even better) to calculate double and power in the same place (in the same goroutine). – icza Jun 16 '16 at 14:37
1

Personally, I would use a chan Stuff to pass the results back on, then spin up goroutines computing a full Stuff and pass it back. If you need the various part of a single Stuff computed concurrently, you can spawn goroutines from each goroutine, using dedicated channels. Once you've collected all the results, you can then (optionally) sort the slice with the accumulated values.

Example of what I mean below (you could, in principle, use a sync.WaitGroup to coordinate things, but if the input count is known, you don't strictly speaking need it).

type Stuff struct {
  number int64
  double int64
  square int64
}

// Compute a Stuff with individual computations in-line, send it out
func computeStuff(n int64, out chan<- Stuff) {
   rv := Stuff{number: n}
   rv.double = n * 2
   rv.square = n * n
   out <- rv
}

// Compute a Stuff with individual computations concurrent
func computeStuffConcurrent(n int64, out chan<- Stuff) {
  rv := Stuff{number: n}
  dc := make(chan int64)
  sc := make(chan int64)
  defer close(dc)
  defer close(sc)
  go double(n, dc)
  go square(n, sc)
  rv.double = <-dc
  rv.square = <-sc
  out <- rv
}

func double(n int64, result chan<- int) {
   result <- n * 2
}

func square(n int64, result chan<- int) {
  result <- n * n
}

func main() {
  inputs := []int64{1, 2, 3}
  results := []Stuff{}
  resultChannel := make(chan Stuff)

  for _, input := range inputs {
    go computeStuff(input, resultChannel) 
    // Or the concurrent version, if the extra performance is needed
  }

  for c := 0; c < len(inputs); c++ {
    results = append(results, <- resultChannel)
  }
  // We now have all results, sort them if you need them sorted
}
Vatine
  • 20,782
  • 4
  • 54
  • 70