Moving data over a channel:
c := make(chan [1000]int)
// spawn some goroutines that read from this channel
var data [1000]int
// populate the data
// write data to the channel
c <- data
The potential problem here, as you mentioned, is that you're moving a lot of data, so you might be doing an excessive amount of memory copying.
You could avoid that by sending a reference type, such as a pointer or a slice, over the channel:
c := make(chan []int)
// spawn some goroutines that read from this channel
var data [1000]int
// populate the data
// write a reference to data to the channel
c <- data[:]
So we just did the exact same data transfer, but reduced the memory copying, right? Well, here's a potential problem: you sent a reference to data over the channel, but the data value continues to be accessible in the current scope, even after the send:
// write a reference to data to the channel
c <- data[:]
// start messing with data
data[0] = 999
data[1] = 1234
...
This code may have just introduced a data race, because whoever received that slice from the channel might be working on it at the same time as you are modifying it.
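To make the sharing concrete, here is a minimal sketch that contrasts the two sends. The function names viaArray and viaSlice are made up for illustration, and I use a buffered channel inside a single goroutine so the outcome is deterministic (so this shows the aliasing, not an actual race):

```go
package main

import "fmt"

// viaArray (hypothetical name) sends the array by value, modifies the
// original afterwards, and returns what the receiver sees in element 0.
func viaArray() int {
	c := make(chan [4]int, 1) // buffered so one goroutine can send, then receive
	var data [4]int
	c <- data     // the whole array is copied into the channel
	data[0] = 999 // changes only the sender's copy
	got := <-c
	return got[0]
}

// viaSlice (hypothetical name) does the same, but sends a slice of the array.
func viaSlice() int {
	c := make(chan []int, 1)
	var data [4]int
	c <- data[:]  // only the slice header is copied; the backing array is shared
	data[0] = 999 // visible through the slice that was already sent
	got := <-c
	return got[0]
}

func main() {
	fmt.Println(viaArray(), viaSlice()) // prints "0 999"
}
```

If the receive happened in another goroutine, the write to data[0] and the receiver's reads would be unsynchronized. Running with the race detector (go run -race) is one way to catch that.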
The idea of passing ownership is that after you give out a reference to something, you also concede ownership of that thing and will no longer use it. So long as we don't use data after giving out the reference (sending the slice on the channel), we have properly passed ownership.
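As a rough sketch of what that can look like in practice, a sender can allocate a fresh slice for every message and never touch it again after the send. The names produce and sumAll are hypothetical, chosen just for this example:

```go
package main

import "fmt"

// produce (hypothetical) sends n freshly allocated slices on out, then closes it.
func produce(out chan<- []int, n int) {
	defer close(out)
	for i := 0; i < n; i++ {
		buf := make([]int, 3) // new backing array for every message
		for j := range buf {
			buf[j] = i
		}
		out <- buf // ownership passes to the receiver; buf is not used again
	}
}

// sumAll (hypothetical) owns each slice it receives and may use it freely.
func sumAll(in <-chan []int) int {
	total := 0
	for buf := range in {
		for _, v := range buf {
			total += v
		}
	}
	return total
}

func main() {
	c := make(chan []int)
	go produce(c, 4)
	fmt.Println(sumAll(c)) // prints 18
}
```

Reusing a single buffer across iterations would save allocations, but it would also mean writing to memory the receiver may still be reading, which is exactly the ownership violation described above.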
This problem is an extension of the general problem of shared state. Unlike Rust, for example, Go doesn't have language constructs that enforce control over shared state at compile time. To reduce the chances of these errors, you could apply some strategies:
- Avoid passing references on channels: In the above example, the problem occurred once we started passing the data by reference, with a slice. This was only done to reduce the amount of memory copying. Unless there is a pragmatic reason for this optimization (a worthwhile performance difference was measured), it can be avoided entirely. Still, some data types in Go are inherently references (e.g., maps and slices). If these types must be passed on a channel, then other strategies can be used.
- Separate the data creation logic into functions: In the example above, we could refactor the code:
func sendData(c chan []int) {
var data [1000]int
// populate the data
// write a reference to data to the channel
c <- data[:]
}
c := make(chan []int)
// spawn some goroutines that read from this channel
// send some data
sendData(c)
The possibility of incorrectly using data still exists, but now it's isolated to a small function with a clear intent. In theory, the isolation should make the code easier to understand, make the correct use of data more obvious, and leave fewer changes that could interact with it.
- Don't mix data pipelines with persistent state: By data pipeline, I mean two or more concurrent routines between which data flows via channels. Expanding on the previous point, create owned references as close as possible to where they enter the data pipeline. Keep the gap between where a goroutine receives data and where it sends it on or uses it as small as possible. Under the general rules of ownership, you can only transfer ownership of something when you presently have full ownership of it. Because of this rule, you should avoid, as much as possible, sending a reference on a channel unless you created the referenced data immediately before sending it. If you have a reference to any persistent or global state, it becomes much harder to ensure that ownership is respected.
Keeping the creation of the reference and the transfer of ownership in an isolated, global function should make errors harder to introduce. Then the only ways to violate the ownership rule, each listed with a way to guard against it, are to:
- Leak the reference to global state
  - Try to eliminate global variables and global state
- Leak the reference to a reference type parameter's state
  - Don't take any reference type parameters in data sending functions
- Modify the referenced data after sending the reference
  - Put the send operation at the very end of the function. If necessary, you could put the send inside a deferred call.
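The deferred-send idea from the last point could be sketched like this. The function sendSquares is a made-up example, not something from your code. Note that defer needs a function call, so the send is wrapped in a closure:

```go
package main

import "fmt"

// sendSquares (hypothetical) creates its data locally and takes no
// reference type parameters other than the channel it sends on.
func sendSquares(c chan<- []int, n int) {
	data := make([]int, n)

	// Deferred calls run when the function returns, so the send happens
	// after every other statement here. Nothing in this function can
	// modify data after it has been sent.
	defer func() { c <- data }()

	for i := range data {
		data[i] = i * i
	}
}

func main() {
	c := make(chan []int)
	go sendSquares(c, 5)
	fmt.Println(<-c) // prints [0 1 4 9 16]
}
```

Because data is a local variable that never escapes except through the send, the receiver ends up as its only owner once sendSquares returns.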
There's no perfect solution to eliminate all shared state issues (even in Rust they sometimes exist in practice), but I hope these strategies will help you think about how to tackle this problem.