20

The "Go maps in action" entry in the Go blog states:

Maps are not safe for concurrent use: it's not defined what happens when you read and write to them simultaneously. If you need to read from and write to a map from concurrently executing goroutines, the accesses must be mediated by some kind of synchronization mechanism. One common way to protect maps is with sync.RWMutex.

However, one common way to access maps is to iterate over them with the range keyword. It is not clear if for the purposes of concurrent access, execution inside a range loop is a "read", or just the "turnover" phase of that loop. For example, the following code may or may not run afoul of the "no concurrent r/w on maps" rule, depending on the specific semantics / implementation of the range operation:

 var testMap map[int]int
 testMapLock := make(chan bool, 1)
 testMapLock <- true
 testMapSequence := 0

...

 func WriteTestMap(k, v int) {
    <-testMapLock
    testMap[k] = v
    testMapSequence++
    testMapLock<-true
 }

 func IterateMapKeys(iteratorChannel chan int) error {
    <-testMapLock
    defer func() { 
       testMapLock <- true
    }
    mySeq := testMapSequence
    for k, _ := range testMap {
       testMapLock <- true
       iteratorChannel <- k
       <-testMapLock
       if mySeq != testMapSequence {
           close(iteratorChannel)
           return errors.New("concurrent modification")
       }
    }
    return nil
 }

The idea here is that the range "iterator" is open when the second function is waiting for a consumer to take the next value, and the writer is not blocked at that time. However, it is never the case that two reads in a single iterator are on either side of a write - this is a "fail fast" iterator, the borrow a Java term.

Is there anything anywhere in the language specification or other documents that indicates if this is a legitimate thing to do, however? I could see it going either way, and the above quoted document is not clear on exactly what consititutes a "read". The documentation seems totally quiet on the concurrency aspects of the for/range statement.

(Please note this question is about the currency of for/range, but not a duplicate of: Golang concurrent map access with range - the use case is completely different and I am asking about the precise locking requirement wrt the 'range' keyword here!)

Community
  • 1
  • 1
BadZen
  • 4,083
  • 2
  • 25
  • 48
  • According to this maps can have multiple concurrent readers - no writes. (https://groups.google.com/forum/#!msg/golang-nuts/HpLWnGTp-n8/hyUYmnWJqiQJ) – Kaveh Shahbazian Nov 06 '16 at 07:55
  • Of course, the very first paragraph I posted says as much. That does not answer my question. I need to know if the entire body section of the `for-range` statement is considered a "read", or just the start/ends of the iterations of that loop. Please do look at my code above. – BadZen Nov 06 '16 at 13:33
  • I did look. And I wasn't sure. That's why I just put a comment for then going & investigating - and I've found what's described in second answer. – Kaveh Shahbazian Nov 07 '16 at 06:35

2 Answers2

20

You are using a for statement with a range expression. Quoting from Spec: For statements:

The range expression is evaluated once before beginning the loop, with one exception: if the range expression is an array or a pointer to an array and at most one iteration variable is present, only the range expression's length is evaluated; if that length is constant, by definition the range expression itself will not be evaluated.

We're ranging over a map, so it's not an exception: the range expression is evaluated only once before beginning the loop. The range expression is simply a map variable testMap:

for k, _ := range testMap {}

The map value does not include the key-value pairs, it only points to a data structure that does. Why is this important? Because the map value is only evaluated once, and if later pairs are added to the map, the map value –evaluated once before the loop– will be a map that still points to a data structure that includes those new pairs. This is in contrast to ranging over a slice (which would be evaluated once too), which is also only a header pointing to a backing array holding the elements; but if elements are added to the slice during the iteration, even if that does not result in allocating and copying over to a new backing array, they will not be included in the iteration (because the slice header also contains the length - already evaluated). Appending elements to a slice may result in a new slice value, but adding pairs to a map will not result in a new map value.

Now on to iteration:

for k, v := range testMap {
    t1 := time.Now()
    someFunction()
    t2 := time.Now()
}

Before we enter into the block, before the t1 := time.Now() line k and v variables are holding the values of the iteration, they are already read out from the map (else they couldn't hold the values). Question: do you think the map is read by the for ... range statement between t1 and t2? Under what circumstances could that happen? We have here a single goroutine that is executing someFunc(). To be able to access the map by the for statement, that would either require another goroutine, or it would require to suspend someFunc(). Obviously neither of those happen. (The for ... range construct is not a multi-goroutine monster.) No matter how many iterations there are, while someFunc() is executed, the map is not accessed by the for statement.

So to answer one of your questions: the map is not accessed inside the for block when executing an iteration, but it is accessed when the k and v values are set (assigned) for the next iteration. This implies that the following iteration over the map is safe for concurrent access:

var (
    testMap         = make(map[int]int)
    testMapLock     = &sync.RWMutex{}
)

func IterateMapKeys(iteratorChannel chan int) error {
    testMapLock.RLock()
    defer testMapLock.RUnlock()
    for k, v := range testMap {
        testMapLock.RUnlock()
        someFunc()
        testMapLock.RLock()
        if someCond {
            return someErr
        }
    }
    return nil
}

Note that unlocking in IterateMapKeys() should (must) happen as a deferred statement, as in your original code you may return "early" with an error, in which case you didn't unlock, which means the map remained locked! (Here modeled by if someCond {...}).

Also note that this type of locking only ensures locking in case of concurrent access. It does not prevent a concurrent goroutine to modify (e.g. add a new pair) the map. The modification (if properly guarded with write lock) will be safe, and the loop may continue, but there is no guarantee that the for loop will iterate over the new pair:

If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next.

The write-lock-guarded modification may look like this:

func WriteTestMap(k, v int) {
    testMapLock.Lock()
    defer testMapLock.Unlock()
    testMap[k] = v
}

Now if you release the read lock in the block of the for, a concurrent goroutine is free to grab the write lock and make modifications to the map. In your code:

testMapLock <- true
iteratorChannel <- k
<-testMapLock

When sending k on the iteratorChannel, a concurrent goroutine may modify the map. This is not just an "unlucky" scenario, sending a value on a channel is often a "blocking" operation, if the channel's buffer is full, another goroutine must be ready to receive in order for the send operation to proceed. Sending a value on a channel is a good scheduling point for the runtime to run other goroutines even on the same OS thread, not to mention if there are multiple OS threads, of which one may already be "waiting" for the write lock in order to carry out a map modification.

To sum the last part: you releasing the read lock inside the for block is like yelling to others: "Come, modify the map now if you dare!" Consequently in your code encountering that mySeq != testMapSequence is very likely. See this runnable example to demonstrate it (it's a variation of your example):

package main

import (
    "fmt"
    "math/rand"
    "sync"
)

var (
    testMap         = make(map[int]int)
    testMapLock     = &sync.RWMutex{}
    testMapSequence int
)

func main() {
    go func() {
        for {
            k := rand.Intn(10000)
            WriteTestMap(k, 1)
        }
    }()

    ic := make(chan int)
    go func() {
        for _ = range ic {
        }
    }()

    for {
        if err := IterateMapKeys(ic); err != nil {
            fmt.Println(err)
        }
    }
}

func WriteTestMap(k, v int) {
    testMapLock.Lock()
    defer testMapLock.Unlock()
    testMap[k] = v
    testMapSequence++
}

func IterateMapKeys(iteratorChannel chan int) error {
    testMapLock.RLock()
    defer testMapLock.RUnlock()
    mySeq := testMapSequence
    for k, _ := range testMap {
        testMapLock.RUnlock()
        iteratorChannel <- k
        testMapLock.RLock()
        if mySeq != testMapSequence {
            //close(iteratorChannel)
            return fmt.Errorf("concurrent modification %d", testMapSequence)
        }
    }
    return nil
}

Example output:

concurrent modification 24
concurrent modification 41
concurrent modification 463
concurrent modification 477
concurrent modification 482
concurrent modification 496
concurrent modification 508
concurrent modification 521
concurrent modification 525
concurrent modification 535
concurrent modification 541
concurrent modification 555
concurrent modification 561
concurrent modification 565
concurrent modification 570
concurrent modification 577
concurrent modification 591
concurrent modification 593

We're encountering concurrent modification quite often!

Do you want to avoid this kind of concurrent modification? The solution is quite simple: don't release the read lock inside the for. Also run your app with the -race option to detect race conditions: go run -race testmap.go

Final thoughts

The language spec clearly allows you to modify the map in the same goroutine while ranging over it, this is what the previous quote relates to ("If map entries that have not yet been reached are removed during iteration.... If map entries are created during iteration..."). Modifying the map in the same goroutine is allowed and is safe, but how it is handled by the iterator logic is not defined.

If the map is modified in another goroutine, if you use proper synchronization, The Go Memory Model guarantees that the goroutine with the for ... range will observe all modifications, and the iterator logic will see it just as if "its own" goroutine would have modified it – which is allowed as stated before.

icza
  • 389,944
  • 63
  • 907
  • 827
  • Thanks, this answers my question I think. To clear two things up - 1) I am not expecting the"concurrent modification" to never happen in my code - of course it can /will! I am expecting the error to be handled without a concurrent read/write fatal error or possible race condition (ie. well-defined "fail fast" semantics) "Come, modify the map now if you dare!" is exactly right - this is probably not idiomatic in Go but is very much the way Java maps work. The iterator must detect another thread taking up the dare and refuse to emit any more iterated values, because the iterator is "broken". – BadZen Nov 07 '16 at 02:30
  • Second, I definitely "unlock" before returning the error. The way I use the buffered bool channel there, the read "locks" and the write "unlocks". When the `iteratorChannel <- k` is blocked, the critical section is unlocked, since `testMapLock <- true` already unlocked the critical - other goroutines may then modify the map and "break" all of the "open" iterators. This is exactly the behavior I want. – BadZen Nov 07 '16 at 02:33
  • 1
    To be clear: "do you think the map is read by the fo/range statement between t1 and t2" - that's the crux of this. I want to make sure the answer is "no". One can see many ways that what you are calling the "pointer" to the map (a C-implemented `hash_iter*`, really) carries state that may be corrupted / produce undefined behavior, if you look at `hashmap.c`. Add to that the weirdness of /the runtime trying to detect concurrent r/w/ - of course all kinds of complexities here. I want to know I'll always see my "concurrent modification" and never a segfault or that "fatal error". – BadZen Nov 07 '16 at 02:51
  • (the race cond analyzer on my code has always run cleanly - or of course I wouldn't have even posted this question! but just because it runs cleanly doesn't mean there is not an undetected possible race condition. I need the specification (or a v. painful code audit of Go source) to assure me there isn't. The race detector is best-effort, and not exact.) – BadZen Nov 07 '16 at 02:53
  • 1
    @BadZen _"Second, I definitely "unlock" before returning the error."_ I don't think you do. As you wrote, _the read "locks" and the write "unlocks"_. Inside your `for`, you first write (unlock), then you do something (deliver `k`), then you read (lock). After the lock you test something with `if`, and if condition is `true`, you return without unlocking. This seems to me in case of error your last action is "lock", so it remains locked. – icza Nov 07 '16 at 08:49
  • Oh, I see what you are saying. That's correct. Yeah. I'll fix in an edit above. – BadZen Nov 07 '16 at 12:41
  • @BadZen Added a "Final thoughts" section. Also note that if no error is detected, your `IterateMapKeys()` function again will return having the map locked, so as suggested in my answer, you should really use a `defer` unlocking at the beginning of your function after you successfully locked it, and then it won't matter if you return "early" due to an error, or if the `for` loop completes successfully (or even if your code panics – defer will run and unlock the map before it returns). – icza Nov 07 '16 at 12:52
  • Yeah. Global var channels are not a great way to lock this, I was just trying to come up with a "simple" example - there are higher-level concurrency interfaces in my actual code. The problem with the defer pattern is that it will also run if the code exits while unlocked by iterator (say caller closes the iteratorChannel) and that might erroneously unlock - an execution path that did not enter the critical section would exit it. (FYI, the problem for me with RWMutex is you can't timed-wait on it - or get condition variable / notify functionality.) – BadZen Nov 07 '16 at 13:07
  • @BadZen You're right. Then in your case it might be better not to use `defer` (but then you have to unlock after the `for`); or you may still use `defer` and perform a non-blocking send in it (using a `select` + `default` branch), so if somehow the code panicked when the map was unlocked, the non-blocking send will be a no-op. – icza Nov 07 '16 at 13:24
  • I did a `select` at first above, actually - then realized that it didn't fix the problem - a second iterator might have already locked at the point the defer enters. Some additional state is necessary (I have re-entrant mutexes with "keys" in the actual thing - since there is nothing analogous to a 'thread id'... key only unlocks / decrs lock counter if mutex is currently locked by , etc...) – BadZen Nov 07 '16 at 13:31
  • @BadZen If you have / need re-entrant locks, that warrants for a refactoring. See related question: [Recursive locking in Go](http://stackoverflow.com/questions/14670979/recursive-locking-in-go) – icza Nov 07 '16 at 13:34
  • I've heard this "recursive mutexes are always un-necessary" argument and frankly think it's bizarre in the extreme. While there are in fact well documented anti-patterns and pitfalls (often these result in possible deadlock), they definitely have a place and proper use. (Turn the argument in that Q on its head: 'if they weren't necessary, they wouldn't have been added to every other major concurrent language...') Like a lot of other " is always bad" rules in programming, it should be softened to " needs to be carefully considered". This is a discussion for another time, though... – BadZen Nov 07 '16 at 13:39
  • To make a finer point, the argument's summary there is "Recursive mutexes do not protect invariants. Mutexes have only one job, and recursive mutexes don't do it.". In reality, it's "recursive mutexes do not protect invariants with improper use", and "Mutexes have only one job" is completely bogus. (Or circular/semantic reasoning, at best...) – BadZen Nov 07 '16 at 13:42
3

The unit of concurrent access for a for range loop over a map is the map. Go maps in action.

A map is a dynamic data structure that changes for inserts, updates and deletes. Inside the Map Implementation. For example,

The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped. The choice may vary for each entry created and from one iteration to the next. If the map is nil, the number of iterations is 0. For statements, The Go Programming Language Specification

Reading a map with a for range loop with interleaved inserts, updates and deletes is unlikely to be useful.

Lock the map:

package main

import (
    "sync"
)

var racer map[int]int

var race sync.RWMutex

func Reader() {
    race.RLock() // Lock map
    for k, v := range racer {
        _, _ = k, v
    }
    race.RUnlock()
}

func Write() {
    for i := 0; i < 1e6; i++ {
        race.Lock()
        racer[i/2] = i
        race.Unlock()
    }
}

func main() {
    racer = make(map[int]int)
    Write()
    go Write()
    Reader()
}

Don't lock after the read -- fatal error: concurrent map iteration and map write:

package main

import (
    "sync"
)

var racer map[int]int

var race sync.RWMutex

func Reader() {
    for k, v := range racer {
        race.RLock() // Lock after read
        _, _ = k, v
        race.RUnlock()
    }
}

func Write() {
    for i := 0; i < 1e6; i++ {
        race.Lock()
        racer[i/2] = i
        race.Unlock()
    }
}

func main() {
    racer = make(map[int]int)
    Write()
    go Write()
    Reader()
}

Use the Go Data Race Detector. Read Introducing the Go Race Detector.

peterSO
  • 158,998
  • 31
  • 281
  • 276
  • Reading from my buffered bool channel "locks" the critical. Writing to it "unlocks" - I am not locking after the read. – BadZen Nov 06 '16 at 16:01
  • The question is if iterations - specifically what parts of the iteration loop - count as "reads" or not wrt map concurrency. Now, from what you pasted: " If map entries that have not yet been reached are removed during iteration, the corresponding iteration values will not be produced. If map entries are created during iteration, that entry may be produced during the iteration or may be skipped" - it sounds like /none/ of the iteration process is a read, and it is explicitly threadsafe (in so far as operations are atomic and have well-defined semantics). – BadZen Nov 06 '16 at 16:03
  • But, your claim that the "fatal error" is produced contradicts this. So, my question still stands - when or at what point in a map iteration is the map "read"? (If your first code example - and mine - are valid, the answer is "only the start/end of the loop cycle implies a map read". I think this is probably the case! But I do not see this stated explicitly, anywhere. I want to be certain.) – BadZen Nov 06 '16 at 16:04
  • @BadZen: Your code doesn't compile and doesn't run: [How to create a Minimal, Complete, and Verifiable example.](http://stackoverflow.com/help/mcve). – peterSO Nov 06 '16 at 17:48
  • @BadZen: Iteration values are produced for each iteration. – peterSO Nov 06 '16 at 17:56
  • 1
    There isn't an MCVE because there isn't an unexpected behavior I'm saying. I'm asking a conceptual question about Go map concurrency. Just for you though: https://play.golang.org/p/ZkWWvZZ2sv – BadZen Nov 06 '16 at 20:28
  • "Iteration values are produced for each iteration" does not answer the question of if the entire lifetime of a for/range is considered a read or just the start/end of the loop body each cycle. Again, that is what I am asking. – BadZen Nov 06 '16 at 20:29