Global variable concurrent use for write-once read-many usecase

Question

I want to use a global variable. During first access to read it, I want to write to this variable using an API. For any subsequent access to read it, it should not require locks.

This is the implementation I have written. But this seems like an overkill for such a small task.

Is there a conventional way it is done in go?
If I have multiple such global variables and I don't want to put them under this same struct, is there a way to do this without code duplication?
Read this answer. One way for making single variable's use to be atomic is by using "sync/atomic" library. But, the functions are only there for integer types. How does community work with "string" type?

Please free to suggest any other unrelated changes.

PS: Using sync.Once didn't seem right. If the first time to fetch fails, the program will never get 'clusterName'. Also, couldn't think of a way to make other readers wait until sync.Once is complete. Any ideas?

package main

import (
    "fmt"
    "sync"
    "time"
)

type ClusterName struct {
    sync.RWMutex
    clusterName string
}

// Global variable. Read-many, write once.
var clusterName ClusterName

// Method to avoid locking during reads if 'clusterName' has been filled once.
func GetClusterName() (string, error) {
    // Take read lock to see if 'clusterName' is filled.
    clusterName.RLock()
    if clusterName.clusterName == "" {
        // 'clusterName' is not filled. Release read-lock and call method to fill it with write-lock.
        clusterName.RUnlock()
        return getClusterName()
    }

    defer clusterName.RUnlock()
    return clusterName.clusterName, nil
}

// Method to fetch and fill cluster name. Takes a write-lock.
func getClusterName() (string, error) {
    // Take write-lock.
    clusterName.Lock()
    defer clusterName.Unlock()

    // See if previous writer has already filled this. Just return if already filled.
    if clusterName.clusterName != "" {
        return clusterName.clusterName, nil
    }

    // Only 1 writer will ever reach here.
    var err error
    clusterName.clusterName, err = fetchClusterName()
    if err != nil {
        return "", err
    }

    return clusterName.clusterName, nil
}

func fetchClusterName() (string, error) {
    // API call.
    time.Sleep(time.Second)
    return "test-cluster-name", nil
}

func main() {
    for i := 0; i < 50; i++ {
        fmt.Println(GetClusterName())
    }
}

Playground: https://go.dev/play/p/PV42PMXliRC

You might be able to pull something off with `sync/atomic`'s `StorePointer` and `LoadPointer` functions. I personally would not say this way is more conventional than what you've already implemented. Also, it uses the `unsafe.Pointer` type which's name already implies potential risks that may come with this solution. The write-once operation you're looking for may be easier to solve with the `sync.Once` function. Let me know if you want me to provide you with a full example, in case this solves your problem. — nikoksr, Dec 29 '22 at 11:35
A potential solution: https://pkg.go.dev/sync/atomic#example-Value-ReadMostly — subtleseeker, Dec 30 '22 at 02:39

gregory112 · Answer 1 · 2022-12-30T10:26:03.753

Reading your code, it seems that sync.RWMutex is unsuitable here. From how it works, it seems that for any kind of read operation, there is a chance that it results in write operation too.

Use a sync.Mutex instead of RWMutex

Mutex fixes the flaw in the code, in which RLock can just be replaced with Lock. This is because in your code, you initially read the variable with RLock and when you realize that it is empty, release the read lock and do write lock instead. This very little moment during unlocking and relocking with write lock is unprotected by any mutex, so why not just do a full lock and unlock for every read? It also makes the code simpler. As you said, fetch can sometimes fail. This means that the first read operation does not always result in write to clusterName. Therefore, we have to assume that every read operation can result in write, which means that read locking only cannot be used here.

Here is a re-implementation for your code

func GetClusterName() (string, error) {
    // Take read lock to see if 'clusterName' is filled.
    clusterName.Lock()
    defer clusterName.Unlock()
    if clusterName.clusterName == "" {
        return getClusterName()
    }

    return clusterName.clusterName, nil
}

func getClusterName() (string, error) {
    // See if previous writer has already filled this. Just return if already filled.
    if clusterName.clusterName != "" {
        return clusterName.clusterName, nil
    }

    // Only 1 writer will ever reach here.
    var err error
    clusterName.clusterName, err = fetchClusterName()
    if err != nil {
        return "", err
    }

    return clusterName.clusterName, nil
}

So,

You should use sync.Mutex. As you said, sync.Once should not be used here as it only ensures that an action is done once, regardless that it leads to successful operation or not. This is also the most conventional and simplest way
This also depends on what you are trying to achieve and how much codes you want to share. I suggest creating multiple structs and an interface which contains methods to get, fetch, and set a value, and also lock and unlock. All structs should have mutex in it and a variable that you want to set. You can then create a function that does locking, get, fetch, set, and unlock.
String is, well, different than integer. String is an array of character (integer), it is stored differently than int. Instead of using atomic, we usually just use a separate mutex to operate with string.

Here is my example. I assume that each of your variable has different ways of fetching so Fetch needs to also be abstract.

type LockSetter interface {
    Lock()
    Unlock()
    Set(value string)
    Get() string
    Fetch() (string, error)
}

type LockSetVariable struct {
    lock       *sync.Mutex
    myVariable string
}

func (l *LockSetVariable) Lock() {
    l.lock.Lock()
}

func (l *LockSetVariable) Unlock() {
    l.lock.Unlock()
}

func (l *LockSetVariable) Set(value string) {
    l.myVariable = value
}

func (l *LockSetVariable) Get() string {
    return l.myVariable
}

func (l *LockSetVariable) Fetch() (string, error) {
    now := time.Now()
    l.myVariable = now.Format("2006")
    return "", nil
}

func GetValue(param LockSetter) (string, error) {
    param.Lock()
    defer param.Unlock()
    
    v := param.Get()
    if v == "" {
        newV, err := param.Fetch()
        if err != nil {
            return "", err
        }
        
        param.Set(newV)
    }
    
    return v, nil
}

I didn't get the race in my solution. Can you please explain the one you found? The writer will keep waiting until any reader is reading the variable. It is guaranteed that when a writer is working, no reader is reading. About using normal mutex, I agree that every read can result in a write if API keeps failing. But any optimizations are built on average case scenarios. Locks are very costly and are redundant for read-many, write-once usecase. But I also agree that simplicity might be more important. — subtleseeker, Dec 30 '22 at 02:29
@subtleseeker the race condition in your code would not seem to cause undetermined value of clusterName, as during fetch you would do `== ""` comparison again. So you are right maybe it is inappropriate to call it race (I have edited the answer). However it is quite easy to see that there is a section in your code that is unprotected by the mutex, this is when RUnlock is called and `return getClusterName()` until Lock is called. As for expensive operation, remember that RWMutex is also more expensive than Mutex, so there are tradeoffs, especially knowing that Mutex provides shorter codes. — gregory112, Dec 30 '22 at 10:22
@subtleseeker If you are concerned with heavy read and write once performance, I suggest making read and write consistent so read will never result in write action. This means read can be best effort, e.g., if the variable is empty, return an error instead of trying to fetch. The fetch and write can be done in separate goroutine. For example, depending on the use case, you can poll fetch every few seconds for example, and stop until it is set, or you can call fetch when your program starts and keep retrying until the variable is set. This way all read will block until it has value. — gregory112, Dec 30 '22 at 10:25

score 0 · Answer 2 · answered Dec 30 '22 at 03:32

You need a modified sync.Once implementation, if you look at sync.Once, it does this:

if (atomic.LoadUint32(&done) == 0) {
    m.Lock()
    if (done == 0) {
        f()
        atomic.StoreUint32(&done, 1)
    }
    m.Unlock()
}

Essentially, it checks if the operation has been executed before the check, then enter a lock to wait for/execute the operation. Furthermore, to work with data types other than the basic integer ones, wrap them in pointers. As a result, you can do:

var m sync.Mutex
var clusterName atomic.Pointer[string]

// Method to avoid locking during reads if 'clusterName' has been filled once.
func GetClusterName() (string, error) {
    // See if 'clusterName' is filled.
    cptr := clusterName.Load()
    if cptr == nil {
        return getClusterName()            
    }

    return *cptr, nil
}

// Method to fetch and fill cluster name. Takes a lock.
func getClusterName() (string, error) {
    // Take lock.
    m.Lock()
    defer m.Unlock()

    // See if previous writer has already filled this. Just return if already filled.
    cptr := clusterName.Load()
    if cptr != nil {
        return *cptr, nil
    }

    // Only 1 successful writer will ever reach here.
    c, err := fetchClusterName()
    if err != nil {
        // Fail, just abort, do not touch clusterName here
        return "", err
    }

    // Only release when succeed
    clusterName.Store(&c)
    return c, nil
}

func fetchClusterName() (string, error) {
    // API call.
    time.Sleep(time.Second)
    return "test-cluster-name", nil
}

The function GetClusterName can only return at 4 points. If it returns after checking the variable outside the lock, then there must be another goroutine having written to clusterName before, which guarantees a release-acquire semantics between these 2 threads. If it returns after checking the variable inside the lock, the same thing happens as before. If it returns after fetchClusterName, then the operation will encounter an error, and the result is an empty string, or the operation succeeded, and the goroutine is the sole writer of the clusterName variable.

Global variable concurrent use for write-once read-many usecase

2 Answers2