0

In a program I am working on (an ASP.NET application) I have an in-memory lookup collection that is static readonly (shared among all thread of the ASP.NET application).

I know this is not orthodox but that was made to have an in-memory version of a remote and slow DB, to improve performance of some lookup operations.

The collection is then accessed by ASP.NET pages that need to do the lookup, so it's accessed by many threads, but that's not an issue because it's guaranteed that those threads only read the collection.

Sometimes I need to refresh the collection content, to do so in a background thread (with a lock), I work on a temporary variable of the same type of static lookup collection.

When I finished all the work, I simply assign the temp variable to the static lookup collection, if meanwhile the "swapping" is completed someone accessed the old data, that's not an issue (reading old data is accepted as long sooner or later the data will be updated).

I think that there should be not any issued, but am a bit scared about race condition of a such situation. Another aspect that scare me a bit are memory leaks.

I start considering Interlock.Exchange, but not sure if fit my situation.

Here is the base class I use to handle the in-memory collection that represents lookup structure of a slow and remote database:

//In memory indexed collection to query slow table by key when table does not contain too much data and is somehow slow (eh remote system)
public class InMemoryStore<T,K> 
    where T : new()
    where K : struct
{
    protected object _instanceSync = new object();
    protected DateTime _lastRebuild = ConstantsEntry.MIN_DATE;
    protected TimeSpan _timeout = new TimeSpan(0, 15, 0);
    protected int _cutoff = 1000;

    protected ReadOnlyDictionary<K, T> _preparingData = null;
    protected ReadOnlyDictionary<K, T> _readyData = null;
    protected volatile bool _isSyncRunning = false;
    protected DateTime _syncronizationStart = ConstantsEntry.MIN_DATE;

    protected InMemoryStore(TimeSpan timeout, int cutoff)
    {
        _timeout = timeout;
        _cutoff = cutoff;
    }

    // Best effort query against the store using only key as search means
    public List<T> Query ( List<K> query)
    {
        List<T> result = new List<T>();
        var dictionaryRef = _readyData;

        if (dictionaryRef == null)
            return result;

        foreach (var key in query)
            if (dictionaryRef.TryGetValue(key, out T value))
                result.Add(value);

        return result;
    }

    // Standard logic to rebuild internal index, provide some extension point to customize memory constraints and table access
    public void Rebuild()
    {
        try
        {
            lock (_instanceSync)
            {
                if (_isSyncRunning && (DateTime.UtcNow - _syncronizationStart) < new TimeSpan(0,30,0) )
                    return;

                if (this.MemoryLimitExceeded(cutoff: _cutoff))
                {
                    _readyData = null;
                    _preparingData = null;
                }

                _preparingData = null;
                _isSyncRunning = true;
                _syncronizationStart = DateTime.UtcNow;

                Task.Run(() =>
                {
                    try
                    {
                        this.InternalRebuildLogic();

                        if (_preparingData != null)
                        {
                            _readyData = _preparingData
                            //or better --> Interlocked.Exchange<ReadOnlyDictionary<K, T>>(ref _readyData, _preparingData);

                            _preparingData = null;
                            _lastRebuild = DateTime.UtcNow;
                        }
                    }
                    catch (Exception threadErr) { }
                    finally
                    {
                        _isSyncRunning = false;
                    }
                });
            }
        }
        catch (Exception err) { }
    }

    
    //Extension point to execute custom index rebuild
    protected virtual void InternalRebuildLogic() {}

    // Check there is not too much item in the collection 
   
    protected virtual bool MemoryLimitExceeded (int cutoff) {}
}

Here is a concrete implementation :

public class ArticleInMemoryStore : InMemoryStore<tbl_ana_Article, int>
{
    protected ArticleInMemoryStore(TimeSpan timeout, int cutoff) : base(timeout, cutoff){}

    protected override bool MemoryLimitExceeded(int cutoff)
    {
        using ( var context = ServiceLocator.ConnectionProvider.Instance<ArticleDataContext>())
        {
            int entries = context.tbl_ana_Articles.Count();
            return entries > cutoff;
        }
    }

    protected override void InternalRebuildLogic()
    {
        using (var context = ServiceLocator.ConnectionProvider.Instance<ArticleDataContext>())
        {
            var dictionary = context.tbl_ana_Articles.ToList().ToDictionary(item => item.ID, item => item);
            _preparingData = new ReadOnlyDictionary<int,tbl_ana_Article>(dictionary);
        }
    }
}

I would like to have feedback on possible issues related to memory-leak and race-condition that i possibly have overlooked in such scenario.

For example I am asking myself if using Interlocked.Exchange is better than just variable assignment as I did.

Skary
  • 1,322
  • 1
  • 13
  • 40
  • I didn't downvote, but I do think that the code in the question could be cleaned-up a bit. For example the empty method `​InternalRebuildLogic` and the constant property `MemoryLimitExceeded` could be removed, along with the `IInMemoryStore` interface and all the three-slash comments. You want to show us the minimal amount of code for which you need advice about its correctness. The more bells and whistles that accompany the code, the more likely that people will be overwhelmed and move on from the question. – Theodor Zoulias Aug 29 '23 at 16:58
  • 1
    @TheodorZoulias : I understand your suggestion, so I removed properties and simplified comments (few of them are useful for context). I let methods definition and concrete implementation because those are possibly involved in potential race-condition – Skary Aug 29 '23 at 17:52
  • 1
    It seems that `InternalRebuildLogic` uses `_preparingData` field, based on the provided code it seems unnecessary so I would recommend to just return the data from the `InternalRebuildLogic`. – Guru Stron Aug 29 '23 at 18:19
  • @GuruStron: yess indeed seems I had a little brain fart here, thanks to point it out I'll correct it. And regarding memory leaks/race conditions did you think the logic is safe? – Skary Aug 29 '23 at 18:45
  • TBH I don't see how memory leaks (in let's say "traditional" sense) are possible here. I would argue that the code is a bit overcomplicated but without having full overview of the task it is hard to tell. Who and when calls `Rebuild`? Why not just use in-memory cache+`Lazy` for example? – Guru Stron Aug 29 '23 at 18:52
  • @GuruStron I am just a bit scared due to unorthodox approach here, and afraid some details below up (eg some old reference accessed by the query method are kept around preventing GC collect old version of _readyData). But probably is just my nervousness. Regarding Lazy I didn't know it, seems a really good semplification. – Skary Aug 29 '23 at 20:01

1 Answers1

1

I can see a problem here:

protected ReadOnlyDictionary<K, T> _readyData = null;

The _readyData field is not volatile, and is accessed inside the Query method without synchronization or explicit memory barrier:

var dictionaryRef = _readyData;

The problem is that without a fence the .NET Jitter is allowed to reorder instractions in a way that would allow the thread running the Query method to see a partially initialzed ReadOnlyDictionary<K, T> object. The probability of this happening is small, and depending on the hardware on which the program runs it might be zero. If you don't like gambling, my suggestion is to declare the field as volatile.

You can find answers that explain in more details the danger associated with the lack of a memory barrier in this question: The need for volatile modifier in double checked locking in .NET.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • you are right, but i omitted the factory class that is the only that can istantiate InMemoryStore (due to protected constructor). Factory istantiate a specific store only once and with locking mechanism, so i think it may be considered safe. Regarding to `_readyData = _preparingData` did you think that may create trouble? doing that assignement while `Quey` is invoked – Skary Aug 30 '23 at 07:46
  • @Skary adding a barrier when you write the `_readyData` is not enough. You have to add a barrier when you read the `_readyData` also, otherwise you are not protected by the Jitter's reordering optimizations that might cause a reading thread to see corrupted state. The `lock` statement adds barriers when the `lock` is acquired and released, so I don't see a problem in the `_readyData = _preparingData` which is inside the `lock`-protected region. You might find this interesting: [Memory barrier generators](https://stackoverflow.com/questions/6581848/memory-barrier-generators). – Theodor Zoulias Aug 30 '23 at 08:06
  • my apologies if i am a bit pedant, but i am confused, `_readyData` is initialized while class is istantiated, and then no issue can arise because this section of code is locked. The `Query` section is not locked and may do reads while `Rebuild` is executing, but there, there is only an assignement. So in `Query` or you read the old data (reference still not updated) or you read the new data (reference updated), because reference assignemtn should be atomic. The fact you said it's still not enoguth let me think i miss something in the scenario i described above. What exactly? – Skary Aug 30 '23 at 08:29
  • @Skary the Jitter is allowed to reorder a list of instructions, provided that this reordering will not affect a single-thread program. So although your code first initializes the object and then does the atomic assignment, the Jitter might produce assembly code that does the assignment before initializing the object. That's the kind of reorderings that the `volatile` is designed to prevent. If you don't tell the Jitter that your field might be modified by multiple unsynchronized threads, it's not going to infer it be itself, and might do reorderings that you won't like to see. – Theodor Zoulias Aug 30 '23 at 08:57
  • 1
    So basically due to reordering, i can land to have method invokation on instance before instance creation (constructor)? Really surprising didn't know it was possible at all. – Skary Aug 30 '23 at 09:05
  • @Skary at that point you might have realized that deciding to go lock-free your are up for an adventure. The adventure might be long and scary, and involves learning about non-deterministic behaviors that won't be able to reproduce in the hardware you posses. Eric Lippert [has said](https://stackoverflow.com/questions/2528969/lock-free-multi-threading-is-for-real-threading-experts/2529773#2529773) that *"personally am not anywhere near smart enough to do correct low-lock programming beyond Interlocked.Increment"*. I think that he said it not because he is humble, but as a warning to others. – Theodor Zoulias Aug 30 '23 at 09:07
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/255109/discussion-between-skary-and-theodor-zoulias). – Skary Aug 30 '23 at 09:20