0

I'm going to use a static collection that will be used for reading by the core process and fully updated every X mins by the background service.

The background process will load updated data from the database every X mins and set the received dataset into this static collection.

The core process will receive many tasks to check if some values exist in this collection. Each task will be processed in a separate thread. There will be a lot of requests, it should be extremely fast, so I can't ask database for each request and I need an updateable list in memory.

public class LoadedData
{
    public static HashSet<string> Keys { get; set; }
}

public class CoreProcess
{
    public bool ElementExists(string key)
    {
        return LoadedData.Keys.Contains(key);
    }
}

public class BackgroundProcess
{
    public async Task LoadData()
    {
        while (true)
        {
            LoadedData.Keys = GetKeysFromDb();
            await Task.Delay(TimeSpan.FromMinutes(5));
        }
    }
}

So, I'm looking for the best solution for this. I was thinking about using HashSet<T> because I'm sure that each element in the collection will be unique. But HashSet<T> is not thread-safe. So I started considering BlockingCollection<T>, ConcurrentBag<T>, ConcurrentDictionary<T, byte>, but then I wondered if I needed a thread-safe collection here at all. Looks like not, because I'm not going to add/update/remove particular elements in the collection. Only full rewrite from the database.

  1. So, does it mean that I can just use simple HashSet<T>?

  2. Which collection would you use to solve it?

  3. And in general, will there be any issues with a simultaneous reading by the core process and full overwriting of the collection by the background process?

Peter Csala
  • 17,736
  • 16
  • 35
  • 75
  • this might be a better question for the SoftwareEngineering stackexchange – Nigel Jun 02 '22 at 23:43
  • `HashSet` will be fine, as long as you replace the hashset rather than alter it. `HashSet` isn't thread-safe if a writer is involved - but in your code no writing is involved when reading is occurring. – mjwills Jun 03 '22 at 00:40
  • @mjwills thanks. Could you provide code example how you suggest to replace hashset instead of alter? Didn't get this... – Alex Nevajniy Jun 03 '22 at 00:48
  • Exactly what you are doing now - `LoadedData.Keys = something`. – mjwills Jun 03 '22 at 05:27
  • @mjwills sorry, I thought you mean something another. Thanks anyway! – Alex Nevajniy Jun 03 '22 at 06:55

1 Answers1

1

So the HashSet<string> becomes effectively immutable as soon as it becomes the value of the LoadedData.Keys property. In this case your code is almost OK. The only missing ingredient is to ensure the visibility of this property by all threads involved.

In theory it is possible that the compiler or the jitter might use a cached/stale value of the property, instead of looking what is currently stored in the main memory. In practice you might never experience this phenomenon, but if you want to play by the rules you must read and write to this property with volatile semantics. If the Keys was a field, you could just decorate it with the volatile keyword. Since it's a property, you must do a bit more work:

public class LoadedData
{
    private volatile static HashSet<string> _keys;

    public static HashSet<string> Keys
    {
        get => _keys;
        set => _keys = value;
    }
}

...or using the Volatile class instead of the volatile keyword:

public class LoadedData
{
    private static HashSet<string> _keys;

    public static HashSet<string> Keys
    {
        get => Volatile.Read(ref _keys);
        set => Volatile.Write(ref _keys, value);
    }
}

A final cautionary note: The immutability of the HashSet<string> is not enforced by the compiler. It's just a verbal contract that you make with your future self, and with any other future maintainers of your code. In case some mutative code find its way to your code-base, the behavior of your program will become officially undefined. If you want to guard yourself against this scenario, the most semantically correct way to do it is to replace the HashSet<string> with an ImmutableHashSet<string>. The immutable collections are significantly slower than their mutable counterparts (typically at least 10x slower), so it's a trade-off. You can have peace of mind, or ultimate performance, but not both.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • 1
    making field of type `IReadOnlySet` and store `HashSet` there (instead of `ImmutableHashSet`) would almost get to "the both"... but indeed one can still shot themselves in the foot by manually casting it back to `HashSet`. – Alexei Levenkov Jun 03 '22 at 01:56
  • 1
    This is basically the [RCU](https://lwn.net/Articles/262464/) update pattern, except C#'s GC makes the deallocation problem trivial. And there's no "read/copy" involved, as the new data structure is written fresh. But atomically replacing a reference to a read-only data structure is still there as the key building block. So readers can just deref it, not needing any readers/writers locking. – Peter Cordes Jun 03 '22 at 04:47
  • Thanks for the answers. IReadOnlySet is accessible starting from the .NET 5, but I'm working with the .Net Core 3.1. Since ImmutableHashSet is slower than HashSet it doesn't work for me. Based on your answers I think here I can use something like this https://codeshare.io/N3QXkr It seems to me that this should be safe enough from the point of immutability (doesn't allow to change the elements of the collection, only overwrite whole collection) and from the point of thread-safety. So in this case I even don't need to use Volatile.Read/Write accessors. Right? – Alex Nevajniy Jun 03 '22 at 06:42
  • @AlexeiLevenkov to make shooting yourself in the foot even more difficult, one could consider wrapping the `HashSet` in a custom `IReadOnlySet` implementation. Like [this](https://stackoverflow.com/questions/36815062/c-sharp-hashsett-read-only-workaround/72485595#72485595 "C# HashSet read-only workaround") one for example. Although the simple approach of just changing the type of the property is already pretty safe IMHO. – Theodor Zoulias Jun 03 '22 at 06:43
  • @AlexNevajniy if the `LoadedData.Keys` is going to be accessed by more than one threads, then the `volatile` is needed. It's the most lightweight way to add the required half fence there. The alternative is to use the `lock`, on both reads and writes. Unless you don't want to play by the book, and you prefer to become an [expert in memory models](https://stackoverflow.com/questions/66488734/should-interlocked-compareexchange-also-a-volatile-variable/66490395#66490395), so that you can predict exactly how your lock-free code will behave on your target CPU architecture! – Theodor Zoulias Jun 03 '22 at 07:00
  • @TheodorZoulias so can I just use private static volatile HashSet _keys in the example above? I just want to prevent the ability to change the items in the collection (add/update/delete) and leave only a full refresh of the collection (RefreshKeys method). – Alex Nevajniy Jun 03 '22 at 07:20
  • @AlexNevajniy yes, making the `_keys` `volatile` is perfectly fine. Your code will be as good as it gets. – Theodor Zoulias Jun 03 '22 at 07:24