
I have this static class

static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityContract> LocationCities = new();
}

My process

  1. The API starts and initializes an empty dictionary
  2. A background job starts and runs once every day to reload the dictionary from the database
  3. Requests come in to read from the dictionary or update a specific city in the dictionary

My problem

If a request comes in to update the city

  1. I update the database
  2. If the update was successful, update the city object in the dictionary
  3. At the same time, the background job started and queried all cities before I updated the specific city
  4. The request finishes and the dictionary city now has the old values because the background job finished last

My solution I thought about first

Is there a way to lock/reserve the concurrent dictionary from reads/writes and then release it when I am done?

This way, when the background job starts, it can lock/reserve the dictionary only for itself, and when it's done it will release it for other requests to use.

Then a request might have been waiting for the dictionary to be released and update it with the latest values.
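The kind of lock/reserve flow I have in mind might look roughly like this (just a sketch of the idea; a `ReaderWriterLockSlim` is only one possibility, and the `string` value type is a stand-in for my contract class):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, string> LocationCities = new();

    // The background job would take this lock exclusively while reloading,
    // so requests wait until the reload is done.
    public static readonly ReaderWriterLockSlim ReloadLock = new();
}

// Background job (sketch):
// LocationMemoryCache.ReloadLock.EnterWriteLock();
// try { /* reload all cities from the database */ }
// finally { LocationMemoryCache.ReloadLock.ExitWriteLock(); }

// Request (sketch):
// LocationMemoryCache.ReloadLock.EnterReadLock();
// try { /* read or update a specific city */ }
// finally { LocationMemoryCache.ReloadLock.ExitReadLock(); }
```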

Any ideas on other possible solutions?

Edit

  1. What is the purpose of the background job?

If I manually update/delete something in the database, I want those changes to show up after the background job runs again. It could take a day for the changes to show up, and I am okay with that.

  2. What happens when the API wants to access the cache but it's not loaded?

When the API starts, I block requests to this particular "Location" project until the background job marks IsReady as true. The cache I implemented is thread safe until I add the background job.

  3. How much time does it take to reload the cache?

I would say less than 10 seconds for a total of 310,000+ records in the "Location" project.

Why I chose the answer

I chose Xerillio's answer because it solves the background job problem by keeping track of date times, similar to an "object version" approach. I won't be taking this path, as I have decided that if I do a manual update in the database, I might as well create an API route that does it for me, so that I can update the database and the cache at the same time. So I might remove the background job after all, or just run it once a week. Thank you for all the answers. I am okay with a possible data inconsistency in the way I am updating the objects, because if one route updates 2 specific values and another route updates 2 different specific values, the possibility of having a problem is very minimal.

Edit 2

Let's imagine I have this cache now and 10,000 active users

static class LocationMemoryCache
{
    public static readonly ConcurrentDictionary<int, LocationCityUserLogContract> LocationCityUserLogs = new();
}

Things I took into consideration

  1. An update will only happen to objects that the user owns and the rate at which the user might update those objects is most likely once every minute. So that reduces the possibility of a problem by a lot for this specific example.
  2. Most of my cache objects are related only to a specific user so it relates with bullet point 1.
  3. The application owns the data, I don't. So I should never manually update the database unless it's critical.
  4. Memory might be a problem but 1,000,000 normalish objects is somewhere between 80MB - 150MB. I can have a lot of objects in memory to gain performance and reduce the load on the database.
  5. Having a lot of objects in memory will put pressure on Garbage Collection, and that is not good, but I don't think it's bad at all for me, because Garbage Collection only runs when memory gets low, and all I have to do is plan ahead to make sure there is enough memory. Yes, it will run because of day-to-day operations, but it won't be a big impact.
  6. All of these considerations just so that I can have an in-memory cache right at my fingertips.
  • What's the purpose of the background job, if you update the dictionary whenever a change is made anyway? It sounds like the background job is redundant. – Xerillio Jul 10 '21 at 22:10
  • You've implemented a cache but it's apparently not a thread-safe, write-through cache. You could instead use the [Microsoft cache implementation](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.memorycache?view=dotnet-plat-ext-3.1) but it's still not clear to me why you have *two* writers or what happens if the API wants to access the dictionary before the background task has loaded it. Better to have an API to get a City from a CityService and have that fetch from cache if it's there and if not, fetch from the database and add it to the cache. – Ian Mercer Jul 10 '21 at 22:16
  • How much time does it usually take to reload the dictionary from the database? – Theodor Zoulias Jul 10 '21 at 22:30
  • Sounds like the background job is the problem. How about have a datetime of when a city is loaded into the cache and if the entry is too old upon some get, refresh it. – Caius Jard Jul 10 '21 at 23:40
  • Two more questions, that may help determine the optimal solution: 1) How frequently is the dictionary queried for cached entries? 2) How frequently is the dictionary modified, by having one of its entries replaced with an updated entry? I am asking for a rough estimation, not for precise numbers. – Theodor Zoulias Jul 10 '21 at 23:51
  • @TheodorZoulias 1. The dictionary is queried on these requests: Get city (also used to validate whether the city is valid internally), Search cities, Get the nearest city for coordinates. So it's not too frequent, but I don't want it to hit the database at all. 2. When I modify an entry I modify the values in the referenced object. I am not replacing it with another object. But the background job replaces it with new objects. – Alex Sapozhnikov Jul 10 '21 at 23:59
  • So the dictionary is not queried 10,000 times per second or so. This means that the main advantage of the `ConcurrentDictionary`, namely low contention under heavy usage, is not really utilized, and using a normal `Dictionary` protected with a `lock` would be pretty much equally performant, correct? – Theodor Zoulias Jul 11 '21 at 00:28
  • Btw modifying the properties of a referenced object stored in a `ConcurrentDictionary` requires extra synchronization, preferably using a dedicated locker per referenced object. The `ConcurrentDictionary` by itself does not protect the values it contains from concurrent access. Your objects are at risk of becoming corrupted by being modified by multiple threads concurrently. – Theodor Zoulias Jul 11 '21 at 00:28
  • @TheodorZoulias I only update the values after I do the update in the database so it is working as expected. I am not worried about locking the referenced object – Alex Sapozhnikov Jul 11 '21 at 00:51
  • Yeap, but if two threads attempt to update an object concurrently, there is no guarantee that the thread that updated the database last will also be the one that updates the object last. It is even possible that one thread will update half of the object's properties, and the other thread the other half, resulting in a Frankenstein object. – Theodor Zoulias Jul 11 '21 at 01:09
  • *"if one route updates 2 specific values and another route updates 2 different specific values then the possibility of having a problem is very minimal"*: it depends on the type of the values. Take a look at [this](http://joeduffyblog.com/2006/02/07/threadsafety-torn-reads-and-the-like/). – Theodor Zoulias Jul 12 '21 at 08:15

2 Answers


I would suggest adding an UpdatedAt/CreatedAt property to your LocationCityContract, or creating a wrapper object (CacheItem<LocationCityContract>) with such a property. That way you can check whether the item you're about to add/update is newer than the existing object, like so:

public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }
    // To guard against system clock adjustments, consider making CreatedAt
    // a long and using Environment.TickCount64 instead. See comment from @Theodor.
    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

// Use it like...
static class LocationMemoryCache
{
    public static readonly
        ConcurrentDictionary<int, CacheItem<LocationCityContract>> LocationCities = new();
}

// From some request...
var newItem = new CacheItem<LocationCityContract>(newLocation);
// or the background job...
var newItem = new CacheItem<LocationCityContract>(newLocation, updateStart);

LocationMemoryCache.LocationCities
    .AddOrUpdate(
        newLocation.Id,
        newItem,
        (_, existingItem) =>
            newItem.CreatedAt > existingItem.CreatedAt
                ? newItem
                : existingItem);

When a request wants to update the cache entry, it does as above, with the timestamp of when it finished adding the item to the database (see notes below).

The background job should, as soon as it starts, save a timestamp (let's call it updateStart). It then reads everything from the database and adds the items to the cache like above, where CreatedAt for the newLocation is set to updateStart. This way, the background job only updates the cache items that haven't been updated since it started. Perhaps you're not reading all items from the database as the first thing in the background job, but instead reading them one at a time and updating the cache accordingly. In that case, updateStart should instead be set right before reading each value (we could call it itemReadStart instead).

Since the way of updating the item in the cache is a little more cumbersome and you might be doing it from a lot of places, you could make a helper method to make the call to LocationCities.AddOrUpdate a little easier.
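Such a helper might look roughly like this (a self-contained sketch; AddOrUpdateIfNewer is a hypothetical name, written as a generic extension method so it works for any CacheItem-valued dictionary):

```csharp
using System;
using System.Collections.Concurrent;

public class CacheItem<T>
{
    public T Item { get; }
    public DateTime CreatedAt { get; }

    public CacheItem(T item, DateTime? createdAt = null)
    {
        Item = item;
        CreatedAt = createdAt ?? DateTime.UtcNow;
    }
}

public static class CacheExtensions
{
    // Hypothetical helper: replaces the cached item only if the candidate is newer.
    public static void AddOrUpdateIfNewer<TKey, T>(
        this ConcurrentDictionary<TKey, CacheItem<T>> cache,
        TKey key,
        CacheItem<T> newItem)
        where TKey : notnull
    {
        cache.AddOrUpdate(
            key,
            newItem,
            (_, existing) => newItem.CreatedAt > existing.CreatedAt ? newItem : existing);
    }
}
```

With that in place, both the requests and the background job would just call something like `LocationMemoryCache.LocationCities.AddOrUpdateIfNewer(newLocation.Id, newItem);`.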

Note:

  • Since this approach does not synchronize (lock) updates to the database, there's a race condition that means you might end up with a slightly out-of-date item in the cache. This can happen if two requests want to update the same item simultaneously. You can't know for sure which one updated the DB last, so even if you set CreatedAt to the timestamp after updating each, it might not truly reflect which one was updated last. Since you're okay with a 24-hour delay from manually updating the DB until the background job updates the cache, perhaps this race condition is not a problem for you, as the background job will fix it when it runs.
  • As @Theodor mentioned in the comments, you should avoid updating the object from the cache directly. Either use the C# 9 record type (as opposed to a class type) or clone the object if you want to cache new updates. That means, don't use LocationMemoryCache.LocationCities[locationId].Item.CityName = updatedName. Instead you should e.g. clone it like:
// You need to implement a constructor or similar to clone the object
// depending on how complex it is
var newLoc = new LocationCityContract(LocationMemoryCache.LocationCities[locationId].Item);
newLoc.CityName = updatedName;
var newItem = new CacheItem<LocationCityContract>(newLoc);
LocationMemoryCache.LocationCities
    .AddOrUpdate(...); /* <- like above */
  • By not locking the whole dictionary you avoid having requests being blocked by each other because they're trying to update the cache at the same time. If the first point is not acceptable you can also introduce locking based on the location ID (or whatever you call it) when updating the database, so that DB and cache are updated atomically. This avoids blocking requests that are trying to update other locations so you minimize the risk of requests affecting each other.
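That per-ID locking idea could be sketched like this (names are hypothetical; the point is that all threads updating the same location ID share one lock object, so DB and cache are updated atomically per location):

```csharp
using System;
using System.Collections.Concurrent;

public static class LocationLocks
{
    // One lock object per location ID. GetOrAdd guarantees that all threads
    // asking for the same ID receive the same lock instance.
    private static readonly ConcurrentDictionary<int, object> _locks = new();

    public static object For(int locationId) =>
        _locks.GetOrAdd(locationId, _ => new object());
}

// Usage from a request (UpdateDatabase/UpdateCache are placeholders):
// lock (LocationLocks.For(locationId))
// {
//     UpdateDatabase(locationId, updatedValues);
//     UpdateCache(locationId, updatedValues);
// }
```

Requests for different location IDs never contend with each other; only two requests updating the same location serialize.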
Xerillio
  • Nice answer, upvoted. What makes me nervous is the dependency on the system clock. A potential system-wise clock adjustment, caused for example by the [daylight saving time](https://en.wikipedia.org/wiki/Daylight_saving_time), could mess up the caching scheme in an unpredictable way. I would prefer to keep track of the chronological order by using something like the [`Environment.TickCount64`](https://learn.microsoft.com/en-us/dotnet/api/system.environment.tickcount64) property. – Theodor Zoulias Jul 11 '21 at 15:18
  • Something else worth mentioning is that, according to this solution, the background job will not remove from the `ConcurrentDictionary` any entries that may have been deleted from the database during the previous 24 hours. I don't know if this is important. Probably isn't, considering that the entries are City-related, and cities are known to be long-lived entities. – Theodor Zoulias Jul 11 '21 at 15:25
  • @TheodorZoulias Good point regarding `Environment.TickCount64`, although I don't think daylight saving time should affect it, since you should be using UTC time, but clock synchronization could in rare cases. That is assuming `Environment.TickCount64` **isn't** affected by clock sync., which I can't confirm. Added comment to the code. It's true that removing entries can complicate it a little, but that depends on what his background job does, which isn't described in detail yet. – Xerillio Jul 11 '21 at 16:20
  • I stand corrected. Daylight saving time [does not occur in UTC](https://stackoverflow.com/questions/62151/datetime-now-vs-datetime-utcnow). – Theodor Zoulias Jul 11 '21 at 16:45
  • The `Environment.TickCount` [is not affected by adjustments of the system time](https://stackoverflow.com/questions/1858989/is-environment-tickcount-affected-by-system-time-adjustments/1859057#1859057), but according to this answer it is affected (in the range of milliseconds) by calls to the [`SetSystemTimeAdjustment`](https://learn.microsoft.com/en-us/windows/win32/api/sysinfoapi/nf-sysinfoapi-setsystemtimeadjustment) Win32 API method. – Theodor Zoulias Jul 11 '21 at 16:55

No, there is no way to lock a ConcurrentDictionary on demand from reads/writes, and then release it when you are done. This class does not offer this functionality. You could manually use a lock every time you are accessing the ConcurrentDictionary, but by doing so you would lose all the advantages that this specialized class has to offer (low contention under heavy usage), while keeping all its disadvantages (awkward API, overhead, allocations).

My suggestion is to use a normal Dictionary protected with a lock. This is a pessimistic approach that will occasionally result in some threads being blocked unnecessarily, but it is also very simple, and it is easy to reason about its correctness. Essentially all access to the dictionary and the database will be serialized:

  1. Every time a thread wants to read an object stored in the dictionary, it will first have to take the lock, and keep the lock until it's done reading the object.
  2. Every time a thread wants to update the database and then the corresponding object, it will first have to take the lock (before even updating the database), and keep the lock until all the properties of the object have been updated.
  3. Every time the background job wants to replace the current dictionary with a new dictionary, it will first have to take the lock (before even querying the database), and keep the lock until the new dictionary has taken the place of the old one.
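The three rules above could be sketched like this (a rough outline; the method names and delegate parameters standing in for database calls are illustrative):

```csharp
using System;
using System.Collections.Generic;

public class LocationCityContract
{
    public int Id { get; set; }
    public string CityName { get; set; } = "";
}

public static class LocationCache
{
    private static readonly object _lock = new();
    private static Dictionary<int, LocationCityContract> _cities = new();

    // Rule 1: read while holding the lock.
    public static LocationCityContract? GetCity(int id)
    {
        lock (_lock)
        {
            return _cities.TryGetValue(id, out var city) ? city : null;
        }
    }

    // Rule 2: update the database and the cached object under the same lock.
    public static void UpdateCity(LocationCityContract updated, Action updateDatabase)
    {
        lock (_lock)
        {
            updateDatabase(); // hypothetical DB call, serialized with the cache
            _cities[updated.Id] = updated;
        }
    }

    // Rule 3: the background job swaps in a whole new dictionary under the lock.
    public static void Reload(Func<Dictionary<int, LocationCityContract>> queryAllCities)
    {
        lock (_lock)
        {
            _cities = queryAllCities(); // hypothetical DB query
        }
    }
}
```

Because the lock is held across the database call in UpdateCity, the "background job overwrites a fresh update" interleaving from the question can't happen: the reload and the update can never run at the same time.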

In case the performance of this simple approach proves to be unacceptable, you should look at more sophisticated solutions. But the complexity gap between this solution and the next simplest solution (that also offers guaranteed correctness) is likely to be quite significant, so you'd better have good reasons before going that route.

Theodor Zoulias
  • Perhaps I'm reading it incorrectly but it sounds like you're creating a big bottleneck on access to the DB (and cache). Even if that makes it easier to reason about it's not scalable and could easily decrease performance with a few simultaneous users. – Xerillio Jul 11 '21 at 10:58
  • @Xerillio yes, it could become a bottleneck under heavy usage. But the OP said in [a comment](https://stackoverflow.com/questions/68331666/is-there-a-way-to-lock-a-concurrent-dictionary-from-being-used/68332747?noredirect=1#comment120766855_68331666) that *"it's not too frequent"*, and in my answer I took this comment into consideration. – Theodor Zoulias Jul 11 '21 at 11:04
  • Fair enough, although "not too frequent" is a relative term and I'd argue that it'll have an impact even before what I would call "heavy usage". The OP should just consider if he really wants to put a hard upper limit on how many simultaneous users can use the system. – Xerillio Jul 11 '21 at 11:15
  • @Xerillio if most users just read the dictionary and don't update the database/dictionary, then the single `lock` approach shouldn't be a problem even with 1,000s of concurrent users. If on the other hand most users update the database and the dictionary, then sure, even a handful of concurrent users could bring this solution to its limits. – Theodor Zoulias Jul 11 '21 at 11:22
  • Agreed. I think considerations like this are relevant to put in your answer, so the OP (and others coming after) are properly informed about the limits. It's not a bad solution - it should just be used under the right circumstances. – Xerillio Jul 11 '21 at 11:26
  • @Xerillio I agree that this answer is currently lacking at stating its assumptions and limitations. What the OP wants is to have an actively updateable in-memory cache, which is not something that people request every day. For this reason I've not attempted to present a general use solution. I'll wait for the OP's feedback before working with this answer any further. – Theodor Zoulias Jul 11 '21 at 11:38