1

I have an API that people are calling and I have a database containing statistics of the number of requests. All API requests are made by a user in a company. There's a row in the database per user per company per hour. Example:

| CompanyId | UserId| Date             | Requests |
|-----------|-------|------------------|----------|
| 1         | 100   | 2020-01-30 14:00 | 4527     |
| 1         | 100   | 2020-01-30 15:00 | 43       |
| 2         | 201   | 2020-01-30 14:00 | 161      |

To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:

public class StatisticsService
{
    private readonly IDatabase database;
    private readonly Dictionary<string, CompanyStats> statsByCompany;
    private DateTime lastTick = DateTime.MinValue;

    public StatisticsService(IDatabase database)
    {
        this.database = database;
        this.statsByCompany = new Dictionary<string, CompanyStats>();
    }

    private class CompanyStats
    {
        public CompanyStats(List<UserStats> userStats)
        {
            UserStats = userStats;
        }

        public List<UserStats> UserStats { get; set; }
    }

    private class UserStats
    {
        public UserStats(string userId, int requests, DateTime hour)
        {
            UserId = userId;
            Requests = requests;
            Hour = hour;
            Updated = DateTime.MinValue;
        }

        public string UserId { get; set; }
        public int Requests { get; set; }
        public DateTime Hour { get; set; }
        public DateTime Updated { get; set; }
    }
}

Every time someone calls the API, I'm calling an increment method on the StatisticsService:

public void Increment(string companyId, string userId)
{
    var utcNow = DateTime.UtcNow;
    EnsureCompanyLoaded(companyId, utcNow);

    var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);

    var stats = statsByCompany[companyId];
    var userStats = stats.UserStats.FirstOrDefault(ls => ls.UserId == userId && ls.Hour == currentHour);
    if (userStats == null)
    {
        var userStatsToAdd = new UserStats(userId, 1, currentHour);
        userStatsToAdd.Updated = utcNow;
        stats.UserStats.Add(userStatsToAdd);
    }
    else
    {
        userStats.Requests++;
        userStats.Updated = utcNow;
    }
}

The method loads the company into the cache if not already there (will publish EnsureCompanyLoaded in a bit). It then checks if there is a UserStats object for this hour for the user and company. If not it creates it and set Requests to 1. If other requests have already been made for this user, company, and current hour, it increments the number of requests by 1.

EnsureCompanyLoaded as promised:

private void EnsureCompanyLoaded(string companyId, DateTime utcNow)
{
    if (statsByCompany.ContainsKey(companyId)) return;
    var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0); ;

    var userStats = new List<UserStats>();
    userStats.AddRange(database.GetAllFromThisMonth(companyId));

    statsByCompany[companyId] = new CompanyStats(userStats);
}

The details behind loading the data from the database are hidden away behind the GetAllFromThisMonth method and not important to my question.

Finally, I have a timer that stores any updated results to the database every 5 minutes or when the process shuts down:

public void Tick(object state)
{
    var utcNow = DateTime.UtcNow;
    var currentHour = new DateTime(utcNow.Year, utcNow.Month, utcNow.Day, utcNow.Hour, 0, 0);

    foreach (var companyId in statsByCompany.Keys)
    {
        var usersToUpdate = statsByCompany[companyId].UserStats.Where(ls => ls.Updated > lastTick);
        foreach (var userStats in usersToUpdate)
        {
            database.Save(GenerateSomeEntity(userStats.Requests));
            userStats.Updated = DateTime.MinValue;
        }
    }

    // If we moved into new month since last tick, clear entire cache
    if (lastTick.Month != utcNow.Month)
    {
        statsByCompany.Clear();
    }

    lastTick = utcNow;
}

I've done some single-threaded testing of the code and the concept seem to work out as expected. Now I want to migrate this to be thread-safe but cannot seem to figure out how to implement it the best way. I've looked at ConcurrentDictionary which might be needed. The main problem isn't on the dictionary methods, though. If two threads call Increment simultaneously, they could both end up in the EnsureCompanyLoaded method. I know of the concepts of lock in C#, but I'm afraid to just lock on every invocation and slow down performance that way.

Anyone needed something similar and have some good pointers in which direction I could go?

ThomasArdal
  • 4,999
  • 4
  • 33
  • 73
  • To do it thread safely, you can use a lock on updating operations, anyway you should avoid all that thing you are doing. Your DB can handle that information storing, no reason to overcomplicate things. – Marco Salerno Jan 30 '20 at 13:51
  • That depends on the database. Some databases are optimized for writes. Other for reads. In this case, I want to keep it in memory if possible – ThomasArdal Jan 30 '20 at 13:59
  • What database are you using? If it is SQL Server, have you looked at [Memory-Optimized Tables](https://learn.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/introduction-to-memory-optimized-tables)? – Theodor Zoulias Jan 30 '20 at 15:09
  • It's Elasticsearch. It has scripted updates as well. But would like to avoid constantly writing if possible. – ThomasArdal Jan 30 '20 at 15:31

3 Answers3

1

When keeping counters in memory like this you have two options:

  1. Keep in memory the actual historic value of the counter
  2. Keep in memory only the differential increment of the counter

I have used both approaches and I've found the second to be simpler, faster and safer. So my suggestion is to stop loading UserStats from the database, and just increment the in-memory counter starting from 0. Then every 5 minutes call a stored procedure that inserts or updates the related database record accordingly (while zero-ing the in-memory value). This way you'll eliminate the race conditions at the loading phase, and you'll ensure that every call to Increment will be consistently fast.

For thread-safety you can use either a normal Dictionary with a lock, or a ConcurrentDictionary without lock. The first option is more flexible, and the second more efficient. If you choose Dictionary+lock, use the lock only for protecting the internal state of the Dictionary. Don't lock while updating the database. Before updating each counter take the current value from the dictionary and remove the entry in an atomic operation, and then issue the database command while other threads will be able to recreate the entry again if needed. The ConcurrentDictionary class contains a TryRemove method that can be used to achieve this goal without locking:

public bool TryRemove (TKey key, out TValue value);

It also contains a ToArray method that returns a snapshot of the entries in the dictionary. At first glance it seems that the ConcurrentDictionary suits your needs, so you could use it as a basis of your implementation and see how it goes.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • That is really good input. What I probably should have mentioned is that I'm also asking the service for totals. Both total per user and total per company. So, I need the full state in memory when asking. – ThomasArdal Jan 30 '20 at 14:55
  • Hmm, so your service serves as a caching mechanism as well. Don't you have other caching needs beyond the number of requests per user and company? If you do, there are specialized mechanisms for this purpose, like the [MemoryCache](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.caching.memorycache) class. – Theodor Zoulias Jan 30 '20 at 15:15
  • Yep, I could definitely use MemoryCache instead of the dictionary. Still have to maintain it, though. And write changes to the database. – ThomasArdal Jan 30 '20 at 15:32
  • 1
    I went with solution #2. It is writes that causes the problems here. So, I'm keeping an in-memory list of stats. Every minute (maybe 5 minutes), I write the stats to the database. The client that needs the stats, fetches them from the database and adds the stats from the in-memory list. For this code, I could probably only read from the database. Results would be max 1 minute old – ThomasArdal Jan 31 '20 at 12:11
-1

In general, to make your code thread-safe:

  • Use concurrent collections, such as ConcurrentDictionary

  • Make sure to understand concepts such as lock statement, Monitor.Wait and Mintor.PulseAll in tutorials. Locks can be slow if IO operations (such as disk write/read) it being locked on, but for something in RAM it is not necessary to worrry about. If you have really some lengthy operation such as IO or http requests, consider using ConcurrentQueue and learn about the consumer-producer pattern to process work in queues by many workers (example)

  • You can also try Redis server to cache database without need to design something from zero.

  • You can also make your service singleton, and update database only after value changes. For reading value, you have already stored it in your service.

FindOutIslamNow
  • 1,169
  • 1
  • 14
  • 33
  • Using ConcurrentDictionary doesn't solve the problem as I wrote in the question. Also, I do understand `lock` and the features around it. Asking for alternatives to avoid locking everything. I know Redis is fast, but for this code I want it in memory. My service is already singleton and it stores the database in the database using a timer. – ThomasArdal Jan 30 '20 at 13:47
  • @ThomasArdal I can get you are afraid to slow down execution. Using the last point in my answer, plus an update timer, will never make your code slow. Just make sure to lock read/writes on same object. The company should have been loaded only first time and after that it will not take time. You can also load all companies in the constructor of service once. – FindOutIslamNow Jan 30 '20 at 13:54
-1

To avoid having to make a database call on every request, I've developed a service class in C# maintaining an in-memory representation of the statistics stored in a database:

If you want to avoid Update race conditions, you should stop doing exactly that.

Databases by design, by purpose prevent simple update race conditions. This is a simple counting-up operation. A single DML statement. Implicity protected by transactions, journaling and locks. Indeed that is why calling them a lot is costly.

You are fighting the concurrency already there, by adding that service. You are also moving a DB job outside of the DB. And Moving DB jobs outside of the DB, is just going to cause issues.

If your worry is speed:

  1. Please read the Speed Rant.
  2. Maybe a Dsitributed Database Design is the droid you are looking for? They had a massive surge in popularity since Mobile Devices have proliferated, both for speed and reliability reasons.
Christopher
  • 9,634
  • 2
  • 17
  • 31
  • The race condition is in the C# code. I do not worry about race conditions in the database. As you write, it has transactions, etc. to handle these things. I can easily create code that increments the number in SQL or similar. But I want this in memory to avoid having to update the database all the time. – ThomasArdal Jan 30 '20 at 13:45
  • @ThomasArdal The race condition is in your code, because you **circumvented** the concurrency control already there. The one inherently in the DB | It is relatively trivial to add a bottleneck, using the lock statement: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/lock-statement | Locking over multiple app domains would be harder, but afaik there are options too. But all you would be doing is *reinvent* the concurrency control *already in place* in the DB. That is moving a DB job out of the DB, wich is not generally a adviseable action. – Christopher Jan 30 '20 at 14:20