
I have a service that posts data to a web server (ASP.NET Core 3.1) every second, which I store in SQL Server using EF Core 3.1. Until now, when trying to store new data, I have, for each new data row separately:

  1. Checked whether the data entity already exists in the database (the entity type is configured with an .IsUnique() index in the OnModelCreating() method)
  2. If it does not exist, added the single entity
  3. Called DbContext.SaveChanges()
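For reference, the per-row pattern above looks roughly like this. The SensorReading entity, its key columns, and the AppDbContext are illustrative assumptions, not the actual model:

```csharp
using System;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Illustrative entity/context - names and key columns are assumptions:
public class SensorReading
{
    public int Id { get; set; }
    public string SensorId { get; set; }
    public DateTime Timestamp { get; set; }
}

public class AppDbContext : DbContext
{
    public DbSet<SensorReading> SensorReadings { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder) =>
        modelBuilder.Entity<SensorReading>()
            .HasIndex(r => new { r.SensorId, r.Timestamp })
            .IsUnique();
}

public static class PerRowStore
{
    // Steps 1-3 above: one SELECT plus (possibly) one INSERT round trip per row.
    public static void StoreReading(AppDbContext db, SensorReading reading)
    {
        bool exists = db.SensorReadings.Any(r =>
            r.SensorId == reading.SensorId && r.Timestamp == reading.Timestamp);

        if (!exists)
        {
            db.SensorReadings.Add(reading);
            db.SaveChanges();
        }
    }
}
```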

However, this seems a bit "heavy" on the SQL server, with quite a lot of calls. It is running on Azure, and sometimes the database seems to have trouble keeping up, so the web server starts returning 500 (Internal Server Error, as far as I understand). This sometimes happens when someone calls another controller on the web server and tries to retrieve larger chunks of data from the SQL server. (That is perhaps for another question, about Azure SQL reliability.)

Is it better to keep a buffer on the web server and save everything in one go, e.g. DbContext.AddRange(entities) at a coarser time resolution (say, once a minute)? I do not know exactly what happens if one or more of the entities is a duplicate. Are the non-duplicates stored, or are all the inserts refused? (I can't seem to find an explanation for this.)
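On the duplicate question: EF Core runs SaveChanges in a single transaction, so if any row in the batch violates the unique index, a DbUpdateException is thrown and none of the batch is committed; there is no partial insert. A buffered variant could pre-filter known duplicates with one query and still catch the exception to cover races. This is a sketch under assumptions: an AppDbContext with a DbSet<SensorReading> named SensorReadings and a unique (SensorId, Timestamp) index are illustrative, not the actual model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public static class BatchStore
{
    // Flush a buffer's worth of rows in one SaveChanges call.
    // An AppDbContext with DbSet<SensorReading> SensorReadings and a unique
    // (SensorId, Timestamp) index is assumed (illustrative, not the real model).
    public static void FlushBuffer(AppDbContext db, List<SensorReading> buffer)
    {
        var timestamps = buffer.Select(r => r.Timestamp).ToList();

        // One query to find which candidate keys already exist
        // (translated to an IN (...) clause by EF Core).
        var existingKeys = db.SensorReadings
            .Where(r => timestamps.Contains(r.Timestamp))
            .Select(r => new { r.SensorId, r.Timestamp })
            .ToList()
            .Select(e => (e.SensorId, e.Timestamp))
            .ToHashSet();

        db.SensorReadings.AddRange(
            buffer.Where(r => !existingKeys.Contains((r.SensorId, r.Timestamp))));

        try
        {
            // EF Core batches the INSERTs, all inside one implicit transaction.
            db.SaveChanges();
        }
        catch (DbUpdateException)
        {
            // A concurrent writer may still have inserted a key between the check
            // and the save. Nothing was committed, so retry or drop duplicates here.
        }
    }
}
```

Note the check-then-insert race is inherent to this approach; the catch block is what makes it safe, since the unique index remains the final arbiter.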

Any help on the matter is much appreciated.

EDIT 2021-02-08: I'll try to expand a bit on the situation:

Outside my control:

  • MQTT broker (publishing messages)

In my control:

  • MQTT client (currently an Azure WebJob), subscribes to the MQTT broker
  • ASP.NET server
  • SQL Database

The MQTT client collects and groups messages from different sensors on the MQTT broker into a format that can (more or less) be stored directly in the database.

The ASP.NET server acts as a middleman between the MQTT client and the SQL database, BUT ALSO continuously sends "live" updates to anyone visiting the website. So currently the web server has many jobs (perhaps the problem arises here?):

  • receive data from the MQTT client
  • store/retrieve data to/from the database
  • serve visitors with "live" data from the MQTT client as well as historic data from the database

Hope this helps clarify the setup.

Erik Thysell
  • Please share the entity configuration code. And, any specific reason why you can't test the behavior using `AddRange()` locally? – atiyar Feb 04 '21 at 00:00
  • https://stackoverflow.com/questions/55270535/ef-core-error-with-duplicated-key-while-adding-data-with-unique-index – Jason Pan Feb 04 '21 at 02:27
  • Read this post carefully; I think it should be useful to you. Using `Transactions` can solve your problem. – Jason Pan Feb 04 '21 at 02:29
  • This is a case where the EF CRUD model falls short and why EF can slow down the database. I can suggest a fast solution using https://github.com/linq2db/linq2db.EntityFrameworkCore if you share how many records you are trying to insert without duplicates. – Svyatoslav Danyliv Feb 04 '21 at 06:08
  • I think you probably need to figure out the root cause of the problem instead of trying to fix it in this save operation. Maybe your SQL Server tier in Azure is too limited for the load, or maybe there's something else going on. Also, since there's an index in place, why not just always add without checking? If it already exists you'll just receive an error, and that's it. Also, EF always executes your save under a transaction, so you don't need to do that manually. – jpgrassi Feb 04 '21 at 19:35
  • You could also use some connection resiliency, as described here: https://learn.microsoft.com/en-us/ef/core/miscellaneous/connection-resiliency. Also, in Azure you should be able to see what is happening with SQL Server; check the CPU or memory usage. – jpgrassi Feb 04 '21 at 19:37
  • @jpgrassi, the root of the problem is that you have to do that in a transaction or leave EF. First you have to select the IDs that are present in the database (a big `IN` query), return that list to the client, filter out the records that are already present, then `context.AddRange()` and `context.SaveChanges()`. Everything takes time, and the transaction blocks others. – Svyatoslav Danyliv Feb 04 '21 at 20:02
  • @SvyatoslavDanyliv not sure I understand. From the OP's question, every time a new row is posted to the server it needs to query first, then insert if it doesn't exist. Like you said, between the check and the insert another request could already have arrived and the insert will fail, so the check is useless anyway. Better to leave it to SQL Server and handle the exception on the client (controller). If it were an update operation, you would also not need a transaction for read/update; you could set a column to be the concurrency-token column. – jpgrassi Feb 06 '21 at 13:22

1 Answer


I ended up with a buffer service built around a ConcurrentDictionary that I use in my ASP.NET controller. That way I can make sure duplicates are handled in my code in a controlled way (existing entries are updated or incoming data is discarded, based on the quality of the received data). Each minute I flush the last minute's data to the database, so the buffer always holds one minute of data. Bonus: I can also serve current data to visitors much more quickly from the buffer service instead of going to the database.
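A minimal sketch of that buffer-service idea. The SensorReading type, its fields, and the quality-based conflict rule are illustrative assumptions:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Illustrative reading type - the real model's fields are assumptions:
public class SensorReading
{
    public string SensorId { get; set; }
    public DateTime Timestamp { get; set; }
    public double Value { get; set; }
    public int Quality { get; set; }
}

public class ReadingBuffer
{
    // Keyed by the same columns as the unique index, so duplicates are
    // resolved in memory before they ever reach the database.
    private readonly ConcurrentDictionary<(string, DateTime), SensorReading> _buffer =
        new ConcurrentDictionary<(string, DateTime), SensorReading>();

    // Duplicates are handled here, in code: keep the higher-quality reading.
    public void Add(SensorReading reading) =>
        _buffer.AddOrUpdate(
            (reading.SensorId, reading.Timestamp),
            reading,
            (key, existing) => reading.Quality > existing.Quality ? reading : existing);

    // Called e.g. once per minute from a timer/hosted service; the drained list
    // is what gets passed to DbContext.AddRange(...) + SaveChanges().
    // TryRemove per key (rather than Clear) avoids losing readings added
    // concurrently while the drain is in progress.
    public List<SensorReading> Drain()
    {
        var drained = new List<SensorReading>();
        foreach (var key in _buffer.Keys)
            if (_buffer.TryRemove(key, out var reading))
                drained.Add(reading);
        return drained;
    }
}
```

Registering the buffer as a singleton service would let both the ingest controller and the "live data" endpoints share it, which is what enables serving recent data without a database round trip.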

Erik Thysell
  • Bad idea. If you use IIS there are application pools, and each will have its own ConcurrentDictionary. Also, if you run several servers behind a load balancer, you will have the same problem. – Svyatoslav Danyliv Feb 06 '21 at 12:39
  • Yeah I don't think this is a good thing. What you could do is use a queue, so you would just enqueue things and those are handled later on. That solves the issue of horizontal scaling and no "sticky sessions". – jpgrassi Feb 06 '21 at 13:20
  • @SvyatoslavDanyliv thanks for your comment. My service and web server have so far had very limited scale and usage (B1 plan). Is there still a pool of parallel applications? – Erik Thysell Feb 08 '21 at 11:24
  • @jpgrassi thanks for your comment as well. It seems a bit tricky. A Service Bus queue seems only to be able to deliver each message once, but I would like to serve the same (recent) messages multiple times (different visitors/users) without the hassle of going to/from the database. Azure Event Grid seems like it quickly becomes expensive. But I do not know if there are any other suitable options. – Erik Thysell Feb 08 '21 at 11:59