
I have the following setup:

  1. MySQL + PHP
  2. Standard Symfony 5 setup
  3. Standard Doctrine ORM usage
  4. High traffic

With the rise in traffic I keep encountering the following issue more and more often (it used to happen about once per month; now it's at more than 50 occurrences per day):

Two requests both need to insert the same unique value into the same table (let's say the same PK, so I can simplify the sample code). My code looks like this:

$existingEntity = $tableRepository->find($id);
if (!$existingEntity) {
    $newEntity = new EntityName();
    $newEntity->setId($id);
    $em->persist($newEntity);
    $em->flush();
}

The second request fails with "The EntityManager is closed" (because MySQL returns a duplicate key error). So basically both requests run the check at the same time, and both create and persist the entity.

Now obviously, the slower the network and the more complicated the code leading up to that particular part of the script, the higher the chance of this happening. But I have optimized everything I could think of, and moved the check to be the first thing the script does.

I was thinking of making some kind of lock table in Redis or something, but that would basically just move the problem from one place to another. How can I approach this issue?
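
The closest thing to a workaround I can see is catching the duplicate-key error instead of trusting the check. A rough sketch of what I mean (entity and repository names are placeholders; since the failed flush closes the EntityManager, it has to be reset through the injected Doctrine\Persistence\ManagerRegistry):

use Doctrine\DBAL\Exception\UniqueConstraintViolationException;

$existingEntity = $tableRepository->find($id);
if (!$existingEntity) {
    $newEntity = new EntityName();
    $newEntity->setId($id);
    $em->persist($newEntity);
    try {
        $em->flush();
    } catch (UniqueConstraintViolationException $e) {
        // The other request won the race. The failed flush closed the
        // EntityManager, so replace it before doing anything else with it.
        $managerRegistry->resetManager();
        $em = $managerRegistry->getManager();
        $existingEntity = $em->getRepository(EntityName::class)->find($id);
    }
}

But that only handles the failure after the fact; it feels like treating the symptom rather than the race itself.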

Edit:

Example without a PK:

$existingEntity = $tableRepository->findOneBy(['columnWithUniqueConstraint' => $value]);
if (!$existingEntity) {
    $newEntity = new EntityName();
    $newEntity->setColumnWithUniqueConstraint($value);
    $em->persist($newEntity);
    $em->flush();
}
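
Edit 2: Following the comment discussion about doing the duplicate check with an INSERT instead of a SELECT: since DQL has no ON DUPLICATE KEY UPDATE, the only way I see is dropping to the native connection. A rough sketch via Doctrine DBAL (table and column names are placeholders for the real schema):

// One atomic statement: MySQL resolves the race itself, no prior SELECT.
// The no-op UPDATE clause turns the duplicate-key error into a silent skip
// (INSERT IGNORE would also work, but it swallows other errors as well).
$em->getConnection()->executeStatement(
    'INSERT INTO entity_table (column_with_unique_constraint)
     VALUES (:value)
     ON DUPLICATE KEY UPDATE column_with_unique_constraint = column_with_unique_constraint',
    ['value' => $value]
);

The downside is that this bypasses the ORM entirely: no entity comes back and no lifecycle events fire, so I would still have to re-fetch afterwards if I need the object.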
  • Why don't you use auto increment as a primary key? – Shadow Jan 30 '21 at 11:42
  • @Shadow OP states that PK is purely for demonstrative purposes. – El_Vanja Jan 30 '21 at 11:43
  • Locks would be a good way to deal with this (see the sketch after this comment thread). That's what we do to deal with concurrency where I work (where the setup is very similar to what you've described). – El_Vanja Jan 30 '21 at 11:45
  • @El_Vanja auto increment's pure purpose is to generate a unique value with minimal impact on a table, so it does satisfy the criteria described in the question. If the OP oversimplified the problem, then the OP should explain how those unique values are generated and what attributes they should have. – Shadow Jan 30 '21 at 11:50
  • Sorry for the slow answer. First, I used a PK to simplify the example, and second, you may have, for example, a natural PK. For some reason people nowadays always forget about those and their advantages / disadvantages, but that's a whole discussion :). – zozo Jan 30 '21 at 11:53
  • @Shadow The way I understood the question, this is about a column with a unique constraint. Not something that the DBMS needs to generate. In that case, it is irrelevant where this value comes from. I could be wrong, though. Up to the OP to clarify. – El_Vanja Jan 30 '21 at 11:54
  • @El_Vanja you understood the case right. What kind of locks are you thinking of? Do you have any example? In a self-contained system you would use semaphores, but I have no idea of their equivalent in this kind of setup. – zozo Jan 30 '21 at 11:54
  • Look, you need to explain your issue in greater detail. Locking an entire table in a high-concurrency application will probably solve this particular problem at the expense of severe performance degradation. – Shadow Jan 30 '21 at 11:57
  • @zozo you need to explain how the unique values are generated. – Shadow Jan 30 '21 at 11:58
  • I'm gonna go out on a limb here and say user input. But yeah, if you give more context, maybe there's optimization to be done elsewhere in the process. – El_Vanja Jan 30 '21 at 12:02
  • @Shadow Added an example without a PK. The way the keys are generated is beside the point. You can have something as simple as a product SKU. A company name. Literally anything that may come from an outside system and needs to be unique in yours. Also, this issue is not only about duplicate keys. Imagine you take money out of a bank account. Resource races could make that account go negative even if you check that it has enough money (I'm not suggesting using MySQL for banks, it is just another example). – zozo Jan 30 '21 at 12:05
  • Locking the whole table for the duration of the requests is not a great solution (but it is a solution) because 99.999% of the other requests have nothing to do with the one with the problem (basically I would slow down 99.999% of other requests for a particular case). – zozo Jan 30 '21 at 12:06
  • Do these duplicate entries present a serious problem for the way your application functions? – El_Vanja Jan 30 '21 at 12:07
  • As for context, it is actually extremely complex (but it does boil down to user input in the end)... a user requests data about a specific product that comes from an outside system. I only keep products that I normally show. Users have access to the outer system and can request data about products that are not in my usual list. When that happens, a request is sent to pull the product. What happens is that people pull the same product twice at the same time. We are talking about millions of requests. The chances of two being pulled in the same second are medium to high. – zozo Jan 30 '21 at 12:10
  • @El_Vanja Yes, they need to be unique. And they actually do remain unique through the process; it's just that one request fails because the check said the entry is not there. I have code that handles the entry-exists case gracefully, but it is not executed in that 0.001 of cases where the check returns the wrong result. I'm thinking I should catch the requests somewhere and deduplicate them by that key. – zozo Jan 30 '21 at 12:14
  • I think I found something. I have the query cache active. Normally it is invalidated when a row is updated, but maybe with the timing being so close the db doesn't have time to bust the cache. I will try adding a no-cache hint and write back if that works. But since it normally takes hours for the issue to happen... that will take a while :). – zozo Jan 30 '21 at 12:19
  • Your bank account comparison does not stand, since transactional amounts can be the same. Such simple concurrency tasks are taken care of by InnoDB's default row-level locks. This issue of not having a piece of data in your table cannot be handled by row-level locks. The main problem seems to be that you do the duplicate check with a SELECT. You should be doing it with an INSERT instead. If you can insert the product id into your table, it means it did not exist there, so you need to go to the external system. If the insertion fails with a duplicate key error, then the product is in your db. – Shadow Jan 30 '21 at 13:11
  • @Shadow I agree that it should actually be an INSERT ... ON DUPLICATE KEY UPDATE, but unfortunately Doctrine doesn't support that (I'm not the only one with that problem – https://stackoverflow.com/questions/4555323/doctrine-on-duplicate-key-update); I sketched the native version in Edit 2 above. Going native is a solution for a few particular cases, but I wanted to find a more... general solution. I'm gonna try the no-cache solution today (although that is also rather particular and I don't like it since I don't want to disable the cache globally). – zozo Feb 01 '21 at 11:10
  • Yeah, this is the problem with abstractions. They leak. :) – Shadow Feb 01 '21 at 11:48
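
For the lock-based approach discussed above, a minimal sketch using Symfony's Lock component (this assumes symfony/lock with a shared store such as Redis behind the injected LockFactory, so the lock is visible across all PHP workers; the lock name and entity are placeholders):

use Symfony\Component\Lock\LockFactory;

// $lockFactory is an injected Symfony\Component\Lock\LockFactory backed by
// a store shared between processes/servers (e.g. the Redis store).
$lock = $lockFactory->createLock('entity-create-'.$value);
$lock->acquire(true); // block until any competing request releases the lock

try {
    $existingEntity = $tableRepository->findOneBy(['columnWithUniqueConstraint' => $value]);
    if (!$existingEntity) {
        $newEntity = new EntityName();
        $newEntity->setColumnWithUniqueConstraint($value);
        $em->persist($newEntity);
        $em->flush();
    }
} finally {
    $lock->release();
}

Because the lock key includes the value itself, only requests competing for the same value serialize on it; unrelated traffic is not slowed down.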

0 Answers