We had performance issues in our app caused by updating entities one by one in a DB table with a high row count (more than a million). We kept getting deadlock victim errors, so it was obvious the updates were holding row locks longer than they should have.
We currently batch the stored procedure calls manually, flushing the batch when it reaches a configurable size limit or when a configurable time threshold elapses.
The simplified class would look something like this:
using System;
using System.Collections.Generic;
using System.Timers;
using System.Xml.Serialization;
using Microsoft.Extensions.Logging;
using Newtonsoft.Json;

public class EntityBatchUpdater
{
    private readonly IRepository _repository;
    private readonly ILogger<EntityBatchUpdater> _logger;
    private readonly List<Entity> _batch = new List<Entity>();
    private readonly Timer _batchPostingTimer;
    private readonly int _batchSize;
    private static readonly object _batchPostingLock = new object();

    public EntityBatchUpdater(IRepository repository, ILogger<EntityBatchUpdater> logger)
    {
        _repository = repository;
        _logger = logger;
        _batchSize = 1000; // configurable
        string batchPostingPeriod = "00:01:00.00"; // configurable

        _batchPostingTimer = new Timer
        {
            Interval = TimeSpan.Parse(batchPostingPeriod).TotalMilliseconds,
            AutoReset = true,
        };
        _batchPostingTimer.Elapsed += OnTimedEvent;
        _batchPostingTimer.Start(); // start only after the handler is attached
    }

    public void Update(Entity entity)
    {
        try
        {
            lock (_batchPostingLock)
            {
                _batch.Add(entity);
                if (_batch.Count >= _batchSize)
                {
                    EntityBatchUpdate();
                }
            }
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, $"Failed to insert batch {JsonConvert.SerializeObject(_batch)}");
        }
    }

    // Flushes the buffered entities to the DB in a single stored procedure call.
    // Callers must hold _batchPostingLock.
    private void EntityBatchUpdate()
    {
        if (_batch.Count == 0)
        {
            return;
        }
        try
        {
            var entityBatchXML = SerializePayload();
            _repository.BatchUpdate(entityBatchXML);
            _batch.Clear();
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, $"Failed to insert batch; batchSize:{_batch.Count}");
        }
    }

    // Serializes the current batch as <ENTITY_BATCH>...</ENTITY_BATCH> with no
    // namespaces, which is the shape the stored procedure expects.
    private string SerializePayload()
    {
        using (var sw = new System.IO.StringWriter())
        {
            XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
            ns.Add("", "");
            var serializer = new XmlSerializer(typeof(List<Entity>),
                                               new XmlRootAttribute("ENTITY_BATCH"));
            serializer.Serialize(sw, _batch, ns);
            return sw.ToString();
        }
    }

    private void OnTimedEvent(object source, ElapsedEventArgs e)
    {
        // The timer fires on a thread pool thread, so take the same lock as
        // Update(); otherwise the flush races with concurrent Adds.
        lock (_batchPostingLock)
        {
            EntityBatchUpdate();
        }
    }
}
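For context, the updater is used roughly like this from our processing loop (a sketch; GetNextEntity stands in for however we actually dequeue):

// One shared updater instance, fed by the loop that drains the queue.
var updater = new EntityBatchUpdater(repository, logger);

while (!cancellationToken.IsCancellationRequested)
{
    Entity entity = GetNextEntity(); // placeholder for the real dequeue step
    updater.Update(entity);          // buffered; flushed at 1,000 items or every minute
}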
We take advantage of SQL Server's fast XML processing and serialize the payload to XML when calling the actual procedure, so we avoid hitting the DB with a large number of individual calls. I also thought about passing the data to the proc as a table-valued parameter (TVP) instead, but I don't think that would drastically improve performance.
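For reference, the TVP variant I considered would look roughly like this. It's an untested sketch; dbo.EntityUpdateType, dbo.Entity_BatchUpdate, and the Id/Status columns are made-up names for illustration:

// Assumes a user-defined table type and a matching proc exist in SQL Server:
//   CREATE TYPE dbo.EntityUpdateType AS TABLE (Id INT, Status INT);
//   CREATE PROCEDURE dbo.Entity_BatchUpdate @Entities dbo.EntityUpdateType READONLY AS ...
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static void BatchUpdateViaTvp(string connectionString, IReadOnlyList<Entity> batch)
{
    var table = new DataTable();
    table.Columns.Add("Id", typeof(int));
    table.Columns.Add("Status", typeof(int));
    foreach (var entity in batch)
    {
        table.Rows.Add(entity.Id, entity.Status);
    }

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.Entity_BatchUpdate", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        var param = command.Parameters.AddWithValue("@Entities", table);
        param.SqlDbType = SqlDbType.Structured;  // send the rows as a TVP in one round trip
        param.TypeName = "dbo.EntityUpdateType"; // must match the SQL table type name
        connection.Open();
        command.ExecuteNonQuery();
    }
}

The upside would be that SQL Server gets strongly typed rows instead of an XML blob to shred, at the cost of maintaining the table type alongside the schema.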
My question is: how have you handled heavy load like this on your DB? 1.) Did you use a NuGet package or some other tool to handle batching for you? 2.) Did you solve this using some other practice?
Edit: To give a bit more insight: we are currently processing a queue manually in our app (hence the huge number of updates), and we want to do it as quickly and reliably as possible. We will move to a better queueing mechanism in the future (RabbitMQ or Kafka), but in the meantime we want a standard approach to consuming and processing queues from a DB table.
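To illustrate what I mean by a standard approach: the pattern I keep seeing recommended for DB-table queues is to claim rows with UPDLOCK/ROWLOCK/READPAST hints, so competing consumers skip each other's locked rows instead of blocking or deadlocking. A sketch of the claim statement embedded in C# (dbo.EntityQueue and its columns are invented for the example):

// Claims up to @BatchSize pending rows in one statement. READPAST makes other
// workers skip rows we have locked; UPDLOCK/ROWLOCK keeps the claim narrow.
const string ClaimBatchSql = @"
    WITH next_batch AS (
        SELECT TOP (@BatchSize) Id, Payload, Status
        FROM dbo.EntityQueue WITH (UPDLOCK, ROWLOCK, READPAST)
        WHERE Status = 0            -- 0 = pending
        ORDER BY Id
    )
    UPDATE next_batch
    SET Status = 1                  -- 1 = processing
    OUTPUT inserted.Id, inserted.Payload;";

Each worker would run this in its own transaction, process the returned rows, and then mark them completed or failed.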