
Quick Background: As I am going back and redesigning some critical parts of an application, I keep wondering about locking and its impact on performance. The app has a large tree-style data structure which caches data/DTOs from the database. Updates to the large tree can come about in two main ways: 1. user-triggered commands, 2. auto updates from jobs that run in the background.

When either operation type occurs (user/auto), I am locking down (explicitly locking) the data structure. I was running into consistency issues, so locking down everything seemed to make the most sense to protect the integrity of the data in the cache.

Question: Since many auto updates can occur at once, I was thinking of implementing some kind of queue (JMS maybe) to handle instructions to the data structure, where any user-driven updates get pushed to the top and handled first. When it comes to handling a bulk/unknown-size set of auto "tasks", I am trying to figure out if I should let them run and lock individually, or try to batch them together by time and interact with locking once. The real crux of the problem is that any one of the tasks could affect the entire tree.
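For the "user updates jump the queue" part, a minimal in-memory sketch (as an alternative to full JMS) could use a `PriorityBlockingQueue` whose ordering puts user commands ahead of background tasks. All names here (`Source`, `UpdateTask`, `sampleQueue`) are hypothetical placeholders, not anything from the app:

```java
import java.util.concurrent.PriorityBlockingQueue;

public class UpdateQueueSketch {
    // Hypothetical task wrapper: user-triggered updates outrank background jobs.
    enum Source { USER, AUTO }

    record UpdateTask(Source source, long seq, Runnable work)
            implements Comparable<UpdateTask> {
        @Override
        public int compareTo(UpdateTask other) {
            // USER sorts before AUTO; within the same source, FIFO by sequence number.
            int bySource = this.source.compareTo(other.source);
            return bySource != 0 ? bySource : Long.compare(this.seq, other.seq);
        }
    }

    static PriorityBlockingQueue<UpdateTask> sampleQueue() {
        PriorityBlockingQueue<UpdateTask> queue = new PriorityBlockingQueue<>();
        queue.add(new UpdateTask(Source.AUTO, 1, () -> {}));
        queue.add(new UpdateTask(Source.USER, 2, () -> {}));
        queue.add(new UpdateTask(Source.AUTO, 3, () -> {}));
        return queue;
    }

    public static void main(String[] args) {
        PriorityBlockingQueue<UpdateTask> queue = sampleQueue();
        // The user task is taken first even though it was enqueued second.
        System.out.println(queue.poll().source()); // USER
        System.out.println(queue.poll().seq());    // 1
    }
}
```

A single worker thread draining this queue would serialize all tree mutations, which sidesteps the locking question at the cost of concurrency (the point the first answer below makes).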

In terms of overall performance (general, nothing specific), is it more efficient to have many transactions locking and potentially doing large updates, or to combine them into one massive bulk update and lock only once, but for much longer? I know a lot of this probably hinges on the data, the type of updates, frequency, etc. I didn't know if there was a general rule of thumb of "smaller, more frequent locks" versus "one large, potentially longer lock".
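To make the two options concrete, here is a sketch of both strategies behind a single `ReentrantLock`: per-task locking pays the acquire/release and contention cost once per task, while batching pays it once overall but excludes other threads for the whole batch. The `TreeCache` type is a hypothetical stand-in for the real tree:

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

public class LockGranularitySketch {
    // Hypothetical stand-in for the cached tree.
    static class TreeCache { int version = 0; }

    static final ReentrantLock lock = new ReentrantLock();
    static final TreeCache tree = new TreeCache();

    // Strategy 1: lock per task -- other threads can interleave between tasks.
    static void applyIndividually(List<Consumer<TreeCache>> tasks) {
        for (Consumer<TreeCache> task : tasks) {
            lock.lock();
            try {
                task.accept(tree);
            } finally {
                lock.unlock();
            }
        }
    }

    // Strategy 2: one big lock -- less locking overhead, longer exclusion.
    static void applyAsBatch(List<Consumer<TreeCache>> tasks) {
        lock.lock();
        try {
            for (Consumer<TreeCache> task : tasks) {
                task.accept(tree);
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        List<Consumer<TreeCache>> tasks =
                List.of(t -> t.version++, t -> t.version++, t -> t.version++);
        applyIndividually(tasks);
        applyAsBatch(tasks);
        System.out.println(tree.version); // 6
    }
}
```

Both strategies produce the same end state; they differ only in how long readers and competing writers are held off, which is exactly the trade-off being asked about.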

Walls
  • Yes. Hmm. Maybe no. Maybe maybe. OK, seriously: I think your question is too broad. – GhostCat Aug 07 '17 at 13:08
  • @GhostCat I had an inkling it might be a little too broad. Can you think of some way I might be able to narrow it a bit? I have no problem closing my own post down if it is def. too broad. – Walls Aug 07 '17 at 13:15
  • What is the cache policy? Are objects hanging around longer than needed, increasing the tree size, and impacting performance? – Andrew S Aug 07 '17 at 13:29
  • @AndrewS right now it isn't truly a cache, it's a custom data structure in memory. I am planning on migrating it into an Ehcache or Hazelcast type data structure to give it transactional support. – Walls Aug 07 '17 at 13:30
  • So if each object within the cache is independent (not part of a tree), then only that particular object would need to be locked? – Andrew S Aug 07 '17 at 13:38
  • @AndrewS the cache is just another way to represent the tree. When a child element gets updated a calculation occurs up the tree recursively. This touches a lot of the objects by nature, which means the entire tree needs to be wrapped/locked. – Walls Aug 07 '17 at 13:57

2 Answers


If you end up implementing some kind of queue, then you serialize the updates and lose all concurrency. If you get 1000 requests at once, think of how inefficient that is.

Try taking a look at this code for concurrent trees. https://github.com/npgall/concurrent-trees

BaneDad
  • Allowing fewer concurrently active connections will improve overall performance most of the time. It always depends, of course. https://stackoverflow.com/questions/1208077/optimal-number-of-connections-in-connection-pool – daniu Aug 07 '17 at 13:13
  • The problem is the data itself is going to be within a global transaction touching a database in addition to the cache tree. A simple concurrent data structure won't add that transactional benefit. – Walls Aug 07 '17 at 13:29

I think the answer depends on whether your program spends any significant time with the data structure unlocked. If it does not, I recommend locking once for all pending updates.

The reason is that other threads waiting for the lock may get woken up and then uselessly sent back to sleep when the update thread quickly re-acquires the resource. Or the update is interrupted by another thread, which is likely bad for cache utilization. Also, locking itself has a cost, though it may be small compared to your update: pipelines may have to be flushed, memory accesses may not be freely reordered, etc.

If the thread spends some time between updates without holding the lock, I would consider re-locking for every update, provided other threads are expected to complete their transactions in between, thereby reducing contention.

Note that when different updates have different priorities, as I presume is the case for your user updates versus the background updates, it may be a bad idea to lock down the data structure for a long time for lower-priority updates if this could in any way prevent higher-priority tasks from running.
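One way to keep the batching benefit without starving higher-priority work is to drain pending background updates inside a single critical section, but break out of the batch as soon as a user update is waiting. This is only a sketch; the queue names and the preemption check are assumptions, not part of the asker's design:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class BatchWithPreemptionSketch {
    static final java.util.concurrent.locks.ReentrantLock lock =
            new java.util.concurrent.locks.ReentrantLock();
    static final ConcurrentLinkedQueue<Runnable> autoUpdates = new ConcurrentLinkedQueue<>();
    static final ConcurrentLinkedQueue<Runnable> userUpdates = new ConcurrentLinkedQueue<>();

    // Drain background updates under one lock, but stop batching the moment
    // a user update is queued so it is not blocked behind the whole batch.
    static void drainAutoUpdates() {
        lock.lock();
        try {
            Runnable task;
            while ((task = autoUpdates.poll()) != null) {
                task.run();
                if (!userUpdates.isEmpty()) {
                    break; // release the lock; the user update runs next
                }
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        java.util.concurrent.atomic.AtomicInteger applied =
                new java.util.concurrent.atomic.AtomicInteger();
        for (int i = 0; i < 5; i++) {
            autoUpdates.add(applied::incrementAndGet);
        }
        drainAutoUpdates();                // no user update pending: all 5 run
        System.out.println(applied.get()); // 5
    }
}
```

The remaining background tasks simply stay in their queue and are picked up on the next drain, so low-priority work is deferred rather than lost.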

PaulR