Once your queue size exceeds your thread concurrency, the impact is predominantly latency versus early error. The bigger the queue, the bigger the latency between when you insert a 'job' and when it actually starts running. The bigger the cap on the queue size, the longer it takes before you actively notice a potentially huge latency. In some workloads a huge buffer and high latency are fine; in others they are a disaster. Since it's not a priority queue (it's a FIFO), it's not easy to get an 'emergency job' to the head of the line -- you either wait it out or start writing specialized dispatching and queue management to fit your business needs.

The pain point is more often the inverse: ask yourself what the maximum acceptable time is between entering a job into the queue and it starting to run. Take that number, divide by the average execution time of your jobs, and multiply by the pool's thread count (as long as it's smaller than the actual hardware concurrency) -- there's your maximum queue size.
Time to start a job = (current queue length * time per job) / number of (real) threads
Maximum (or average working) queue size = (maximum latency * threads) / time per job
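If it helps to see that as code, here's the same arithmetic as a couple of small Python helpers (the function and parameter names are just mine for illustration, not part of any thread-pool API):

```python
def time_to_start(queue_length, time_per_job_s, threads):
    """Expected wait before a newly queued job starts:
    queue_length * time_per_job / threads."""
    return queue_length * time_per_job_s / threads


def max_queue_size(max_latency_s, threads, time_per_job_s):
    """Largest queue depth that still meets the latency budget:
    (max_latency * threads) / time_per_job."""
    return int(max_latency_s * threads / time_per_job_s)
```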
Napkin math, assuming the following spherical elements:

- Time per job = 100 ms each
- Thread count = 16 (real cores)
- Max acceptable latency = 10 seconds

Queue size limit = 10 * 16 / 0.1 = 1600
If your max latency is 10,000 seconds, then q = 1,600,000.
If your max latency is 1 second, then q = 160.
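As a quick sanity check, the same napkin math run for all three latency budgets (still assuming 100 ms jobs on 16 real cores):

```python
TIME_PER_JOB_S = 0.1   # 100 ms per job
THREADS = 16           # real cores

for max_latency_s in (1, 10, 10_000):
    limit = int(max_latency_s * THREADS / TIME_PER_JOB_S)
    print(f"latency budget {max_latency_s}s -> queue size limit {limit}")

# latency budget 1s -> queue size limit 160
# latency budget 10s -> queue size limit 1600
# latency budget 10000s -> queue size limit 1600000
```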
A second-order factor: as you push concurrency and load, the chances of lock contention go up non-linearly (depending on your code details), so look out for that. Keep an eye on the monitoring history for lock contention, I/O waits, etc.
Dropping the # of threads will likely do more good than dropping the queue size in this case.