Once your queue size exceeds your thread concurrency, the impact is predominantly latency versus early error. The bigger the queue, the bigger the latency between when you insert a 'job' and when it actually starts running. The bigger the cap on the queue size, the longer it takes before you actively notice a potentially huge latency. In some workloads a huge buffer and high latency are fine; in others they are a disaster. Since it's not a priority queue (it's a FIFO), it's not easy to get an 'emergency job' to the head of the line -- you either wait it out or start writing specialized dispatching and queue management to fit your business needs.

The pain point is more often the inverse: ask yourself what the maximum acceptable time is between entering a job into the queue and it starting to run. Take that number, divide by the average execution time of your jobs, and multiply by the pool's thread count (as long as it's smaller than the actual hardware concurrency) -- there's your maximum queue size.
Time to start a job = (current queue length * time per job) / number of (real) threads
Maximum (or average working) queue size = (maximum latency * threads) / time per job
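If it helps to see that as code, here's the same arithmetic as a couple of small Python helpers (the function and parameter names are just mine for illustration, not part of any thread-pool API):

```python
def time_to_start(queue_length, time_per_job_s, threads):
    """Expected wait before a newly queued job starts:
    queue_length * time_per_job / threads."""
    return queue_length * time_per_job_s / threads


def max_queue_size(max_latency_s, threads, time_per_job_s):
    """Largest queue depth that still meets the latency budget:
    (max_latency * threads) / time_per_job."""
    return int(max_latency_s * threads / time_per_job_s)
```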
Napkin math, assuming the following spherical elements:

- Time per job = 100 ms each
- Thread count = 16 (real cores)
- Max acceptable latency = 10 seconds

Queue size limit = 10 * 16 / 0.1 = 1600
If your max latency is 10,000 seconds, then q = 1,600,000.
If your max latency is 1 second, then q = 160.
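As a quick sanity check, the same napkin math run for all three latency budgets (still assuming 100 ms jobs on 16 real cores):

```python
TIME_PER_JOB_S = 0.1   # 100 ms per job
THREADS = 16           # real cores

for max_latency_s in (1, 10, 10_000):
    limit = int(max_latency_s * THREADS / TIME_PER_JOB_S)
    print(f"latency budget {max_latency_s}s -> queue size limit {limit}")

# latency budget 1s -> queue size limit 160
# latency budget 10s -> queue size limit 1600
# latency budget 10000s -> queue size limit 1600000
```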
A second-order factor: as you push concurrency and load, the chances of lock contention go up non-linearly (depending on your code details), so look out for that. Keep an eye on the monitoring history for lock contention, I/O waits, etc.
Dropping the # of threads will likely do more good than dropping the queue size in this case.