I have read a lot of articles and answers here about Google App Engine task queues, but I still have a doubt about the behavior of rate and bucket_size.
I read this documentation: https://cloud.google.com/appengine/docs/standard/java/configyaml/queue
The snippet is:
Configuring the maximum number of concurrent requests
If the default max_concurrent_requests settings are not sufficient, you can change them, as shown in the following example:
If your application queue has a rate of 20/s and a bucket size of 40, tasks in that queue execute at a rate of 20/s and can burst up to 40/s briefly. These settings work fine if task latency is relatively low; however, if latency increases significantly, you'll end up processing significantly more concurrent tasks. This extra processing load can consume extra instances and slow down your application.
For example, let's assume that your normal task latency is 0.3 seconds. At this latency, you'll process at most around 40 tasks simultaneously. But if your task latency increases to 5 seconds, you could easily have over 100 tasks processing at once. This increase forces your application to consume more instances to process the extra tasks, potentially slowing down the entire application and interfering with user requests.
You can avoid this possibility by setting max_concurrent_requests to a lower value. For example, if you set max_concurrent_requests to 10, our example queue maintains about 20 tasks/second when latency is 0.3 seconds. However, when the latency increases over 0.5 seconds, this setting throttles the processing rate to ensure that no more than 10 tasks run simultaneously.
queue:
# Set the max number of concurrent requests to 10
- name: optimize-queue
  rate: 20/s
  bucket_size: 40
  max_concurrent_requests: 10
My understanding was that the queue works like this:
The bucket determines how many tasks can execute at once (at most bucket_size).
The rate is how fast the bucket is refilled per period.
max_concurrent_requests is the maximum number of tasks that can run simultaneously.
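To check this understanding, I wrote a small simulation of my mental model: a token bucket that is refilled continuously at rate, capped at bucket_size, where each dispatched task keeps running for a fixed latency. The model and all names here (QueueModel, simulate, and so on) are my own assumptions, not the actual App Engine implementation:

import java.util.ArrayDeque;
import java.util.Deque;

public class QueueModel {
    public static void main(String[] args) {
        simulate(20.0, 40, 0.3); // doc example: normal latency 0.3 s
        simulate(20.0, 40, 5.0); // doc example: latency grows to 5 s
    }

    // Token bucket refilled continuously at `rate`, capped at `bucketSize`;
    // every dispatched task runs for `latencySec` seconds.
    static void simulate(double rate, int bucketSize, double latencySec) {
        double tokens = bucketSize;              // bucket starts full
        final double dt = 0.01;                  // simulation step in seconds
        Deque<Double> finishTimes = new ArrayDeque<>(); // in-flight tasks
        int maxConcurrent = 0;

        for (double t = 0; t < 30; t += dt) {
            // tasks complete after `latencySec`
            while (!finishTimes.isEmpty() && finishTimes.peekFirst() <= t) {
                finishTimes.pollFirst();
            }
            // refill the bucket at `rate`, never beyond `bucketSize`
            tokens = Math.min(bucketSize, tokens + rate * dt);
            // dispatch one task per whole token, regardless of how many
            // tasks are still running (the assumption I want to verify)
            while (tokens >= 1.0) {
                tokens -= 1.0;
                finishTimes.addLast(t + latencySec);
            }
            maxConcurrent = Math.max(maxConcurrent, finishTimes.size());
        }
        System.out.printf("latency=%.1fs -> max concurrent tasks ~ %d%n",
                latencySec, maxConcurrent);
    }
}

For a latency of 0.3 s this prints a maximum of around 45 concurrent tasks, and for a latency of 5 s around 140 (settling to about 100), which roughly matches the documentation's numbers. But that only holds if my model is right, which is what I want to confirm.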
This snippet seems strange to me:
But if your task latency increases to 5 seconds, you could easily have over 100 tasks processing at once. This increase forces your application to consume more instances to process the extra tasks, potentially slowing down the entire application and interfering with user requests.
Imagine that max_concurrent_requests is not set. To me it seems impossible to execute more than 100 tasks at once, because bucket_size is 40. I would expect slow tasks only to increase the time that waiting tasks spend waiting for a free slot in the bucket.
Why does the documentation say that there can be over 100 tasks processing at once?
If bucket_size is 40, can more than 40 tasks run simultaneously?
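The only way I can reproduce the documentation's figure myself is with something like Little's law, where concurrency is not limited by the bucket at all:

    concurrent tasks ≈ dispatch rate × task latency = 20/s × 5 s = 100

Is that the intended reading, i.e. bucket_size only limits the instantaneous burst of dispatches, not the number of tasks in flight?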
Edit
Is a freed slot in the bucket refilled at the next rate tick, or only after all running tasks have finished? Example: 40 tasks are executing and 1 of them finishes. Suppose each task takes more than 0.5 seconds, and some take more than 1 second. When that one slot is freed, is it refilled in the next second, or does the bucket wait for all running tasks to finish before it is refilled?
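To make it concrete with hypothetical numbers (if rate 20/s means one token every 0.05 s):

t = 0.00 s: bucket full with 40 tokens -> 40 tasks dispatched, 0 tokens left
t = 0.05 s: one token is refilled -> does 1 new task start now, even though the other 40 are still running?

Or does the bucket stay empty until all 40 running tasks have finished?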