
When App Engine Task Queue tasks get interrupted, they take 20 minutes or more to restart. Is this behavior normal?

I am using the Task Queue on Google Cloud's App Engine Flexible environment. I regularly add tasks to the queue and they get processed on the system. Occasionally, a task gets interrupted in the middle of what it's doing. I don't know why this happens, but I assume it's because the instance that it's running on restarted itself.

My software is resilient to such restarts, but the problem is that it takes a full 20 minutes for the task to be restarted. Has anyone experienced this before?


speedplane

1 Answer


I think you're right: an instance grabs the task and then goes down. The task queue doesn't realize it and waits for some kind of timeout before retrying.

This sounds very similar to an issue I experienced: "App Engine instance dies instantly, locking up deferred tasks until they hit 10 minute timeout".

So to answer your question, I would say yes, this does happen. As for what to do, it depends on what the task is doing, how often it runs, etc. If the 20 minute lag isn't a big deal, I would just live with it, because fixing it can be a bit of a wild goose chase. But here's what I would try:

  1. When launching tasks, launch duplicates as well, with a staggered value for countdown/eta.
  2. Set up a separate microservice to handle/execute these tasks. Hopefully this will make its execution more predictable, and you'll be able to tweak instance size and scaling settings to better suit it.
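
To make option 1 a bit more concrete, here is a minimal sketch in plain Python. It is not tied to any particular Task Queue client: `enqueue` is a hypothetical callable you would supply (for example a thin wrapper around whatever call you already use to add tasks with a countdown), and `_completed` is a stand-in for a shared store such as Datastore. The names `enqueue_with_duplicates`, `handle_task`, and `job_id` are made up for illustration.

```python
import uuid


def enqueue_with_duplicates(enqueue, payload, copies=3, gap_seconds=120):
    """Enqueue `copies` of the same job, each delayed a bit more than the last.

    `enqueue` is a hypothetical callable taking (payload, countdown=seconds).
    """
    job_id = uuid.uuid4().hex  # shared by all copies so they can be de-duplicated
    for i in range(copies):
        # Copy 0 runs right away; later copies act as staggered backups.
        enqueue(dict(payload, job_id=job_id), countdown=i * gap_seconds)
    return job_id


# Stand-in for a shared store; an in-process set only works within one process.
_completed = set()


def handle_task(payload, do_work):
    """Run the job unless an earlier copy already finished it."""
    job_id = payload["job_id"]
    if job_id in _completed:
        return False  # an earlier copy succeeded, nothing to do
    do_work(payload)
    _completed.add(job_id)  # only reached if do_work did not raise
    return True
```

Note that this only skips copies that start after an earlier copy has finished. If a backup copy starts while the first is still running, both will do the work, so the gap should be longer than the task's usual run time, or the task itself needs to tolerate running twice.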
Alex