Google app engine API: Running large tasks

Question

Good day,

I am running the back end of an application on App Engine (Java), and it receives requests through Endpoints. The problem is that there is something big I need to compute, but the front end needs fast response times. As a solution, I want to precompute it and store the result in dedicated Memcache.

The way I did this is by adding a static block that runs a deferred task on the default queue. Is there a better way to have something calculated on startup?

This deferred task performs a large number of datastore operations, and sometimes they time out. So I created a system that retries on a timeout until it succeeds. However, when I start up the app, it immediately creates two copies of the deferred task. It also keeps retrying the tasks when they fail, even though I set DeferredTaskContext.setDoNotRetry(true);.
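A retry loop like the one described above could be sketched as follows. This is a minimal, hypothetical helper (BoundedRetry is not an App Engine class), and java.util.concurrent.TimeoutException stands in for whatever exception the datastore call actually throws. Unlike a retry-until-success loop, it backs off between attempts and gives up after a fixed number of them, so a task that keeps timing out fails visibly instead of looping forever.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical helper (not part of App Engine): retries an operation that may
 * time out, doubling the pause between attempts and giving up after maxAttempts.
 * TimeoutException is a stand-in for the datastore's real timeout exception.
 */
public class BoundedRetry {

    public static <T> T callWithRetry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delayMillis = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (TimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // stop instead of retrying forever
                }
                Thread.sleep(delayMillis); // wait before the next attempt
                delayMillis *= 2;          // exponential backoff
            }
        }
    }
}
```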

Honestly, the deferred tasks feel very finicky.

I just want to run a method that takes more than 5 minutes (probably longer as the data set grows). I want to run it on startup and afterwards on a regular basis. How would you model this? My first thought was a cron job, but cron jobs are limited in time. I would need a cron job that starts a deferred task, and then hope the tasks don't pile up, spawn duplicates, or start retrying.

Thanks for the help and good day.

Dries

How are you going to share/access this data (the classifier) across instances? Remember that each instance has its own memory, so at the moment I would imagine you start this build each time an instance starts. Once processed, can it be serialised to <1MB? If not, you will need to split the serialised entity and store it in chunks (in the datastore). A get from the datastore is not much slower than memcache and is reliable. — Tim Hoffman, Dec 30 '15 at 11:11
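A minimal sketch of the serialise-and-chunk idea, assuming the classifier model implements java.io.Serializable. BlobChunker is a hypothetical helper; writing each chunk to its own datastore entity (for example keyed by model name plus chunk index) is left out, and whatever chunk size you pick has to stay below the entity size limit mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for storing a large serialized model as several smaller pieces. */
public class BlobChunker {

    /** Serializes the model with standard Java serialization. */
    public static byte[] serialize(Serializable model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        }
        return bytes.toByteArray();
    }

    /** Reverses serialize(); the caller casts the result to the model type. */
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    /** Splits data into consecutive chunks of at most chunkSize bytes. */
    public static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int end = Math.min(data.length, offset + chunkSize);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    /** Concatenates chunks back into the original byte array; chunk order matters. */
    public static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```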
That is my main issue: it cannot easily be serialized as a single chunk in the datastore, so either way I will have to do some kind of batch operation. — Dries De Rydt, Dec 30 '15 at 11:31
To avoid strict deadlines on "batch-y" operations (typically cron tasks, but queued tasks too), consider dispatching them to a module using manual scaling (or basic scaling), see https://cloud.google.com/appengine/docs/java/modules/ . — Alex Martelli, Dec 30 '15 at 18:19

score 1 · Accepted Answer · edited May 23 '17 at 12:19


Your datastore operations should never time out. You need to fix this, most likely by using cursors and setting the right batch size for your large queries.
You can perform initialization of objects on instance startup: check whether an object is available, and if not, do the calculations.
Remember to store the results of your calculations in the datastore (in addition to Memcache) as Memcache is volatile. This way you don't have to recalculate everything a few seconds after the first calculation was completed if a Memcache object was dropped for any reason.
Deferred tasks can be scheduled to run after a specified delay. So instead of using a cron job, you can create a task to be executed after 1 hour (for example). When this task completes its own calculations, it can create another task to be executed an hour later, and so on.
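The first point, reading a large result set in bounded batches with cursors, can be sketched in plain Java. PagedSource, Page, and inMemory are hypothetical stand-ins for a datastore query that accepts a start cursor and a batch limit; with the real Datastore API the cursor is an opaque value returned by the query, not an index string as in this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CursorPaging {

    /** One batch of results plus a cursor for the next batch (null when exhausted). */
    public static final class Page<T> {
        final List<T> items;
        final String nextCursor;

        Page(List<T> items, String nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    /** Hypothetical stand-in for a query that resumes from a cursor and returns at most limit rows. */
    public interface PagedSource<T> {
        Page<T> fetch(String startCursor, int limit);
    }

    /** Walks the whole result set one bounded batch at a time; returns the number of non-empty batches. */
    public static <T> int forEachBatch(PagedSource<T> source, int batchSize, Consumer<List<T>> handler) {
        String cursor = null;
        int batches = 0;
        do {
            Page<T> page = source.fetch(cursor, batchSize);
            if (!page.items.isEmpty()) {
                handler.accept(page.items);
                batches++;
            }
            cursor = page.nextCursor; // resume where this batch ended
        } while (cursor != null);
        return batches;
    }

    /** In-memory source for illustration; the "cursor" is just the next index as a string. */
    public static <T> PagedSource<T> inMemory(List<T> rows) {
        return (startCursor, limit) -> {
            int start = startCursor == null ? 0 : Integer.parseInt(startCursor);
            int end = Math.min(rows.size(), start + limit);
            List<T> items = new ArrayList<>(rows.subList(start, end));
            String next = end < rows.size() ? Integer.toString(end) : null;
            return new Page<>(items, next);
        };
    }
}
```

Each call only ever holds one batch, which is what keeps an individual request small regardless of how large the full result set grows.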
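The second and third points (check whether the result already exists before computing it, and keep a durable copy because Memcache can drop entries) combine naturally into a read-through lookup. In this sketch, which is an illustration rather than App Engine code, two ConcurrentHashMap instances are hypothetical stand-ins for Memcache and the Datastore, and a Supplier stands in for the expensive classifier build.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Illustrative read-through lookup. "cache" stands in for Memcache (entries may
 * vanish at any time) and "durable" for the Datastore. Two concurrent misses may
 * both compute the value; this sketch accepts that.
 */
public class TwoTierStore<V> {
    private final Map<String, V> cache = new ConcurrentHashMap<>();
    private final Map<String, V> durable = new ConcurrentHashMap<>();
    private final AtomicInteger computations = new AtomicInteger();

    /** Returns the value, computing it only if neither tier has it. */
    public V get(String key, Supplier<V> compute) {
        V value = cache.get(key);
        if (value != null) {
            return value; // fast path
        }
        value = durable.get(key);
        if (value == null) {
            value = compute.get(); // the expensive build
            computations.incrementAndGet();
            durable.put(key, value); // persist so a cache eviction does not force a rebuild
        }
        cache.put(key, value); // repopulate the fast tier
        return value;
    }

    /** Simulates Memcache dropping all of its entries. */
    public void evictCache() {
        cache.clear();
    }

    /** How many times the expensive computation actually ran. */
    public int computationCount() {
        return computations.get();
    }
}
```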
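The last point, a task that schedules its own successor, might look roughly like this. TaskScheduler and RecordingScheduler are hypothetical; on App Engine the scheduler would instead enqueue the task on a push queue with a countdown, and the task would also need to be Serializable to be used as a DeferredTask.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a task that enqueues its own successor once its work is done.
 * TaskScheduler is a hypothetical abstraction over "run this task after a delay".
 */
public class SelfReschedulingTask implements Runnable {

    public interface TaskScheduler {
        void schedule(Runnable task, long delayMillis);
    }

    public static final long ONE_HOUR_MILLIS = 60L * 60L * 1000L;

    private final TaskScheduler scheduler;
    private final Runnable recalculation;
    private final long intervalMillis;

    public SelfReschedulingTask(TaskScheduler scheduler, Runnable recalculation, long intervalMillis) {
        this.scheduler = scheduler;
        this.recalculation = recalculation;
        this.intervalMillis = intervalMillis;
    }

    @Override
    public void run() {
        recalculation.run();                      // the long computation
        scheduler.schedule(this, intervalMillis); // only reached if the computation did not throw
    }

    /** Fake scheduler for illustration: records what would have been enqueued. */
    public static final class RecordingScheduler implements TaskScheduler {
        public final List<Long> delays = new ArrayList<>();
        public final List<Runnable> pending = new ArrayList<>();

        @Override
        public void schedule(Runnable task, long delayMillis) {
            delays.add(delayMillis);
            pending.add(task);
        }
    }
}
```

In this sketch a failed calculation enqueues nothing, so the chain only continues if the queue retries the failed task. Moving the schedule call into a finally block keeps the chain alive after a failure, but can produce duplicate chains when the queue also retries.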

answered Dec 30 '15 at 09:57 by Andrei Volgin (edited by Community, May 23 '17 at 12:19)


Thanks for the reply. The problem is that I need to build a classifier model, which is not trivial to simply store. Even if I do store the model, retrieving all of it is still a big query, which causes problems (mostly timeouts). I would prefer to calculate it once and keep it in memory so I can quickly classify instances that come in from my front end. – Dries De Rydt, Dec 30 '15 at 10:06
If you can store it in Memcache, you can store it in the Datastore. And no query should ever time out, even if you retrieve a billion entities, as long as you use cursors. – Andrei Volgin Dec 30 '15 at 10:08

