
Good day,

I am running the back end of an application on App Engine (Java) and receive requests through endpoints. The problem is that there is something big I need to compute, but the front end needs fast response times. As a solution, I want to precompute it and store it in a dedicated memcache.

I did this by adding a static block that runs a deferred task on the default queue. Is there a better way to have something calculated on startup?

This deferred task performs a large number of datastore operations, and sometimes they time out, so I built a system that retries on a timeout until it succeeds. However, when I start up the app, it immediately creates two instances of the deferred task. It also keeps retrying the tasks when they fail, even though I call DeferredTaskContext.setDoNotRetry(true).
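A retry loop that runs "until it succeeds" never ends if the timeouts are persistent. A minimal sketch of a bounded alternative in plain Java, assuming the datastore work can be wrapped in a Supplier: BoundedRetry, callWithRetries and OperationTimeoutException are hypothetical names, and the exception is only a stand-in for whatever timeout exception your datastore calls actually throw. It retries only on that exception, backs off exponentially, and rethrows after maxAttempts; what happens after the final rethrow (for example, whether the task queue retries the whole task) is left to the caller.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an App Engine API): runs an operation, retries it only
 * when it times out, and gives up after a fixed number of attempts.
 */
final class BoundedRetry {
    private BoundedRetry() {}

    /** Assumed stand-in for whatever timeout exception the real datastore call throws. */
    static final class OperationTimeoutException extends RuntimeException {
        OperationTimeoutException(String message) {
            super(message);
        }
    }

    static <T> T callWithRetries(Supplier<T> op, int maxAttempts, long initialBackoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMillis = initialBackoffMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.get();
            } catch (OperationTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure instead of looping forever
                }
                try {
                    Thread.sleep(backoffMillis); // wait before trying again
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    throw e;
                }
                backoffMillis *= 2; // exponential backoff between attempts
            }
        }
    }
}
```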

Honestly, the deferred tasks feel very finicky.

I just want to run a method that takes more than 5 minutes (probably longer as the data set grows), once on startup and afterwards on a regular basis. How would you model this? My first thought was a cron job, but cron jobs are limited in time. I would need a cron job that launches a deferred task, and hope the tasks don't pile up, spawn duplicates, or start retrying.

Thanks for the help and good day.

Dries

Dan McGrath
Dries De Rydt
  • cron jobs have the same deadline as deferred tasks – Greg Dec 30 '15 at 10:00
  • How are you going to share/access this data (classifier) across instances? Remember each instance has its own memory, so at the moment I would imagine you start this build each time an instance starts. Once processed, can it be serialised to <1MB? If not, you will need to split the serialised entity and store it in chunks (in the datastore). A get from the datastore is not much slower than memcache and is reliable. – Tim Hoffman Dec 30 '15 at 11:11
  • That is my main issue: it cannot easily be serialized into a single chunk in the datastore, so either way I will have to do some kind of batch operation. – Dries De Rydt Dec 30 '15 at 11:31
  • To avoid strict deadlines on "batch-y" operations (typically cron tasks, but queued tasks too), consider dispatching them to a module using manual scaling (or basic scaling), see https://cloud.google.com/appengine/docs/java/modules/ . – Alex Martelli Dec 30 '15 at 18:19
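The chunking idea from the comments can be sketched in plain Java. This assumes the classifier has already been serialized to a byte[] (for example with Java serialization); ChunkedBlob and its methods are hypothetical names, and the 900 KB default is an assumed budget chosen to stay below the roughly 1 MB entity limit with room for keys and other properties. Each chunk would then be stored as its own entity (say, keyed by chunk index, with the chunk count stored alongside), which is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a serialized model into size-bounded chunks and joins them back. */
final class ChunkedBlob {
    private ChunkedBlob() {}

    /** Assumed per-chunk budget, kept below the ~1 MB entity limit to leave room for metadata. */
    static final int DEFAULT_CHUNK_BYTES = 900 * 1024;

    /** Splits data into consecutive pieces of at most chunkBytes bytes each. */
    static List<byte[]> split(byte[] data, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += chunkBytes) {
            int end = Math.min(data.length, offset + chunkBytes);
            chunks.add(Arrays.copyOfRange(data, offset, end));
        }
        return chunks;
    }

    /** Concatenates chunks in list order; they must be loaded back in the order they were split. */
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

Because the chunks are fetched by key rather than by a large query, loading the model back is a fixed number of gets, which fits Tim Hoffman's point that a datastore get is reliable and not much slower than memcache.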

1 Answer

  1. Your datastore operations should never time out. You need to fix this, most likely by using cursors and setting the right batch size for your large queries.

  2. You can initialize objects on instance startup: check whether the object is already available and, if it is not, do the calculations.

  3. Remember to store the results of your calculations in the datastore (in addition to Memcache), as Memcache is volatile. This way, if the Memcache entry is dropped for any reason, you don't have to recalculate everything a few seconds after the first calculation completed.

  4. Deferred tasks can be scheduled to run after a specified delay. So instead of using a cron job, you can create a task to be executed after 1 hour (for example). When this task completes its own calculations, it can create another task to be executed an hour later, and so on.
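Items 2 to 4 can be sketched in plain Java under some loudly stated assumptions: the two Map instances stand in for Memcache and the Datastore, and the hypothetical Scheduler interface stands in for enqueuing a task with a delay (on App Engine, a task queue task with a countdown). ModelCache, KEY and the method names are illustrative only, not an App Engine API.

```java
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of answer items 2-4. The maps stand in for Memcache and the
 * Datastore; Scheduler stands in for "enqueue a task that runs after a delay".
 */
final class ModelCache<M> {
    /** Assumed abstraction over a delayed task (not a real App Engine interface). */
    interface Scheduler {
        void schedule(Runnable task, long delayMillis);
    }

    private static final String KEY = "classifier-model"; // hypothetical cache/entity key

    private final Map<String, M> memcache;  // volatile: entries may disappear at any time
    private final Map<String, M> datastore; // durable copy that survives evictions
    private final Supplier<M> compute;      // the expensive calculation

    ModelCache(Map<String, M> memcache, Map<String, M> datastore, Supplier<M> compute) {
        this.memcache = memcache;
        this.datastore = datastore;
        this.compute = compute;
    }

    /** Item 2: reuse an existing result if one is available; compute only as a last resort. */
    M get() {
        M model = memcache.get(KEY);
        if (model != null) {
            return model;
        }
        model = datastore.get(KEY); // item 3: the durable copy covers Memcache evictions
        if (model != null) {
            memcache.put(KEY, model); // repopulate the volatile cache
            return model;
        }
        return refresh();
    }

    /** Item 3: recompute and store the result in both places. */
    synchronized M refresh() {
        M model = compute.get();
        datastore.put(KEY, model);
        memcache.put(KEY, model);
        return model;
    }

    /** Item 4: a task that refreshes the model and then schedules its own next run. */
    void scheduleRefreshLoop(Scheduler scheduler, long periodMillis) {
        scheduler.schedule(() -> {
            refresh();
            scheduleRefreshLoop(scheduler, periodMillis); // chain the next run only after this one finishes
        }, periodMillis);
    }
}
```

Because each run schedules the next one only after refresh() returns, runs in this chain do not overlap the way independently triggered cron runs could. If refresh() throws, however, the chain stops and has to be restarted, for example from the startup check in item 2.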

Andrei Volgin
  • Thanks for the reply. The problem is that I need to build a classifier model, which is not trivial to store. Even if I do store the model, retrieving all of it is a big query, which causes problems (mostly timeouts). I would prefer to calculate it once and keep it in memory so I can quickly classify instances that come in from my front end. – Dries De Rydt Dec 30 '15 at 10:06
  • If you can store it in Memcache, you can store it in the Datastore. And no query should ever time out, even if you retrieve a billion entities, as long as you use cursors. – Andrei Volgin Dec 30 '15 at 10:08
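The cursor-based approach from the answer and this comment can be sketched generically. PagedQuery, Page and the String cursor are hypothetical stand-ins for a real Datastore query and its own cursor type; the point is only the loop: fetch a bounded batch, process it, and continue from the returned cursor until there is none.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of cursor paging; not the App Engine Datastore API. */
final class CursorPaging {
    private CursorPaging() {}

    /** One batch of results plus an opaque cursor for the next batch. */
    static final class Page<T> {
        final List<T> items;
        final String nextCursor; // null when there are no more results

        Page(List<T> items, String nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    /** Assumed query abstraction: return at most batchSize results starting at cursor (null = start). */
    interface PagedQuery<T> {
        Page<T> fetch(String cursor, int batchSize);
    }

    /** Walks the whole result set batch by batch, so no single fetch has to return everything. */
    static <T> void forEach(PagedQuery<T> query, int batchSize, Consumer<T> sink) {
        String cursor = null;
        do {
            Page<T> page = query.fetch(cursor, batchSize);
            page.items.forEach(sink);
            cursor = page.nextCursor;
        } while (cursor != null);
    }
}
```

On App Engine, the same loop can also be split across several requests or tasks by saving the cursor between batches, so that no single request has to run past its deadline.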