Which of these options is best for fanning out work on GAE (so that it completes within a single request's timeframe)?
- Use tasks, store the results in memcache, periodically poll memcache in the request, and hope the tasks complete in time (see the sketch after this list)
- Use urlfetch to collect the results of the tasks, though error handling and security will be a pain
- Use backend instances? (seems insane)
- Or a Java instance (seems totally insane)
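
For the first option, a minimal sketch of what the task/memcache pattern could look like on the Python runtime is below, assuming webapp2 and ndb. The `Item` model, the handler URLs, the 10-minute cache lifetime, and the `job_id`/chunk-key naming scheme are all placeholders for illustration, not a definitive implementation:

```python
import json
import time

import webapp2
from google.appengine.api import memcache
from google.appengine.ext import ndb


class Item(ndb.Model):
    # Placeholder for whatever entity actually holds the 10k items.
    data = ndb.JsonProperty()


class FetchChunkHandler(webapp2.RequestHandler):
    """Task handler: fetch one 2k chunk and drop its JSON into memcache."""
    def post(self):
        job_id = self.request.get('job_id')
        index = self.request.get('index')
        cursor_str = self.request.get('cursor')
        cursor = ndb.Cursor(urlsafe=cursor_str) if cursor_str else None
        items, _, _ = Item.query().fetch_page(2000, start_cursor=cursor)
        payload = json.dumps([item.to_dict() for item in items])
        memcache.set('%s:%s' % (job_id, index), payload, time=600)


class AggregateHandler(webapp2.RequestHandler):
    """Front-end request: poll memcache until every chunk has arrived."""
    def get(self):
        job_id = self.request.get('job_id')
        num_chunks = int(self.request.get('chunks'))
        keys = ['%s:%d' % (job_id, i) for i in range(num_chunks)]
        deadline = time.time() + 50  # leave headroom inside the 60s request limit
        while time.time() < deadline:
            chunks = memcache.get_multi(keys)
            if len(chunks) == num_chunks:
                # Merge the per-chunk JSON arrays into one response.
                merged = sum((json.loads(chunks[k]) for k in keys), [])
                self.response.headers['Content-Type'] = 'application/json'
                self.response.write(json.dumps(merged))
                return
            time.sleep(0.5)
        self.abort(504)  # tasks did not finish within the request deadline
```

The obvious weakness is the "hope" part: memcache entries can be evicted at any time, so the poll loop needs a timeout and a fallback (retry, or write chunks to the datastore instead).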
Background: It's ridiculous to even have to do this. I need to deliver 10k datastore items as JSON. Apparently the issue is that Python takes a lot of time to process the datastore results (Java seems much faster). This is well covered: 25796142, 11509368 and 21941954.
Approach: As there is nothing to optimize on the software side (I can't rewrite GAE), the approach would be to fan the work out over multiple instances and aggregate the results.
Querying keys only and getting query cursors for chunks of 2k items performs reasonably well, and from there tasks could be spun off to fetch the results in 2k chunks (see the sketch below). The question is how best to aggregate the results.
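
A matching sketch of that fan-out step, walking the keys-only query to collect a start cursor per 2k chunk and enqueueing one task per chunk. The `/tasks/fetch_chunk` URL and `Item` model are the same placeholders as in the earlier sketch, and the default push queue is assumed:

```python
import uuid

from google.appengine.api import taskqueue
from google.appengine.ext import ndb


def fan_out(chunk_size=2000):
    """Walk the query keys-only and enqueue one fetch task per chunk."""
    job_id = uuid.uuid4().hex
    cursor = None
    index = 0
    more = True
    while more:
        # The keys-only pass is cheap; we only need the start cursor of each chunk.
        start = cursor.urlsafe() if cursor else ''
        _, next_cursor, more = Item.query().fetch_page(
            chunk_size, start_cursor=cursor, keys_only=True)
        taskqueue.add(url='/tasks/fetch_chunk',
                      params={'job_id': job_id,
                              'index': index,
                              'cursor': start})
        cursor = next_cursor
        index += 1
    return job_id, index  # index == number of chunks enqueued
```

The caller (or the client, via redirect) would then hit the aggregation endpoint with the returned `job_id` and chunk count and wait for the chunks to land.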