I'm trying to show a table of ~800 entities, and having problems keeping it from being really slow. (Like 15-20 seconds slow.) I successfully implemented memcache, but because I reference a parent model for each of the child entities it still causes a datastore_v3.Get for each of the 800 and is massively slow.

I then implemented Nick Johnson's ReferenceProperty prefetching and can't solve the following error:

[... snipped ...]
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/webapp2/webapp2.py", line 570, in dispatch
  return method(*args, **kwargs)
File "/myurl/mypythoncode.py", line 67, in get
  prefetch_refprops(entitylist, ChildModel.parent_program.name)
File "/myurl/mypythoncode.py", line 36, in prefetch_refprops
  fields = [(entity, prop) for entity in entities for prop in props]
TypeError: 'NoneType' object is not iterable

Models:

These are the two relevant models:

class ParentModel(db.Model):
  name = db.StringProperty()
  # currently 109 of these

class ChildModel(db.Model):
  name = db.StringProperty()
  parent_program = db.ReferenceProperty(ParentModel)
  website = db.StringProperty()
  # currently 758 of these

Python code:

In my Python code I'm using Nick Johnson's techniques for efficient model memcaching and for ReferenceProperty prefetching. (I've included the ReferenceProperty prefetching code below, but not the memcaching code.)

class AllEntities(webapp2.RequestHandler):
  def get(self):
    entitylist = deserialize_entities(memcache.get("entitylist"))
    entityref = prefetch_refprops(entitylist, ChildModel.parent_program.name)
    if not entitylist:
      entitylist = ChildModel.all().fetch(None)
      entityref = prefetch_refprops(entitylist, ChildModel.parent_program.name)
      memcache.set("entitylist", serialize_entities(entitylist))
    context = {
      'entitylist': entitylist,
    }
    self.response.out.write(template.render(context))

def prefetch_refprops(entities, *props):
    fields = [(entity, prop) for entity in entities for prop in props]
    ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]
    ref_entities = dict((x.key(), x) for x in db.get(set(ref_keys)))
    for (entity, prop), ref_key in zip(fields, ref_keys):
        prop.__set__(entity, ref_entities[ref_key])
    return entities

Jinja2 template:

My Jinja2 template references the iterable "entry" in "entitylist" but also the parent_program.name and parent_program.key().id()

{% for entry in entitylist %}
  <tr>
    <td><a href="{{ entry.website }}">{{ entry.website }}</a></td>
    <td><a href="/urlcategory/view?entityid={{ entry.parent_program.key().id() }}">{{ entry.parent_program.name }}</a></td>
  </tr>
{% endfor %}

I've replaced the line:

entityref = prefetch_refprops(entitylist, ChildModel.parent_program.name)

with

entityref = prefetch_refprops(entitylist, ChildModel.parent_program)

and other variations that include ".name" and ".key().id()". When I use ".key().id()" I get the error:

AttributeError: 'ReferenceProperty' object has no attribute 'key'

What am I missing or screwing up? I'd really appreciate any help!

Jed Christiansen
  • Delete the first `entityref = prefetch_refprops(entitylist, ChildModel.parent_program.name)` line. `entitylist` is set to None the first time through, so the prefetch fails. – mjibson Jul 15 '12 at 02:13
  • Thanks, I've fixed that. But now I'm getting "AttributeError: 'str' object has no attribute 'get_value_for_datastore'" for "prefetch_refprops(companylist, ChildModel.parent_program.name)". This seems odd since it's virtually the same as the call that Nick uses in his example. – Jed Christiansen Jul 15 '12 at 09:42
  • Okay, I fixed it now, by changing it to "prefetch_refprops(companylist, ChildModel.parent_program)". I'm not getting any more errors, but it feels like I'm still not doing something right... – Jed Christiansen Jul 15 '12 at 11:03
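
The control flow that the comments above converge on can be sketched as follows. This is only an illustration with plain-Python stand-ins: the dict `cache` and the callables `fetch_all` and `prefetch` are hypothetical placeholders for memcache, `ChildModel.all().fetch(None)` and `prefetch_refprops`, not App Engine APIs. It also assumes the prefetch should run on both the cache-hit and cache-miss paths, on the assumption that the serialized entities carry only the reference keys and not the parent entities.

```python
def load_entities(cache, fetch_all, prefetch, cache_key="entitylist"):
    """Return the entity list, querying only on a cache miss.

    cache, fetch_all and prefetch are hypothetical stand-ins for memcache,
    ChildModel.all().fetch(None) and prefetch_refprops respectively.
    """
    entities = cache.get(cache_key)      # None on a cache miss
    if not entities:
        entities = fetch_all()           # query the datastore once
        cache[cache_key] = list(entities)
    # Prefetch only once we are sure we hold a list; iterating over None
    # is what raised "TypeError: 'NoneType' object is not iterable".
    prefetch(entities)
    return entities


# Tiny demonstration with fake data and call counters.
calls = {"fetch": 0, "prefetch": 0}

def fake_fetch_all():
    calls["fetch"] += 1
    return ["child-1", "child-2"]

def fake_prefetch(entities):
    calls["prefetch"] += 1
    return entities

fake_cache = {}
first = load_entities(fake_cache, fake_fetch_all, fake_prefetch)   # cache miss
second = load_entities(fake_cache, fake_fetch_all, fake_prefetch)  # cache hit
```

In this sketch the query runs once across the two calls, while the prefetch runs on each call.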

1 Answer

Jed, you're doing it right :)
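
The two error messages in the question suggest why the bare property is the right argument: accessed on the class, `ChildModel.parent_program` is the property object itself, and its `.name` is just a string. A minimal sketch of that behaviour, using a hypothetical `FakeProperty` descriptor as a stand-in for the real `db.ReferenceProperty` (this is not the App Engine implementation):

```python
class FakeProperty(object):
    """Hypothetical stand-in for a db.Property-style descriptor."""

    def __init__(self, name):
        self.name = name  # the property's own name, a plain string

    def __get__(self, instance, owner):
        if instance is None:
            return self   # class-level access yields the property object
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def get_value_for_datastore(self, instance):
        # Return the raw stored value without resolving anything.
        return instance.__dict__.get(self.name)


class FakeChild(object):
    parent_program = FakeProperty("parent_program")


prop = FakeChild.parent_program        # the property object
name = FakeChild.parent_program.name   # only the string "parent_program"

child = FakeChild()
child.parent_program = "some-parent-key"
stored = prop.get_value_for_datastore(child)
```

Here `name` has no `get_value_for_datastore` method and `prop` has no `key` method, which mirrors the `'str' object has no attribute 'get_value_for_datastore'` and `'ReferenceProperty' object has no attribute 'key'` errors.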

Two improvements:

  1. You don't need to assign the return value of the prefetch: it isn't used, and companylist is modified in place.
  2. I use a slightly modified version of prefetch_refprops to handle cases where the reference property isn't populated.

    def prefetch_refprops(entities, *props):
        # Pair every entity with every reference property to prefetch.
        fields = [(entity, prop) for entity in entities for prop in props]
        ref_keys_all = [prop.get_value_for_datastore(x) for x, prop in fields]
        # Skip unset references so None is never passed to db.get().
        ref_keys = [ref_key for ref_key in ref_keys_all if ref_key is not None]
        # One batch get; db.get() returns None for keys whose entity is gone.
        ref_entities = dict((x.key(), x) for x in db.get(set(ref_keys))
                            if x is not None)
        for (entity, prop), ref_key in zip(fields, ref_keys_all):
            # Unset or dangling references resolve to None.
            prop.__set__(entity, ref_entities.get(ref_key))
        return entities
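
To see why batching the lookups helps, here is a small self-contained simulation. Everything in it is a hypothetical stand-in (`FakeDatastore`, dict-based children, string keys) rather than the App Engine `db` API. It only counts round trips for per-entity lookups versus one batched lookup over the distinct, non-None keys.

```python
class FakeDatastore(object):
    """Hypothetical in-memory datastore that counts round trips."""

    def __init__(self, rows):
        self.rows = rows   # maps key -> stored value
        self.rpcs = 0

    def get(self, keys):
        self.rpcs += 1     # one round trip per call, however many keys
        return [self.rows.get(k) for k in keys]


def lazy_lookup(store, children):
    # Without prefetching: one get per child that has a reference.
    result = []
    for child in children:
        key = child["parent_key"]
        result.append(store.get([key])[0] if key is not None else None)
    return result


def batched_lookup(store, children):
    # The prefetch idea: collect distinct non-None keys and fetch them once.
    keys = sorted(set(c["parent_key"] for c in children
                      if c["parent_key"] is not None))
    found = dict(zip(keys, store.get(keys)))
    return [found.get(c["parent_key"]) for c in children]


parents = {"p1": "Program One", "p2": "Program Two"}
children = [{"parent_key": "p1"}, {"parent_key": "p2"},
            {"parent_key": "p1"}, {"parent_key": None}]

lazy_store = FakeDatastore(parents)
lazy_names = lazy_lookup(lazy_store, children)

batch_store = FakeDatastore(parents)
batch_names = batched_lookup(batch_store, children)
```

Both approaches produce the same names, but the lazy version makes one round trip per populated reference while the batched version makes a single one.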
    

We're using this in production code, and it makes a real difference. Below is an example of the effect of turning the prefetches on and off in a bit of code that builds template values.

(run:   real_time, Get #rpcs, RunQuery #rpcs)
Before:   5044 ms,       132,    101
After:    2214 ms,        53,     11

The other heavy chain-ladder operation our code is doing is a count() on the ref_set for each object, which we will replace in the near future with caching the values on the objects.

Campey
  • Thanks, Campey. I ended up going with a different approach, outlined by Guido here: http://stackoverflow.com/questions/9127982/avoiding-memcache-1m-limit-of-values/9143912#9143912 But it definitely works! – Jed Christiansen Oct 21 '12 at 14:45