6

Is there any clever way to avoid making a costly query with an IN clause in cases like the following one?

I'm using Google App Engine to build a Facebook application and at some point I (obviously) need to query the datastore to get all the entities that belong to any of the facebook friends of the given user.

Suppose I have a couple of entities modeled as such:

class Thing(db.Model):
    owner = db.ReferenceProperty(reference_class=User, required=True)
    owner_id = db.StringProperty(required=True)
    ...

and

class User(db.Model):
    id = db.StringProperty(required=True)
    ...

At some point I query Facebook to get the list of friends of a given user and I need to perform the following query

# get all Thing instances that belong to friends
query = Thing.all()
query.filter('owner_id IN', friend_ids)

If I did that, AppEngine would perform a subquery for each id in friend_ids, probably exceeding the maximum number of subqueries any query can spawn (30).

Is there any better way to do this (i.e. minimizing the number of queries)? I understand that there are no relations and joins using the datastore but, in particular, I would consider adding new fields to the User or Thing class if it helps in making things easier.

abahgat
  • 13,360
  • 9
  • 35
  • 42

2 Answers2

5

I don't think there's an elegant solution, but you could try this:

On the User model, use Facebook ID as the key name, and store each user's list of things in a ListProperty.

class Thing(db.Model):
  ...

class User(db.Model):
  things = db.ListProperty(db.Key)
  ...

Entity creation would go like this:

user = User.get_or_insert(my_facebook_id)

thing = Thing()
thing.put()

user.things.append(thing.key())
user.put()

Retrieval takes 2 queries:

friends = User.get_by_key_name(friend_ids)
thing_keys = []

for friend in friends:
  thing_keys.extend(friend.things)

things = db.get(thing_keys)
Drew Sears
  • 12,812
  • 1
  • 32
  • 41
  • +1 Another option is to make Things children to User allows for ancestor queries for a specific type of thing to be returned. The use of key_names is critical to this really working. – kevpie Oct 19 '10 at 06:32
  • That's great, I even made Things children to User as suggested by kevpie. I had to deal with a couple more issues, though: a) I don't store a User entity for each friend_id, so I need to filter the None values I get when querying using get_by_key_name; b) I have to filter out things by some other fields as well, but I do that on the entities I fetch after I get them from the DataStore. Is there any better way to do that? – abahgat Oct 19 '10 at 21:41
  • Be sure to watch the talks Nick posted in his answer. You may want to use an index entity combined with a list property. This is shown in the first talk posted by Nick. – kevpie Oct 19 '10 at 22:50
  • I'm trying to develop the alternative using index entities. Performing get_by_key_name with a list of Facebook friends as an argument appears to be stressing the DataStore a bit too much. The problem with index entities associated to each user is when to update them to keep them consistent with Facebook. – abahgat Oct 24 '10 at 22:09
3

This Google I/O talk by Brett Slatkin addresses the exact situation you're dealing with. See also his follow up talk this year.

Nick Johnson
  • 100,655
  • 16
  • 128
  • 198
  • I'm afraid you posted twice the same link to the last talk. Were you referring to this one? http://www.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html – abahgat Oct 19 '10 at 20:45