1

the following bit of code is run regularly as a cronjob and is turning out to be very computationally expensive! The main problem is in the for loop, and I think this can be made a little more efficient using better filtering, however I'm at a loss as to how I can do that.

free_membership_type = MembershipType.all().filter("membership_class =", "Free").filter("live =", True).get()
all_free_users = UserMembershipType.all().filter("membership_active =", True)
all_free_users = all_free_users.filter("membership_type =", free_membership_type).fetch(limit = 999999)
if all_free_users:
    for free_user in all_free_users:
        activation_status = ActivationStatus.all().filter("user = ", free_user.user).get()
        if activation_status and activation_status.activated:
            documents_left = WeeklyLimits.all().filter("user = ", free_user.user).get()
            if documents_left > 0:
                do something...

The models which the code uses are:

class MembershipType(db.Model):
    membership_class = db.StringProperty()
    membership_code = db.StringProperty()
    live = db.BooleanProperty(default = False)

class UserMembershipType(db.Model):
    user = db.ReferenceProperty(UserModel)
    membership_type = db.ReferenceProperty(MembershipType)
    membership_active = db.BooleanProperty(default = False)

class ActivationStatus(db.Model):
    user = db.ReferenceProperty(UserModel) 
    activated = db.BooleanProperty(default = False)

class WeeklyLimits(db.Model):
    user = db.ReferenceProperty(UserModel) 
    membership_type = db.ReferenceProperty(MembershipType) 
    documents_left = db.IntegerProperty(default = 0)

The code I'm using in production does make better use of caching for the various entities, however the for loop still has to cycle through a bunch of users to finally find the few that it needs to do the operation on. Ideally I'd filter out all of the users that don't fulfil the criteria and only then start looping through the list of users - is there some kind of magic bullet that I can use here to achieve this?

user714852
  • 2,054
  • 4
  • 30
  • 52
  • 1
    In an unrelated note; though you "limit=99999" fetch() will only return 1000 entities at a time, but if you use "for user in all_free_users:" without calling fetch() you can iterate through all of them. http://stackoverflow.com/questions/264154/google-appengine-how-to-fetch-more-than-1000 – Jesse Hattabaugh Feb 16 '12 at 19:17
  • Thanks a cool tip arkanciscan, thank you - I'll give it a bash. – user714852 Feb 17 '12 at 23:39

3 Answers3

2

You could improve this by storing the data closer together in a single model. For example, a single entity kind of UserMembership could have all of the fields you need, and you could do a single query:

.filter("membership_type =", "FREE").filter("status =", "ACTIVE").filter("documentsLeft >", 0)

This would require an extra index to be defined, but will run much, much faster.

Riley Lark
  • 20,660
  • 15
  • 80
  • 128
  • I have simplified the models a little for the sake of making it all a little easier to parse. I do need the models to be separate, as it helps make other functions a little more efficient (and easier to understand). It's just this one occurrence that is terribly inefficient. – user714852 Feb 17 '12 at 23:31
  • There's no way to cross-reference or otherwise join queries over different kinds. If you want this cross-referencing in AppEngine, you *must* store the relevant data in a single entity. If you also want the data to be in separate models for other reasons, you'll have to duplicate the data - store it once in its normalized, grouped-together form, and once in its separate form - and use one set for this and the other for that. – Riley Lark Feb 19 '12 at 00:47
2

The magic that you are probably looking for is denormalization. It looks to me like these classes can all be meaningfully combined into a single model:

class Membership(db.Model):
    user = db.ReferenceProperty(UserModel)
    membership_class = db.StringProperty()
    membership_code = db.StringProperty()
    live = db.BooleanProperty(default = False)
    membership_active = db.BooleanProperty(default = False)
    activated = db.BooleanProperty(default = False)
    documents_left = db.IntegerProperty(default = 0)

Then, you can use one query to do all of your filtering.

Over-normalization is a common anti-pattern in AppEngine development. The models that you posted look like they might as well be table definitions for a relational database (although, it's arguable whether its more compartmentalized than needed even for that scenario) and AppEngine's datastore is very much not a relational database.

Can you see any downside to storing all of those fields in a single model?

Adam Crossland
  • 14,198
  • 3
  • 44
  • 54
  • Thanks for the response. The models I've posted have been greatly simplified to improve readability. In actual fact there are quite a few more fields for each entity - it is therefore advantageous (in my uneducated option) to have them as separate entities. – user714852 Feb 17 '12 at 23:38
  • Don't let the sheer number of fields be a compelling reason to separate them into different entities. If they are all related to the same fundamental unit of data then you are just complicating things for yourself. – Adam Crossland Feb 18 '12 at 13:49
1

If you want to avoid denormalizing your data as suggested in the other two answers, you could also consider using Google's new SQL service instead of the normal datastore: http://googleappengine.blogspot.com/2011/10/google-cloud-sql-your-database-in-cloud.html

With SQL you could do all of this in a single query, even with separate entities.

Riley Lark
  • 20,660
  • 15
  • 80
  • 128
  • I'm looking forward to using the SQL service when it comes out of preview - I'm just a little reluctant to patch too many "beta"/ unreleased products to my app. – user714852 Feb 20 '12 at 07:51