1

I've this data model (I made it, so if there's a better way to do it, please let me know). Baically I've Club that can have many Courses. now I want to know all the members and instructors of a Club. members and instructors are stored in the Course model, and Club has a reference to them. See the code..

class Course(ndb.Model):
    ...
    instructor_keys = ndb.KeyProperty(kind="User", repeated=True)
    member_keys = ndb.KeyProperty(kind="User", repeated=True)

    @property
    def instructors(self):
        return ndb.get_multi(self.instructor_keys)

    @property
    def members(self):
        return filter(lambda x: x.is_active, ndb.get_multi(self.member_keys))

    def add_instructor(self, instructor):
        if instructor.key not in self.instructor_keys:
            self.instructor_keys.append(instructor.key)
            self.put()

    def rm_instructor(self, instructor):
        if instructor.key in self.instructor_keys:
            self.instructor_keys.remove(instructor.key)
            self.put()

    def add_member(self, member):
        if member.key not in self.member_keys:
            self.member_keys.append(member.key)
            self.put()

    def rm_member(self, member):
        if member.key in self.member_keys:
            self.member_keys.remove(member.key)
            self.put()

and

class Club(ndb.Model):
    ...
    owners_keys = ndb.KeyProperty(kind="User", repeated=True)
    course_keys = ndb.KeyProperty(kind="Course", repeated=True)


    @property
    def members(self):
        # TODO: is this correct? efficient?
        l_members = []
        for courses in self.courses:
            l_members = l_members + courses.members
        return l_members

    @property
    def trainers(self):
        l_trainers = []
        for courses in self.courses:
            l_trainers = l_trainers + courses.trainers
        return l_trainers

    @property
    def owners(self):
        return ndb.get_multi(self.owners_keys)

    @property
    def courses(self):
        return filter(lambda x: x.is_active, ndb.get_multi(self.course_keys))

    def add_owner(self, owner):
        if owner.key not in self.owners_keys:
            self.owner_keys.append(owner.key)
            self.put()

    def rm_owner(self, owner):
        if owner.key in self.owners_keys:
            self.owner_keys.remove(owner.key)
            self.put()

    def add_course(self, course):
        if course.key not in self.courses_keys:
            self.course_keys.append(course.key)
            self.put()

    def rm_course(self, course):
        if course.key in self.courses_keys:
            self.course_keys.remove(course.key)
            self.put()

    def membership_type(self, user):
        if user.key in self.owners_keys:
            return "OWNER"
        elif user in self.members:
            return "MEMBER"
        elif user in self.trainers:
            return "TRAINER"
        else:
            raise Exception("The user %s is not in the club %s" % (user.id, self.id))

Now, the @property on the Course seems to be ok to me. (am I right?) but the one in the Club seems to be very inefficient. every time i've to iterate all the Courses to compute the members and trainers. Plus these methods are not cached but computed every time. So if it's very costly.

Any suggestion? I was thinking about having instructors and members as a list of key also in Club, and update the club every time I add someone to the Course, but not sure it's correct.

PS: Is there a better way to do also the filter on a ndb.get_multi?

Dan McGrath
  • 41,220
  • 11
  • 99
  • 130
EsseTi
  • 4,079
  • 5
  • 36
  • 63

2 Answers2

1

I'd try to normalize your model instead of going for a de-normalized one:

class CourseInscription(ndb.Model):
    member = ndb.KeyProperty(kind='User', required=True)
    course = ndb.KeyProperty(kind='Course', required=True)
    is_active = ndb.BooleanProperty(default=True)

Then, you can just add something like

class Course(ndb.Model):
    # all the stuff already there

    @property
    def members(self):
        return CourseInscription.query(CourseInscription.course == self.key)

In general, I prefer to just return the query and let the caller decide to even call it directly or add some more filtering/sorting instead of doing ndb.get_multi directly.

Another nice touch I usually do is to construct the id for the relational entities using their parents, so I can easily check for existence with a get by id instead of having to query

class CourseInscription(ndb.Model):
    # all the other stuff

    @staticmethod
    def build_id(user_key, course_key):
        return '%s/%s' % (user_key.urlsafe(), course_key.urlsafe())

# somewhere I can create an inscription like

CourseInscription(
    id=CourseInscription.build_id(user_key, course_key),
    user=user_key, course=course_key, is_active=True
).put()

# somewhere else I can check if a User is in a Course by just getting... no queries needed
if ndb.Key(CourseInscription, CourseInscription.build_id(user, course)).get():
    # Then the user is in the course!
else:
    # then it's not
marianosimone
  • 3,366
  • 26
  • 32
  • Thanks. I was not sure about using normalization in NDB and if it was a correct approach. I do agree with you that i should return the query. but `self.member_keys` is a list of keys, what should i return in this case? – EsseTi Nov 05 '14 at 12:28
  • the `def build_id(user_key, course_key):` also sets automatically the id of the relation to that value? or how does it work on the put? – EsseTi Nov 05 '14 at 12:29
  • I fixed the build_id example to show it in use on creation (and in the Inscription class, not in Course) – marianosimone Nov 05 '14 at 15:21
  • Normalizing or not is sometimes a matter of taste, and sometimes a matter of efficiency: If you expect more than a couple of thousands of users in a course (and the not is_active/is_active ratio to be big, so you'll end up filtering a lot of people in your original approach), I think it's the way to go (in the other hand, if most of the users will be active, filtering a couple of them after getting shouldn't be such a big issue). In case you continue with the original approach, you would return just the list of users, not a query (and let the caller filter out in memory) – marianosimone Nov 05 '14 at 15:24
  • 1
    Apart from taste... here's a reply by Guido about using repeated properties and the ~1000 limit: http://stackoverflow.com/questions/15377119/gae-ndb-design-performance-and-use-of-repeated-properties/15418435#15418435 – marianosimone Nov 05 '14 at 16:20
  • is still limit still true? i tried and it worked with 5000+ items. plus, how should i create a one-to-many relation then? – EsseTi Nov 06 '14 at 15:40
  • 1
    I think the limit is not a hard one, but one to take into account when using this kind of property. Remember that what's being indexed is one record per element in the list (**so it quickly explodes if you have more than one repeated property in the same entity**). IMHO, repeated properties should not be used for one-to-many but only for one-to-some (and actual one-to-many relationships should be normalized) or for actual (limited) values, and not relationships – marianosimone Nov 06 '14 at 20:21
  • What happens if user_key.id() == course_key.id() ? can it happen or no? – EsseTi Nov 07 '14 at 10:36
  • Good catch... never happened to me, but it doesn't mean it can't. Safest way to avoid that is to change build_id to use urlsafe() for each key (code updated) – marianosimone Nov 07 '14 at 15:20
  • it can be that the two ids cannot be the same beacuse NDB is a table. so every entity has a diffrent id. is it like this? – EsseTi Nov 07 '14 at 15:31
  • Even if auto-generated ids were not repeated (which I'm not sure is the case), someone could be using customs ids, and then urlsafe() would be the way to ensure uniqueness. – marianosimone Nov 10 '14 at 16:06
  • I'm struggling with the query now. the query to get the memeber returns a list of `CourseInscription` but i need a list of 'User'. how should i do? i created another question to keep thing more clear. http://stackoverflow.com/questions/27837399/ndb-many-to-many-retrieve-list-of-one-of-the-relation – EsseTi Jan 08 '15 at 10:04
  • I found out that if i do build_id as you explained i don't get any integer ID but rather the string, how can i solve this? PS: in api fashion is it good practice to pass ID or KEY (urlsafe) `/courses/123` or `courses/asdfdsfwers` ? – EsseTi Jan 12 '15 at 10:00
  • i changed the build_id to this `return ((n + m) * (n + m + 1) / 2) + n` where `n` and `m` are the id (int/long) of the two. this creates a unique ID for the keys. yet, it grows fast but there should be no overflow since long have no limits. can someone confirm this? (it's true for python, not sure for gae) – EsseTi Jan 12 '15 at 10:36
  • @EsseTi sorry, I should've mentioned that: Whenever you use a custom id instead of letting GAE set it, you are going to end up using a string (as a matter of fact, I doesn't matter if you send an int/long, the ids will be strings) – marianosimone Jan 12 '15 at 15:34
1

It makes perfect sense for the members and instructors to belong to the club in addition to being referenced by the courses. Thinking of the use-case, the member would need to join the club before they joined a course offered by the club (though this may happen at the same time of course). It is also significantly more efficient to have the members/instructors referenced directly (in the club) rather than having to traverse the courses each time. Your trade-off comes in additional code maintenance, but this seems justified in this case.

Regarding the active course filter, you may wish to consider an 'active' course list and an 'inactive' course list for the club: once a course is not longer offered, move it to the 'inactive' list (and remove it from the 'active' list).

ChrisC73
  • 1,833
  • 14
  • 14