
My question is about modelling one-to-many relations in ndb. I understand that this can be done in (at least) two different ways: with a repeated property or with a 'foreign key'. I have created a small example below. Basically we have an Article which can have an arbitrary number of Tags. Let's assume that a Tag can be removed but cannot be changed after it has been added. Let's also assume that we don't worry about transactional safety.

My question is: what is the preferred way of modelling these relationships?

My considerations:

  • Approach (A) requires two writes for every tag that is added to an article (one for the Article and one for the Tag) whereas approach (B) only requires one write (just the Tag).
  • Approach (A) leverages ndb's caching mechanism when fetching all Tags for an Article, whereas approach (B) requires a query (plus some custom caching, since ndb's automatic caching applies to gets by key, not to query results).

Am I missing anything here? Are there any other considerations that should be taken into account?

Thanks very much for your help.

Example (A):

from google.appengine.ext import ndb

class Article(ndb.Model):
    title = ndb.StringProperty()
    # some more properties
    tags = ndb.KeyProperty(kind="Tag", repeated=True)

    def create_tag(self):
        # requires two writes: the Tag is put first so that it has a key
        tag = Tag(name="my_tag")
        tag.put()
        self.tags.append(tag.key)  # store the Tag's key, not the entity
        self.put()

    def get_tags(self):
        # batch get by key, which ndb can serve from its caches
        return ndb.get_multi(self.tags)

class Tag(ndb.Model):
    name = ndb.StringProperty()
    user = ndb.KeyProperty(kind="User")  # User that created the tag
    # some more properties

Example (B):

from google.appengine.ext import ndb

class Article(ndb.Model):
    title = ndb.StringProperty()
    # some more properties

    def create_tag(self):
        # requires one write
        tag = Tag(name="my_tag", article=self.key)
        tag.put()

    def get_tags(self):
        # a query, which ndb does not cache; we could cache the result in memcache
        return Tag.query(Tag.article == self.key).fetch()

class Tag(ndb.Model):
    name = ndb.StringProperty()
    article = ndb.KeyProperty(kind="Article")
    user = ndb.KeyProperty(kind="User")  # User that created the tag
    # some more properties
Paolo Moretti
Emiel vl
  • Consider checking performance with appstats. While your specific question might have a specific answer, it probably depends more on your actual usage, and appstats can tell you which of the above options is more efficient in real life. https://developers.google.com/appengine/docs/python/tools/appstats – Paul Collingwood Dec 18 '12 at 10:23
  • Would you create new tags for each article even if it's the same tag? I would go for option `A` because you will be able to use the same `Tag` for each article and you will be able to query `Articles` by tag. – aschmid00 Dec 18 '12 at 14:37
  • @PaulC thanks. Indeed, I checked with appstats and in my case option B is more efficient (1 write vs 2). However, since the optimization is only small, I'm unsure whether it is worth giving up the documented way (i.e. option A) of solving a one-to-many relation. – Emiel vl Dec 18 '12 at 16:43
  • @aschmid00 Yes, I would create a new `Tag` for each `Article`. This is unclear from the question and I will change it accordingly. Does that change your answer? Thanks. – Emiel vl Dec 18 '12 at 16:44
  • I would still go for A, but at this point it depends on how many Tags each Article will have. Why would you create separate tags for each article even if they have the same name? Also look at @kasavbere's answer... – aschmid00 Dec 18 '12 at 17:29

3 Answers

Have you looked at the documentation on Structured Properties (https://developers.google.com/appengine/docs/python/ndb/properties#structured)? The short discussion there about Contact and Address may simplify your problem. Also look at https://developers.google.com/appengine/docs/python/ndb/queries#filtering_structured_properties. Both discussions are very short.

Also, looking ahead to the fact that joins are not allowed, option A looks better.

kasavbere
  • Thanks for your answer. Structured properties could do the job, but in my specific case I don't think they would be the best solution. What do you mean by "looking ahead to the fact that joins are not allowed"? Is that a GAE policy? – Emiel vl Dec 20 '12 at 10:42
  • Yes, that is a limitation of the Datastore. See https://developers.google.com/appengine/docs/python/datastore/queries: "in particular, joins and aggregate queries aren't supported within the Datastore query engine." The Datastore has other restrictions that you should probably be familiar with: https://developers.google.com/appengine/docs/python/datastore/queries#Restrictions_on_Queries – kasavbere Dec 20 '12 at 17:12

As stated before, there are no joins in the Datastore, so the whole "foreign key" notion doesn't apply. What you can do instead is use the Query class to query your Datastore for the correct Tag.

For example, if you are using Endpoints, then:

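from google.appengine.ext import ndb

class Tag(ndb.Model):
    user = ndb.UserProperty()

And then, during the request, do:

import endpoints

# Query.filter() returns a new query instead of modifying the existing one
tags = Tag.query().filter(Tag.user == endpoints.get_current_user()).fetch()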
sagym

Approach (A) should be preferred in most situations. Adding a tag requires two writes, but that probably happens much less often than reading the tags. As long as you don't have a huge number of tags, their keys should all fit into the repeated Key property (a single entity, repeated properties included, is limited to about 1 MB).

As you mentioned, fetching the tags by their keys is much faster than performing a query. Also, if you only need the tag's name and the user, you could create the tag with the User as the parent key and the Name as the tag's id:

User (parent) -> Tag (id = name)

To create this tag, you would use:

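tag = Tag(id=name, parent=user)  # user is the User's Key; set other properties as needed
article.tags.append(tag.key)  # the key is complete before put(), so both can be saved together
ndb.put_multi([tag, article])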

Then, when you retrieve the tags:

for tag_key in article.tags:
    user_key = tag_key.parent()  # the User key, taken from the key path
    name = tag_key.id()          # the tag name, stored as the key's id

Each key stored in Article.tags then contains both the User key and the tag name, which saves you from reading the Tag entity just to get those values.
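To make this concrete, here is a small pure-Python sketch. It deliberately does not use ndb: `ToyKey` is a hypothetical class that only imitates how an ndb key carries its ancestor path as (kind, id) pairs, and the user id "alice" and the tag names are made up. It shows that the creator and the name can be read straight off each stored key without loading any Tag entity.

```python
class ToyKey:
    """Hypothetical stand-in for ndb.Key: an ancestor path of (kind, id) pairs.

    Not part of ndb; it only mimics how a key carries its parent and id.
    """

    def __init__(self, *flat):
        # Alternating kind/id arguments, e.g. ToyKey("User", "alice", "Tag", "python").
        if not flat or len(flat) % 2:
            raise ValueError("expected alternating kind/id arguments")
        self.path = tuple(zip(flat[::2], flat[1::2]))

    def kind(self):
        return self.path[-1][0]

    def id(self):
        return self.path[-1][1]

    def parent(self):
        # The parent is the same path without its last (kind, id) pair.
        if len(self.path) == 1:
            return None
        return ToyKey(*[part for pair in self.path[:-1] for part in pair])

    def __eq__(self, other):
        return isinstance(other, ToyKey) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def __repr__(self):
        return "ToyKey(%s)" % ", ".join(repr(part) for pair in self.path for part in pair)


# Hypothetical data: Article.tags holds keys only, never Tag entities.
article_tags = [
    ToyKey("User", "alice", "Tag", "python"),
    ToyKey("User", "alice", "Tag", "ndb"),
]

for tag_key in article_tags:
    # The creator and the tag name both come from the key; nothing is fetched.
    user_key = tag_key.parent()
    name = tag_key.id()
    print(user_key, name)
```

With real ndb keys, the same two calls, `Key.parent()` and `Key.id()`, work on the keys stored in `Article.tags`. The trade-off is that the name and creator become part of the key, so they cannot change without creating a new entity, which fits the question's assumption that a Tag is never changed once added.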

Brent Washburne