What would be the purpose of putting all datastore entities in a single group?

Question

I have started working on an existing project which uses Google Datastore where for some of the entity kinds every entity is assigned the same ancestor. Example:

from google.appengine.ext import ndb

class BaseModel(ndb.Model):
    @classmethod
    def create(cls, **kwargs):
        return cls(parent=cls.make_key(), **kwargs)
    @classmethod
    def make_key(cls):
        return ndb.Key('Group', cls.key_name())

class Vehicle(BaseModel):
    @classmethod
    def key_name(cls):
        return 'vehicle_group'

So the keys end up looking like this:

Key(Group, 'vehicle_group', Vehicle, 5068993417183232)

There is no 'Group' kind and no 'vehicle_group' entity, but that is fine according to the docs: "note that unlike in a file system, the parent entity need not actually exist".

I understand from reading that this might have a performance benefit in that all the entities of a kind are colocated in the distributed datastore.

But putting all these entities in a single group would in my mind create problems as this project scales, and the once per second write limit would apply to the entire kind. There doesn't appear to be any transactional reason for the group.

No one on the project knows why it was originally done like this. My questions are:

Does anyone know where this "xxx_group" single entity scheme comes from?
And is it as bunk as it appears to be?

score 3 · Accepted Answer · answered Apr 08 '16 at 17:26

Grouping many entities inside a single entity group offers at least 2 advantages I can think of:

ability to perform (ancestor) queries inside transactions - non-ancestor (or cross-group) queries are not allowed inside transactions
ability to access many entities inside the same transaction - cross-group transactions are limited to max 25 entity groups
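
To make the second point concrete, here is a minimal plain-Python sketch. It deliberately does not use ndb: it only models a key as a flat path of kind/id pairs and counts how many entity groups a set of keys touches. The names group_root, groups_touched and MAX_XG_GROUPS are hypothetical, made up for this illustration; the limit of 25 is the one quoted above.

```python
# Plain-Python model of datastore key paths; this is not the ndb API.
# A key is a flat tuple (kind, id, kind, id, ...), like the ndb.Key(...) arguments.

MAX_XG_GROUPS = 25  # cross-group transaction limit quoted in this answer


def group_root(key_path):
    """Return the (kind, id) pair that identifies the key's entity group."""
    return key_path[0], key_path[1]


def groups_touched(key_paths):
    """Return the set of distinct entity-group roots for the given keys."""
    return {group_root(path) for path in key_paths}


# 100 vehicles that all share the synthetic parent Key('Group', 'vehicle_group').
grouped = [('Group', 'vehicle_group', 'Vehicle', i) for i in range(1, 101)]

# 100 vehicles stored as root entities: each one is its own entity group.
ungrouped = [('Vehicle', i) for i in range(1, 101)]

fits_grouped = len(groups_touched(grouped)) <= MAX_XG_GROUPS
fits_ungrouped = len(groups_touched(ungrouped)) <= MAX_XG_GROUPS
print(len(groups_touched(grouped)), fits_grouped)      # 1 True
print(len(groups_touched(ungrouped)), fits_ungrouped)  # 100 False
```

With the shared parent, a transaction over any number of Vehicle entities touches one entity group, so the cross-group limit never comes into play.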

The 1 write/second/group limit might not be a scalability issue at all for some applications (think write once read a lot kind of apps, for example, or apps for which 1 write per sec is more than enough).
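
As a rough back-of-the-envelope sketch of that trade-off, the snippet below treats the limit as a hard cap of exactly 1 write per second per group and ignores batching several entities into one commit. Both are simplifications, and the function name and numbers are made up for illustration.

```python
import math

WRITES_PER_SEC_PER_GROUP = 1.0  # simplifying assumption: treated as a hard cap


def min_seconds(total_writes, num_groups):
    """Lower bound on the time to apply writes spread evenly over the groups."""
    busiest_group = math.ceil(total_writes / num_groups)
    return busiest_group / WRITES_PER_SEC_PER_GROUP


# One write per entity for 3,600 entities.
single_group = min_seconds(3600, num_groups=1)    # everything under one parent
own_groups = min_seconds(3600, num_groups=3600)   # every entity is a root

print(single_group)  # 3600.0 seconds, i.e. about an hour
print(own_groups)    # 1.0 second
```

Whether an hour for such a bulk update matters depends entirely on the app: for mostly-read reference data it may be irrelevant, for a busy write path it is a hard ceiling.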

As for the mechanics, the (unique) parent "entity" key for the group is the ndb.Key('Group', "xxx_group") key (which has the "xxx_group" key ID). The corresponding "entity" or its model doesn't need to exist (unless the entity itself needs to be created, but that doesn't appear to be the case). The parent key is used simply to establish the group's "namespace" in the datastore, if you want.

You can see a somewhat similar use in the examples from the Entity Keys documentation; check out how Message is used (except that Message is just a "parent" entity in the ancestor path, not the root entity):

class Revision(ndb.Model):
    message_text = ndb.StringProperty()

ndb.Key('Account', 'sandy@foo.com', 'Message', 123, 'Revision', '1')
ndb.Key('Account', 'sandy@foo.com', 'Message', 123, 'Revision', '2')
ndb.Key('Account', 'larry@foo.com', 'Message', 456, 'Revision', '1')
ndb.Key('Account', 'larry@foo.com', 'Message', 789, 'Revision', '2')
...

Notice that Message is not a model class. This is because we are using Message purely as a way to group Revisions, not to store data.

score 2 · Answer 2 · answered Apr 09 '16 at 23:00

This was probably done to achieve strongly consistent queries within the group. As you've pointed out this design has... drawbacks.

If this is solely reference data (i.e. read many, write once), that may mitigate some of the negatives, but it also mostly invalidates the positives (i.e. eventual consistency is not a problem if the data doesn't update often).
