13

Following up on my earlier question regarding GAE Datastore entity hierarchies, I'm still confused about when to use entity groups.

Take this simple example:

  • Every Company has one or more Employee entities
  • An Employee cannot be moved to another Company, and users that deal with one Company can never see the Employees of another Company

This looks like a case where I could make Employee a child entity of Company, but what are the practical consequences? Does this improve scalability, hurt scalability, or have no impact? What are other advantages/disadvantages of using or not using an entity hierarchy?

(Entity groups enable transactions, but assume for this example that I do not need transactions).

Tony the Pony

2 Answers

12

If you don't need transactions, don't use entity groups. They slow things down in some cases, and never speed anything up. Their only benefit is that they enable transactions.

As far as I can tell, the best place to use entity groups is on data that isn't likely to be accessed by many users at the same time, and that you'll frequently want to include in a transaction. So, if you stored the contents of a shopping cart, which probably only the owner of that cart will deal with frequently, those contents might be good for an entity group - it'll be nice to be able to use a transaction for that data when you're adding or updating an entity, and you're not locking anyone else out of anything when you do so.
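To make the shopping cart point concrete, here is a minimal toy sketch in plain Python. It is not the App Engine API: `ToyStore`, `transact`, and `add_item` are hypothetical names, and a per-group lock is only a stand-in for whatever concurrency control the real Datastore applies. The idea it illustrates is that a transaction on one cart only contends with other writes to that same cart.

```python
import threading
from collections import defaultdict


class ToyStore:
    """In-memory stand-in for a datastore; NOT the App Engine API."""

    def __init__(self):
        self._data = {}
        # One lock per entity group, keyed by the group's root key.
        self._group_locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, root):
        # Guard creation of per-group locks so two threads never
        # end up with different locks for the same group.
        with self._locks_guard:
            return self._group_locks[root]

    def transact(self, root, fn):
        """Run fn(group_dict) while holding only this group's lock."""
        with self._lock_for(root):
            group = self._data.setdefault(root, {})
            return fn(group)


def add_item(store, cart_id, sku, qty):
    """Read-modify-write on one cart, done atomically within its group."""
    def txn(cart):
        cart[sku] = cart.get(sku, 0) + qty
        return cart[sku]
    return store.transact(cart_id, txn)


store = ToyStore()
threads = [
    threading.Thread(target=add_item, args=(store, "cart:alice", "widget", 1))
    for _ in range(50)
]
threads += [
    threading.Thread(target=add_item, args=(store, "cart:bob", "gadget", 2))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

alice_total = store.transact("cart:alice", lambda cart: cart["widget"])
bob_total = store.transact("cart:bob", lambda cart: cart["gadget"])
print(alice_total, bob_total)
```

Writes to Alice's cart wait on each other, but never on Bob's, which is why a group that only one user touches is a comfortable fit.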

Riley Lark
  • 10
    the spirit here is definitely right, and i can revise the technical details a bit. entity groups hurt scalability because writes are serialized per entity group, not because their data is stored close together. (spatial locality is actually often good for caching and scaling, depending on the implementation details.) given that, don't worry too much about the volume of data per entity group. the main thing to worry about is the write throughput. as noted in many other places, you can't do more than 1 to 10 writes per second per entity group. – ryan Jan 31 '11 at 19:57
  • 4
    Another major benefit is strongly consistent reads. Maybe "they enable transactions" implies this, but this wasn't immediately clear to me. – Chris Jan 26 '14 at 18:39
  • 1
    @Chris Yup, in itself, an EG enables strongly consistent reads with the use of ancestor queries. Transactions are a related concept but are different; transactions are meant for guaranteeing atomicity for multiple operations (e.g. a read plus a write). By the way, there is one more way to achieve strongly consistent reads for an entity, which is to perform a lookup by key. This requires you to know what the key is in the first place without "asking" the datastore. https://cloud.google.com/datastore/docs/articles/balancing-strong-and-eventual-consistency-with-google-cloud-datastore/ – Kevin Lee May 06 '17 at 09:24
  • 1
    @riley-lark They supposedly speed up queries, but the downside is the write rate limit. "Ancestor queries also rapidly scan an entity group with minimal I/O because the entities in an entity group are stored at physically close places on Cloud Datastore servers." https://cloud.google.com/datastore/docs/best-practices – Kevin Lee May 06 '17 at 09:29
10

Nick stated clearly that you should not make the groups larger than necessary; the "Best practices for writing scalable applications" article has some discussion on why.

Use entity groups when you need transactions. In the example you gave, a ReferenceProperty on Employee will achieve a similar result.

Aside from transactions, entity groups can be helpful because key-fetches and queries can be keyed off of a parent entity. However, you might want to consider multitenancy for these types of use-cases.
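As a rough illustration of the two modelling options, here is a toy sketch in plain Python, not the Datastore API. Keys are modelled as paths of (kind, name) pairs, and `Entity`, `company_ref`, `entity_group`, and `ancestor_query` are hypothetical names made up for this example. Treating the entity group as the first element of the key path is the assumption the sketch rests on.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# A key is modelled as a path of (kind, name) pairs, root first.
Key = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Entity:
    key: Key
    # Hypothetical stand-in for a reference property pointing at a Company.
    company_ref: Optional[Key] = None


def entity_group(entity):
    """In this toy model, the group is the root element of the key path."""
    return entity.key[:1]


def ancestor_query(entities, ancestor, kind):
    """Entities of `kind` whose key path starts with the ancestor's path."""
    n = len(ancestor)
    return [
        e for e in entities
        if e.key[:n] == ancestor and e.key[-1][0] == kind
    ]


acme = (("Company", "acme"),)
globex = (("Company", "globex"),)

# Option 1: Employee is a child of Company, so it shares the Company's group.
child_employees = [
    Entity(key=acme + (("Employee", "ann"),)),
    Entity(key=acme + (("Employee", "bo"),)),
    Entity(key=globex + (("Employee", "cy"),)),
]

# Option 2: Employee is a root entity that merely references its Company.
root_employees = [
    Entity(key=(("Employee", "ann"),), company_ref=acme),
    Entity(key=(("Employee", "bo"),), company_ref=acme),
    Entity(key=(("Employee", "cy"),), company_ref=globex),
]

acme_children = ancestor_query(child_employees, acme, "Employee")
acme_refs = [e for e in root_employees if e.company_ref == acme]

groups_option1 = {entity_group(e) for e in child_employees}
groups_option2 = {entity_group(e) for e in root_employees}
print(len(groups_option1), len(groups_option2))
```

Both options find the same two Acme employees. The difference is that option 1 puts each Company's employees into one group, while option 2 leaves every Employee in a group of its own.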

Ultimately, large entity groups might hurt scalability, because entities within an entity group are stored in the same tablet. The more stuff you cram into one entity group, the more you reduce the amount of work that can be done in parallel; it has to be done serially instead.
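A back-of-the-envelope sketch of that serialization, in plain Python. It assumes a ceiling of about 1 write per second per entity group (the low end of the "1 to 10" figure quoted in the comments on this page) and assumes that separate groups proceed fully in parallel, which ignores every other limit. `seconds_to_apply` is a hypothetical helper, not part of any API.

```python
from collections import Counter


def seconds_to_apply(write_targets, per_group_limit=1.0):
    """Rough time to apply a batch of writes.

    write_targets: one entity-group id per pending write.
    per_group_limit: assumed writes per second a single group sustains.
    Groups are assumed to run in parallel, so the busiest group dominates.
    """
    per_group = Counter(write_targets)
    busiest = max(per_group.values(), default=0)
    return busiest / per_group_limit


# 600 employee updates, all employees are children of one Company.
one_group = ["Company:acme"] * 600

# The same 600 updates, each Employee is its own root entity.
separate_groups = [f"Employee:{i}" for i in range(600)]

slow = seconds_to_apply(one_group)
fast = seconds_to_apply(separate_groups)
print(slow, fast)
```

Under these assumptions the single-group layout needs roughly ten minutes for a batch that the root-entity layout could in principle absorb in about a second. The real numbers will differ, but the shape of the trade-off is the point.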

Robert Kluin