Can someone illustrate how I can store and easily query hierarchical data in google app engine datastore?
3 Answers
The best option depends on your requirements. Here's a few solutions (I'm assuming you're using Python, since you didn't specify):
- If you need to do transactional updates on an entire tree, and you're not going to have more than about 1QPS of sustained updates to any one tree, you can use the built in support for heirarchial storage. When creating an entity, you can pass the "parent" attribute to specify a parent entity or key, and when querying, you can use the .ancestor() method (or 'ANCESTOR IS' in GQL to retrieve all descendants of a given entity.
- If you don't need transactional updates, you can replicate the functionality of entity groups without the contention issues (and transaction safety): Add a db.ListProperty(db.Key) to your model called 'ancestors', and populate it with the list of ancestors of the object you're inserting. Then you can easily retrieve everything that's descended from a given ancestor with MyModel.all().filter('ancestors =', parent_key).
- If you don't need transactions, and you only care about retrieving the direct children of an entity (not all descendants), use the approach outlined above, but instead of a ListProperty just use a ReferenceProperty to the parent entity. This is known as an Adjacency List.
There are other approaches available, but those three should cover the most common cases.

- 100,655
- 16
- 128
- 198
-
Point (2) answers my query! Thanks. – MathOldTimer Jun 19 '09 at 08:18
-
2It seems that (2) and (1) do the same, but that (1) would be much cheaper. It strikes me that a list of keys is quite expensive in terms of storage cost, which would only get worse as the tree got deeper. Also, wouldn't (1) lead to good locality? – Paul Biggar Dec 18 '09 at 15:31
-
3The built in ancestor support uses the same technique as 2 - it stores a list of ancestors internally. The advantage of 2 is that you don't have the transaction rate limitation. Locality is not an issue. – Nick Johnson Dec 21 '09 at 15:43
-
For (3), you must likely want to use a SelfReferenceProperty type since the parent is probably the same type as self. – Ben Bederson Jan 07 '13 at 20:32
Well, you should try to keep your data as linear as possible. If you need to quickly query a tree structure of data, you would either have to store it pickled in the database (or JSON-encoded if you prefer) if that is possible for your data, or you would have to generate tree indices that can be used to quickly query a piece of a tree structure. I'm not sure how Google App Engine would perform when updating those indices, however.
When it comes to Google App Engine, your main concern should be to reduce the number of queries you need to make, and that your queries return as little rows as possible. Operations are expensive, but storage is not, so redundancy should not be seen as a bad thing.
Here are some thoughts on the subject I found by googling (although for MySQL, but you can get the general idea from it): Managing Hierarchical Data in MySQL
Ah and here's a discussion for Google App Engine: Modeling Hierarchical Data

- 49,547
- 13
- 120
- 153
One way is to use the Model's parent attribute. You can then make use of query.ancestor() and model.parent() functions.
I guess it depends on what kind of operations you want to do on this data which would determine how best to represent it.

- 1,067
- 2
- 11
- 24
-
2That's not a good idea. Entity groups should only be used when required for transactions. From documentation: "Only use entity groups when they are needed for transactions. For other relationships between entities, use ReferenceProperty properties and Key values, which can be used in queries." – Blixt Jun 18 '09 at 11:44
-
Also remember: an entity's parent cannot be changed, but a ReferenceProperty can! – Trevor Jan 16 '15 at 22:07