I have three entities: user, post and comment. A user may have multiple posts and a post may have multiple comments.

I know I can add ancestor relations like this:

user(Grand Parent) post(parent) comment(child)

I'm a little confused about ancestors. From the documentation and my searches, I understand that ancestors are used for transactions, that all entities sharing an ancestor are in the same entity group, and that an entity group is stored on the same datastore node, which makes it less scalable. Is this right?

Is making the user the parent of posts and the post the parent of comments a good idea?

Instead, we could add an extra property to the post entity, such as user_id, and filter by it.

Which is better/more scalable: filtering posts by ancestor, or adding an extra property user_id to the post entity and filtering by it?

I know both approaches give the same results, but I want to know which one is better for performance and scalability.

Sorry, I'm new to Datastore.

Update 11/4/2017

A large number of users are using this app. It is quite possible that more than one post is created per second overall. A single user cannot create more than one post per second, but multiple users together may. As described in the documentation, the maximum entity group write rate is 1/s. Is it still possible to use ancestors?

The same goes for comments. Multiple users can add comments in the same entity group, so more than one comment per second is quite possible.

Are ancestor queries faster?

I have read in many places that ancestor queries are much faster than other queries.
As far as I know, the reason is that an ancestor relation creates an entity group and stores the related data on the same node, so it takes less time to get data from a single node than from multiple nodes.

For example: if a post is stored on a node in Asia and a comment is stored on a node in Europe, and I want to get both posts and comments, then the datastore API needs to fetch from two nodes to complete the request, which makes it slow. If I create an ancestor relation instead, the entities form an entity group, which gives better performance.

But what if I don't need to get post and comment data at the same time? Suppose posts are shown on one web page and comments on a separate page. In this scenario the datastore API needs to fetch from only one node at a time, so it does not matter whether the data is saved on a single node or on multiple nodes. What about query performance: can ancestors make queries faster in this case?

Azeem Haider
  • In general it is a good idea to ask subsequent/related questions in separate posts (eventually linking/referencing the original question as context) as they tend to diverge, complicate things and harm overall readability. I'm referring here to the `Ancestor Queries are faster ?` question, quite different from the original one. – Dan Cornilescu Nov 04 '17 at 17:37

2 Answers


Yes, you are correct: all ancestry-related entities are in the same entity group, which raises 2 scalability issues: data contention and the maximum entity group write rate of 1/s. See the somewhat related question "Is there an Entity Group Max Size?"
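To make the entity group point concrete, here is a rough sketch in plain Python. It does not use the Datastore client API: key paths are modeled as tuples of (kind, id) pairs, and the names `entity_group` and `busiest_group` are hypothetical helpers made up for illustration. The only idea it relies on is that an entity's group is identified by the root of its key path.

```python
from collections import Counter

# A key path is modeled as a tuple of (kind, id) pairs, root first.
# This is an illustrative stand-in, not the real Datastore key type.

def entity_group(key_path):
    """Return the root (kind, id) pair, which identifies the entity group."""
    return key_path[0]

def busiest_group(writes):
    """Given key paths written within the same second, return the
    (group, write_count) pair for the entity group with the most writes."""
    counts = Counter(entity_group(path) for path in writes)
    return counts.most_common(1)[0]

# With ancestry (user -> post -> comment), every post of a user and every
# comment on those posts shares the user's entity group.
with_ancestry = [
    (("User", 1), ("Post", 10), ("Comment", 100)),
    (("User", 1), ("Post", 10), ("Comment", 101)),
    (("User", 1), ("Post", 11)),
]

# Without ancestry, each entity is the root of its own entity group.
without_ancestry = [
    (("Comment", 100),),
    (("Comment", 101),),
    (("Post", 11),),
]

group, writes_in_one_second = busiest_group(with_ancestry)
print(group, writes_in_one_second)
```

In the first layout three writes hit the group rooted at `("User", 1)` in the same second, which is where the per-group write limit would bite. In the second layout no group receives more than one write.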

There are advantages of using ancestries and some may be willing to sacrifice scalability for them (see What would be the purpose of putting all datastore entities in a single group?), but IMHO not for your kind of app: I think you'll agree that it's not really critical to see every new user/post/comment in random searches immediately after it is created (i.e. strong consistency) - the fact that it eventually appears is IMHO good enough.

Simply having no ancestry at all and adding additional model properties (entity keys or even just entity key IDs for entities which never have ancestors) to allow cross-referencing entities is the more scalable approach and IMHO fits well with your app.
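As a minimal sketch of that cross-referencing layout, again in plain Python rather than the Datastore client API: the `user_id` property comes from the question, while `post_id`, `author_id` and the helper `comments_for_post` are names assumed here for illustration. The list comprehension stands in for a non-ancestor query with an equality filter on `post_id`.

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    user_id: int      # reference to the author instead of a parent key
    title: str

@dataclass
class Comment:
    comment_id: int
    post_id: int      # reference to the post instead of a parent key
    author_id: int    # reference to the commenting user
    text: str

def comments_for_post(comments, post_id):
    """Mimic a non-ancestor query that filters on the post_id property."""
    return [c for c in comments if c.post_id == post_id]

comments = [
    Comment(100, post_id=10, author_id=1, text="first"),
    Comment(101, post_id=10, author_id=2, text="second"),
    Comment(102, post_id=11, author_id=2, text="other post"),
]

for c in comments_for_post(comments, 10):
    print(c.comment_id, c.text)
```

Because no entity has a parent here, each post and comment is its own entity group, and the trade-off is that such a property-filtered query is eventually consistent rather than strongly consistent.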

Dan Cornilescu
  • What's **IMHO**? And can you please explain this note from the [Documentation](https://cloud.google.com/appengine/articles/scaling/contention#keep-entity-groups-small): **Note that entity groups are not required if you simply plan to reference one entity from another.** – Azeem Haider Nov 04 '17 at 04:57
  • As far as I know, ancestors use **strong consistency**, which is not good for a **comment** system because it requires more time than **eventual consistency**. Any suggestions? – Azeem Haider Nov 04 '17 at 05:04
  • I don't think **ancestor** queries are always fast. Please check the **update** in the question. – Azeem Haider Nov 04 '17 at 06:14
  • Ancestor queries are not inherently faster, they are more consistent. – Tim Hoffman Nov 04 '17 at 10:49
  • IMHO - in my humble opinion. I think that by entity groups the note actually refers to the ancestries - there is nothing specifically referencing the groups in the API. – Dan Cornilescu Nov 04 '17 at 16:25
  • Sorry if my answer suggests that some queries are faster than others - what I meant is only ancestor queries being strongly consistent vs non-ancestor queries being eventually consistent (i.e. the results reflecting very recent changes or not). – Dan Cornilescu Nov 04 '17 at 16:33
  • I agree. I don't think this use case is worth sacrificing scalability and performance for strong consistency and transactions. Also, let Datastore create the IDs automatically for best performance and scalability. Plus: reading an entity by key is always strongly consistent, even outside a transaction (e.g. you want to read the user before accepting a new create-post request from them). – Ani Nov 04 '17 at 18:47
  • But what if there is more than one comment or post per second? That is possible. – Azeem Haider Nov 06 '17 at 01:41
  • Without ancestry posts and comments are not in the same entity group so you don't have a write limit - you can handle any number of posts or comments per second. That's why NOT having the ancestry is a more performant and scalable solution. – Dan Cornilescu Nov 06 '17 at 02:45

I think the question to ask is: Are you expecting:

  • Users to create posts more than once per second (I doubt it :)
  • People to comment on a post more than once per second (could happen)

If not, then ancestor queries will be faster than normal queries. So it depends on your use case. I'd go for query speed unless you know you will have thousands of comments on posts.

Sébastien Loix
  • It is possible to have more than one post or comment per second. If so, can I still use an **ancestor relation**? See the update in the question. – Azeem Haider Nov 04 '17 at 04:59
  • I doubt that a user will be able to create a post in 1 second :) For the comments, you can put them under the user, as @Tim Hoffman suggests in the other question. (https://stackoverflow.com/questions/47041092/migrate-from-cloud-sql-to-datastore/47062865?noredirect=1#comment81165535_47062865) – Sébastien Loix Nov 04 '17 at 18:07
  • But in most cases I need to get all comments for a particular post, so it's best to make the post the parent, not the user. In that case, what will happen if there is more than one comment per second? – Azeem Haider Nov 06 '17 at 01:44