1

I'm new to GAE and would appreciate your advice on data storage approaches for GAE apps.

Simple example:
- there are Author and Document entities
- each Author may be a creator of several Documents
So we have two options:
1) Add all Documents as children to corresponding Author entities (owned relationship)
2) Add a field to each Document which will identify the Author (unowned link or something)

What are the pros and cons of each approach?

P.S. I know about entity groups and strong consistency. What else?
By the way, what does eventual consistency mean in practice: minutes, hours, ...?

Thanks

Dan McGrath
sberezin
2 Answers

1

The general guideline with most NoSQL stores is to structure your data so that it is optimal for your primary use case, and to denormalise as needed to satisfy other needs.

If your most common operation is reading all documents for an author, then putting documents under the author makes sense. If it's fetching a single document, then having the document reference its author may be more practical.
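To make the two layouts concrete, here is a minimal plain-Python sketch. It deliberately does not use the App Engine SDK: keys are modelled as tuples of (kind, id) pairs, and the helpers `make_key`, `docs_by_ancestor` and `docs_by_author_field`, as well as the `author_id` field, are hypothetical names chosen for illustration.

```python
# Toy model only: keys are tuples of (kind, id) pairs, entities are dicts.
# None of these helpers belong to the App Engine SDK.

def make_key(*path):
    """Build a key path from alternating kind/id values."""
    if len(path) % 2 != 0:
        raise ValueError("path must alternate kind, id")
    return tuple(zip(path[0::2], path[1::2]))

store = {}

# Option 1: the Document key is nested under its Author key (same entity group).
store[make_key("Author", 1)] = {"name": "Ann"}
store[make_key("Author", 1, "Document", 10)] = {"title": "Draft A"}
store[make_key("Author", 1, "Document", 11)] = {"title": "Draft B"}

# Option 2: the Document is a root entity that stores the author id in a field.
store[make_key("Author", 2)] = {"name": "Bob"}
store[make_key("Document", 20)] = {"title": "Notes", "author_id": 2}

def docs_by_ancestor(author_key):
    """Option 1 lookup: documents whose key path starts with the author's key."""
    n = len(author_key)
    return sorted(
        e["title"] for k, e in store.items()
        if len(k) > n and k[:n] == author_key and k[-1][0] == "Document"
    )

def docs_by_author_field(author_id):
    """Option 2 lookup: root documents filtered on the author_id field."""
    return sorted(
        e["title"] for k, e in store.items()
        if len(k) == 1 and k[0][0] == "Document" and e.get("author_id") == author_id
    )

print(docs_by_ancestor(make_key("Author", 1)))
print(docs_by_author_field(2))
```

Note that in option 1 the author is part of the document's key, so fetching one document by key requires knowing its author as well; in option 2 the document id alone is enough.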

How the datastore is priced (the cost of reads vs. writes) will help guide you: the cheapest design is usually also the most effective one. For example, if documents are write-heavy and have many indexes, option 1 could be expensive when you want to update a single document.

With regard to eventual consistency, the delay usually won't be longer than a few seconds in the worst case, but there are no guarantees. You should not rely on it being good enough in a situation where the result must be accurate (for example, an author editing a document and then previewing it before publishing). Remember that a get by ID is a strongly consistent read, so you can generally mitigate this as needed.
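The difference between a get by key and a query can be sketched with a toy in-memory model. This is only an illustration of the behaviour described above, not how the datastore is implemented; the class `ToyStore` and its methods are hypothetical, and the "delay" is simulated by an explicit `apply_pending()` call.

```python
# Toy model only: an entity table that is updated immediately and a query
# index that catches up later. Not the real datastore implementation.

class ToyStore:
    def __init__(self):
        self.entities = {}   # key -> entity, always current
        self.index = {}      # key -> entity as seen by global queries
        self.pending = []    # index updates not yet applied

    def put(self, key, entity):
        self.entities[key] = dict(entity)
        self.pending.append((key, dict(entity)))

    def get(self, key):
        """Lookup by key: reads the entity table, so it sees the latest write."""
        return self.entities.get(key)

    def query_titles(self):
        """Global query: reads the index, which may lag behind."""
        return sorted(e["title"] for e in self.index.values())

    def apply_pending(self):
        """Stand-in for the index catching up after an unspecified delay."""
        for key, entity in self.pending:
            self.index[key] = entity
        self.pending.clear()

s = ToyStore()
s.put("doc-1", {"title": "Old title"})
s.apply_pending()
s.put("doc-1", {"title": "New title"})

fresh = s.get("doc-1")["title"]   # by key: current value
stale = s.query_titles()          # by query: may still show the old value
s.apply_pending()
caught_up = s.query_titles()      # after the index has caught up
print(fresh, stale, caught_up)
```

For the edit-then-preview case, this suggests rendering the preview from a get by key rather than from a query result.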

Nick
0

While searching for answers I read through a number of articles and also came across this and this post, which are helpful.

So I have formed my own opinion, and I hope it will help someone:

Entity group advantages:
+ Intrinsic strong consistency (see also transactions)
+ Ancestor paths can serve as "namespaces in miniature". This can be used to separate data while still making it possible to share it.

Entity group disadvantages, due to the limit on writes per second (see here, at the end):
- may hurt scalability
- may slow concurrent access
- shouldn't be large anyway since access to groups is serialized
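The effect of serialized writes can be sketched with a small back-of-the-envelope calculation. The function `seconds_to_apply` is a hypothetical helper, and the default rate of one write per second per group is an illustrative assumption, not a figure taken from this thread; substitute whatever limit the documentation states.

```python
from collections import Counter

def seconds_to_apply(write_targets, writes_per_second_per_group=1.0):
    """Estimate how long a batch of writes takes when each group is serialized.

    write_targets: iterable of entity-group ids, one per write.
    Groups proceed in parallel; writes within one group are applied in turn.
    The default rate is an assumed, illustrative value.
    """
    per_group = Counter(write_targets)
    if not per_group:
        return 0.0
    return max(per_group.values()) / writes_per_second_per_group

# 60 writes that all land in one author's entity group...
one_big_group = ["author-1"] * 60
# ...versus 60 writes spread over 60 root documents.
many_small_groups = ["doc-%d" % i for i in range(60)]

print(seconds_to_apply(one_big_group))
print(seconds_to_apply(many_small_groups))
```

Under these assumptions the single large group takes 60 times longer to absorb the same number of writes, which is the scalability concern listed above.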

So, IMHO, the use of entity groups is limited to:
- cases where strong consistency is required. Even then, groups should be kept as small as possible to avoid contention
- single-user data storage
In all other cases I will avoid them.

Community
sberezin
  • 1
    It is important to note that you get strong consistency without entity groups if you do get-by-key operations. Entity groups are the only way of having strongly consistent queries. – Nick Jul 17 '14 at 01:11