
I am reading about GAE and its datastore, and I came across the question and article linked below. If my users can be identified by, say, email, would it be reasonable to use the same parent for all users and the email as the key, with the goal of resolving conflicts when two different users try to use the same email as their identifier? In theory, could this cause any issues if the number of users becomes large (say, 10M)? From my perspective, gets should be fine; it is the puts that are locked. So if gets significantly dominate puts (which really only happen when a new user is created), I don't see any issues. But...

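// One shared parent key: every user entity would live in the same entity group.
Key parent = KeyFactory.createKey("parent", "users");
// The email address is the key name, so a second user with the same email maps to the same key.
Key user = KeyFactory.createKey(parent, "user", "user@domain.com");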

When to use entity groups in GAE's Datastore https://developers.google.com/appengine/articles/scaling/contention

Schultz9999

2 Answers


I also faced the unique-email issue, and here's what I did:

Set up a kind called "Email" and use the email the user enters as its string key. This is the only way to make a field both scalable and unique in the datastore. Then set up another kind called "User" whose key uses an auto-generated ID:

Email

key: email, UserKey: datastore.Key

User

key: auto_id, Password: string, Name: string

In this setup, the email can be used as the login, and users have the option to change their email (or to have multiple emails), while each email remains unique system-wide.
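To make the two-kind layout concrete, here is a minimal in-memory sketch. It is not the App Engine datastore API: the class `UserDirectory`, its methods, and the use of maps in place of kinds are all assumptions made for illustration. One map plays the role of the "Email" kind (email to user ID) and another the "User" kind (auto ID to name). In the real datastore, the "claim this email" step would need to happen inside a transaction rather than through `putIfAbsent`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** In-memory stand-in for the two kinds; NOT the App Engine datastore API. */
final class UserDirectory {
    // "Email" kind: key = email address, value = ID of the owning User.
    private final Map<String, Long> emailToUserId = new ConcurrentHashMap<>();
    // "User" kind: key = auto-generated ID, value = display name.
    private final Map<Long, String> users = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    /** Creates a user and claims the email; returns the new ID, or -1 if the email is taken. */
    long register(String email, String name) {
        long id = nextId.getAndIncrement();
        // putIfAbsent is atomic: only the first caller for a given email wins.
        if (emailToUserId.putIfAbsent(email, id) != null) {
            return -1; // the ID is simply skipped; auto IDs need not be contiguous
        }
        users.put(id, name);
        return id;
    }

    /** Points a new email at the same user ID and releases the old one. */
    boolean changeEmail(String oldEmail, String newEmail) {
        Long id = emailToUserId.get(oldEmail);
        if (id == null || emailToUserId.putIfAbsent(newEmail, id) != null) {
            return false; // unknown old email, or new email already claimed
        }
        emailToUserId.remove(oldEmail, id);
        return true;
    }

    /** Returns the user ID that owns this email, or null if nobody does. */
    Long lookup(String email) {
        return emailToUserId.get(email);
    }
}
```

Because the user's identity is the auto-generated ID and not the email, changing an email only touches the "Email" side; nothing that references the user has to be rewritten.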

====================

It is not scalable to put every user under the same parent. All the data will end up on one particular "server", because entities from the same entity group are stored in close proximity, and you will run into the limit of about 5 writes per second per entity group.

=====================

As a general rule of thumb, anything that needs to scale (e.g. users) must be a root entity to benefit from the datastore's scalability.

user7180
  • Also consider lowercasing the email; if you don't, you end up with User@ExamPle.com and user@example.com under different keys. (Even though the RFC says the user part can be case-sensitive, no one actually does that, since it's confusing to have "John" and "john" as two different email accounts.) – user7180 Jul 06 '13 at 03:06
  • Btw, for now I thought I'd partition by email domain. This may not be ideal and is biased, but at least it eliminates the need to lock gmail users when processing yahoo or hotmail ones. – Schultz9999 Jul 06 '13 at 21:56
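
The two comments above (lowercasing before the email becomes a key, and partitioning by domain) could be sketched roughly as follows. The class `EmailKeys` and both helpers are hypothetical names, and the code only does string handling; it does not touch the datastore.

```java
import java.util.Locale;

/** Hypothetical helpers for preparing an email before it is used as a key name. */
final class EmailKeys {
    /** Trims and lowercases the address so that case variants map to one key. */
    static String normalize(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the normalized domain part, e.g. to pick a per-domain parent. */
    static String domainOf(String email) {
        String normalized = normalize(email);
        int at = normalized.lastIndexOf('@');
        if (at < 0 || at == normalized.length() - 1) {
            throw new IllegalArgumentException("not an email address: " + email);
        }
        return normalized.substring(at + 1);
    }
}
```

With this, `EmailKeys.normalize("User@ExamPle.com")` and `EmailKeys.normalize("user@example.com")` give the same string, so both would resolve to the same key.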

I think I have found the answer to my question. It is in the "Causes of Errors" section of https://developers.google.com/appengine/articles/handling_datastore_errors:

The first type of timeout occurs when you attempt to write to a single entity group too quickly. Writes to a single entity group are serialized by the App Engine datastore, and thus there's a limit on how quickly you can update one entity group. In general, this works out to somewhere between 1 and 5 updates per second; a good guideline is that you should consider rearchitecting if you expect an entity group to have to sustain more than one update per second for an extended period. Recall that an entity group is a set of entities with the same ancestor—thus, an entity with no children is its own entity group, and this limitation applies to writes to individual entities, too. For details on how to avoid datastore contention, see Avoiding datastore contention. Timeout errors that occur during a transaction will be raised as a appengine.ext.db.TransactionFailedError instead of a Timeout.
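
As a rough back-of-the-envelope check against the one-update-per-second guideline quoted above, the sketch below computes the average write rate per entity group. The class name, method names, and the sign-up numbers in the usage note are assumptions for illustration, and the calculation assumes writes are spread evenly over the day and over the groups, which real traffic usually is not.

```java
/** Hypothetical estimate of average write load per entity group. */
final class WriteRateEstimate {
    static final double SECONDS_PER_DAY = 86_400.0;

    /** Average writes per second landing on each entity group, assuming an even spread. */
    static double writesPerSecondPerGroup(long writesPerDay, long entityGroups) {
        return writesPerDay / SECONDS_PER_DAY / entityGroups;
    }

    /** True when the average exceeds the quoted one-update-per-second guideline. */
    static boolean exceedsGuideline(long writesPerDay, long entityGroups) {
        return writesPerSecondPerGroup(writesPerDay, entityGroups) > 1.0;
    }
}
```

For example, 864,000 user creations per day under a single shared parent average 10 writes per second on that one group, well over the guideline, while the same load across 100 groups averages 0.1 per group. With every user as a root entity, each user is its own group and the per-group rate is negligible.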

Schultz9999