
I got this error:

TransactionFailedError: too much contention on these datastore entities. please try again.

This happens even though I'm not doing any transactions. The line of my code that causes the error is:

ndb.put_multi(entity_list) # entity_list is a list of 100 entities

This error doesn't happen often, so it isn't a big deal, but I'm curious why I get it. Any ideas?
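For context, here is a stripped-down sketch of what the batch put looks like (the `Voter` model and sample values are simplified placeholders for my actual code):

    from google.appengine.ext import ndb

    class Voter(ndb.Model):
        # Illustrative model; the real one has more properties.
        email = ndb.StringProperty()

    eid = 12345678  # id of a related entity, used as a key-name prefix
    emails = ['alice@example.com', 'bob@example.com']  # ~100 per call in practice

    # Custom key names, no parent keys, so no explicit entity groups.
    entity_list = [Voter(id='%s_%s' % (eid, email), email=email)
                   for email in emails]
    ndb.put_multi(entity_list)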

Here is most of the traceback:

Traceback (most recent call last):
  ...
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 318, in post
    self.run_from_request()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 313, in run_from_request
    run(self.request.body)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 155, in run
    return func(*args, **kwds)
  File "/base/data/home/apps/s~opavote/2017-09-15.404125237783169549/tasks.py", line 70, in start_election
    models.Voter.create(e.eid, chunk)
  File "/base/data/home/apps/s~opavote/2017-09-15.404125237783169549/models.py", line 2426, in create
    ndb.put_multi(voters + vbs)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3958, in put_multi
    for future in put_multi_async(entities, **ctx_options)]
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 383, in get_result
    self.check_success()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 824, in put
    key = yield self._put_batcher.add(entity, options)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 427, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 358, in _put_tasklet
    keys = yield self._conn.async_put(options, datastore_entities)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 513, in _on_rpc_completion
    result = rpc.get_result()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 928, in get_result
    result = rpc.get_result()
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1893, in __put_hook
    self.check_rpc_success(rpc)
  File "/base/data/home/runtimes/python27_experiment/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1385, in check_rpc_success
    raise _ToDatastoreError(err)
TransactionFailedError: too much contention on these datastore entities. please try again.
new name

2 Answers


Note that the error is actually received from the datastore itself, in the RPC response: self.check_rpc_success(rpc).

This makes me suspect that, on the datastore side, to ensure operation consistency/reliability across the redundant pieces of infrastructure supporting it, every write operation actually uses the same (or similar) mechanisms as transactional operations. The difference would be that transactional operations also perform some transactional checks on the client side, before/after the RPC exchange, and maybe send explicit transaction start/end triggers to the datastore.

From Life of a Datastore Write, a quote suggesting that some common mechanisms are used regardless of whether the operation is transactional (emphasis mine):

If the commit phase has succeeded but the apply phase failed, the datastore will roll forward to apply the changes to indexes under two circumstances:

  1. The next time you execute a read or write or start a transaction on this entity group, the datastore will first roll forward and fully apply this committed but unapplied write, based on the data in the log.

And one of the possible reasons for such failures would simply be too many parallel accesses to the same entities, even if they're read-only. See Contention problems in Google App Engine, though in that case the contention is for transactions on the client side.

Note that this is just a theory ;)
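If the theory holds, the practical upside is that these occasional failures are transient and retryable, as the error message itself suggests. A minimal sketch of retrying the batch put with backoff (the function name and retry parameters are mine and purely illustrative; also note that deferred tasks are already retried by the task queue on failure):

    import time

    from google.appengine.api import datastore_errors
    from google.appengine.ext import ndb

    def put_multi_with_retry(entities, attempts=4, base_delay=0.2):
        """Retry ndb.put_multi() on transient contention errors."""
        for attempt in range(attempts):
            try:
                return ndb.put_multi(entities)
            except datastore_errors.TransactionFailedError:
                if attempt == attempts - 1:
                    raise  # give up after the last attempt
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff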

Dan Cornilescu
  • The first page you linked states that there is "an expected failure rate on writes, because Bigtable tablets are sometimes unavailable." I suspect that is what happened here and that the failure was part of a transaction used in doing that write. – new name Sep 23 '17 at 15:53

It might be worth re-reviewing transactions and entity groups, noting the various definitions and limits.

Putting "Every attempt to create, update, or delete an entity takes place in the context of a transaction," and, "There is a write throughput limit of about one transaction per second within a single entity group," probably speaks to what you're seeing, particularly if entity_list contains entities that would fall into the same entity group.

Dave W. Smith
  • The entities being put here are all being created and I'm not using any entity groups. It sounds like a transaction is being used under the hood to create the entities. It seems weird to me that you could have contention while creating an entity. – new name Sep 23 '17 at 15:35
  • 2
    Are you supplying your own Key names, or are you letting the IDs be auto-assigned? The former can causing contention problems if the keys are really close together (e.g., sequential). Auto-assigned IDs avoid that. – Dave W. Smith Sep 23 '17 at 17:08
  • I do have custom key names that are created by concatenating an ID of a different entity with an email address of a user, something like 12345678_joe@example.com. In some instances, the email addresses will all be from the same domain so that the only difference in the keys will be between the "_" and the "@". Would that make the keys close enough together to potentially cause contention problems? – new name Sep 23 '17 at 18:09
  • 1
    If all (or many) of the entities share in `entity_group` share that key, my best would be on that being enough to cause contention. – Dave W. Smith Sep 23 '17 at 22:14
  • 1
    @JeffO'Neill, if the prefix ID (before the "_") is sequentially incremented it might happen that entities with prefix IDs very close to it might also be stored in the same "tablet", which could create the hotspot you have experienced. Eventually, in an effort of self-optimization, Datastore will split the entity sequences in a tablet and move parts into different tablets later on. But the best approach is to let Datastore prevent hotspots and use auto-assigned IDs (even when you append strings for the keys). BTW, similar hotspots may occur on indexed time-stamps that are very close together. – Ani Sep 24 '17 at 09:39
  • @Ani, the prefix ID is a constant in this case and is an actual entity ID of another entity. I just realized though that the email addresses will often be alphabetical which likely makes the keys sequential. I suspect I will get better performance if I swap the order and do my keys like this: `joe@example.com_12345678`. Using custom key names is really helpful for my specific application. – new name Sep 24 '17 at 13:22
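To make the options in this thread concrete, here is a small sketch of the key-naming variants discussed (the `Voter` model, sample values, and the hash-prefix idea are illustrative, not a prescription):

    import hashlib

    from google.appengine.ext import ndb

    class Voter(ndb.Model):
        email = ndb.StringProperty()

    eid, email = 12345678, 'joe@example.com'

    # Current scheme: a constant prefix, so key order follows the email address.
    k_current = ndb.Key(Voter, '%s_%s' % (eid, email))

    # Swapped order, as suggested in the last comment; key order is still
    # driven by the email address.
    k_swapped = ndb.Key(Voter, '%s_%s' % (email, eid))

    # One way to scatter keys while keeping them derivable from (eid, email):
    # prepend a short hash so lexicographic order no longer follows the input.
    digest = hashlib.md5('%s_%s' % (eid, email)).hexdigest()[:8]
    k_hashed = ndb.Key(Voter, '%s_%s_%s' % (digest, eid, email))

    # Or drop custom names entirely and let the datastore auto-assign ids:
    voter = Voter(email=email)  # voter.key is populated when put() completes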