I'm using App Engine with the eventually-consistent High Replication Data Store. I'm also using sharded counters.

When I query for all of the shards and sum them up, can I assume the counts are strongly consistent? That is, will the below code return an accurate sum of my shard counts?

sum = 0
for counter in Counter.all():
    sum += counter.count
Andrew F
  • Not exactly an answer, but you could just do `my_sum = sum(counter.count for counter in Counter.all())`, and of course calling a variable `sum` is asking for problems. – hochl Apr 20 '12 at 09:17
  • Good point, forgot about that. – Andrew F Apr 20 '12 at 17:23

4 Answers

No. Even when fetching by key, you cannot rely on a strongly consistent count (though it will be more up to date than a query result would be). Batch get operations are not transactional, so one of the shards could be updated while you are fetching them.

Asking for strong consistency here is kind of meaningless, however. First, in a distributed system like App Engine, simultaneity is a fuzzy concept at the best of times - synchronization requires coordination, which creates bottlenecks. Second, even if you could get a transactional sum of the counter values, it'd be out of date the moment you fetched it, since the counters can be updated immediately after you read them anyway.
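To make the non-transactional read concrete, here is a minimal pure-Python sketch. It does not touch the Datastore at all: a plain list stands in for the shard entities, and `read_all_with_interleaved_writes` and its `writes_during_read` schedule are hypothetical names invented for this illustration. It only shows how a sum taken shard by shard can miss an increment that lands mid-read.

```python
# Toy in-memory model: a list of ints stands in for the shard entities.
shards = [5, 5, 5]

def read_all_with_interleaved_writes(shards, writes_during_read):
    """Read shards one at a time, applying scheduled increments between reads.

    writes_during_read maps "number of shards read so far" to a list of
    shard indexes to increment at that point (a hypothetical interleaving).
    """
    observed = 0
    for i in range(len(shards)):
        observed += shards[i]
        for target in writes_during_read.get(i + 1, []):
            shards[target] += 1
    return observed

# Right after the reader has read shard 0, writers bump shard 0 and shard 2.
observed = read_all_with_interleaved_writes(shards, {1: [0, 2]})
actual = sum(shards)
print(observed, actual)  # 16 17
```

The reader picks up the increment to shard 2 but misses the one to shard 0, so the sum it reports is already behind by the time the loop finishes.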

Nick Johnson

If you want to create strongly consistent sharded counters, you should use keys, not queries.

from google.appengine.ext import db
import random

NUM_SHARDS = 20
# Counter is assumed to be a db.Model with count = db.IntegerProperty(default=0)

# Getting the total: fetch every shard by key, not by query
shard_keys = [db.Key.from_path('Counter', 'shard' + str(i))
              for i in range(NUM_SHARDS)]
total = 0
for counter in db.get(shard_keys):
    if counter:  # shards that were never created come back as None
        total += counter.count

# Incrementing a shard
def increment(key_name):
    counter = Counter.get_by_key_name(key_name)  # try to retrieve from datastore
    if not counter:
        counter = Counter(key_name=key_name)  # shard doesn't exist, create one
    counter.count += 1
    counter.put()

key_name = 'shard' + str(random.randrange(NUM_SHARDS))  # choose a random shard
db.run_in_transaction(increment, key_name)

Perform the incrementing within a transaction to ensure consistency.
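As a rough illustration of why the read-modify-write on a shard has to be atomic, here is a self-contained sketch that runs without App Engine. It is an analogy, not Datastore code: a dict stands in for the `Counter` entities and a per-shard `threading.Lock` plays the role of the transaction (real Datastore transactions work differently under the hood, so treat the lock purely as a stand-in). The names `shard_locks` and `worker` are made up for the example.

```python
import random
import threading

NUM_SHARDS = 20
names = ['shard' + str(i) for i in range(NUM_SHARDS)]
shards = {name: 0 for name in names}                    # stands in for Counter entities
shard_locks = {name: threading.Lock() for name in names}

def increment():
    key_name = random.choice(names)          # choose a random shard
    with shard_locks[key_name]:              # stand-in for the transaction
        shards[key_name] = shards[key_name] + 1  # read-modify-write

def worker(n):
    for _ in range(n):
        increment()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = sum(shards.values())
print(total)  # 8000: no increment was lost
```

Spreading writes over 20 shards keeps contention on any single lock (or, in the real thing, any single entity) low, which is the whole point of sharding the counter.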

Albert

Queries are eventually consistent in HRD, so you cannot be sure that the entities you get via a query reflect the latest updates. If the query filters on an entity property that is being updated, then the query might not even find the entity.
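A toy model may help picture the difference between a get by key and a query. This is only a sketch under a big assumption: `ToyStore` is a made-up class in which entities update immediately while a separate index catches up only when `apply_index()` is called. It is not how HRD is actually implemented, just a way to see the symptom described above.

```python
class ToyStore:
    """Entities update immediately; the query index lags until apply_index()."""

    def __init__(self):
        self.entities = {}  # key -> properties, always current
        self.index = {}     # key -> properties as last seen by the index

    def put(self, key, **props):
        self.entities[key] = dict(props)

    def apply_index(self):
        # Simulates the index eventually catching up with the entities.
        self.index = {k: dict(v) for k, v in self.entities.items()}

    def get(self, key):
        return self.entities.get(key)

    def query(self, **filters):
        # Matches against the (possibly stale) index, returns matching keys.
        return sorted(k for k, v in self.index.items()
                      if all(v.get(n) == val for n, val in filters.items()))

store = ToyStore()
store.put('shard0', count=1)
store.apply_index()
store.put('shard0', count=2)          # the index has not caught up yet

by_key = store.get('shard0')['count']  # 2: the get by key sees the new value
stale_hits = store.query(count=1)      # ['shard0']: matched on the old value
missed = store.query(count=2)          # []: the updated entity is not found

store.apply_index()
caught_up = store.query(count=2)       # ['shard0'] once the index catches up
```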

Peter Knego

You can increase the probability that the sharded counter total reflects the current state, but you cannot (as best I know) get that probability to 100%.

stevep