
On the Google App Engine groups I keep seeing users (Fig1, Fig2, Fig3) who can't figure out where the high number of Datastore reads in their billing reports comes from.
As you may know, free Datastore reads are capped at 50K operations/day; above this budget you have to pay.

50K operations sounds like a lot, but unfortunately it seems that each operation (query, entity fetch, count, ...) hides several Datastore reads.

Is there a way, via the API or some other approach, to find out how many Datastore reads are hidden behind the common RPC.get and RPC.runquery calls?

Appstats seems useless here, because it shows only the RPC details, not the hidden read cost.

Given a simple model like this:

```python
from google.appengine.ext import db

class Example(db.Model):
    foo = db.StringProperty()
    bars = db.ListProperty(str)
```

and 1000 entities in the datastore, I'm interested in the cost of operations like these:

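```python
# Count the entities whose 'bars' list contains 'spam' (keys-only query)
items_count = Example.all(keys_only=True).filter('bars =', 'spam').count()

# Count all entities, up to 10000
items_count = Example.all().count(10000)

# Fetch all entities, up to 10000
items = Example.all().fetch(10000)

# Fetch entities whose 'bars' list contains both 'spam' and 'fu'
items = Example.all().filter('bars =', 'spam').filter('bars =', 'fu').fetch(10000)

# Skip the first 500 entities, then fetch up to 10000
items = Example.all().fetch(10000, offset=500)

# Prefix match on foo (filtr is the search prefix); the query runs when iterated or fetched
items = Example.all().filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd')
```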
Dan McGrath
systempuntoout
  • I think each entity returned is a read, and if you have reference properties, the first time you access one is also a read. Note that fetch(X) doesn't mean X entities come back in one go: if the count is high, it works in batches, and each batch counts as data reads (batch size). I have no idea how count(X) works with respect to data reads; it should count as a single read, but that's wishful thinking. – Shay Erlichmen Oct 26 '11 at 05:47

2 Answers


See http://code.google.com/appengine/docs/billing.html#Billable_Resource_Unit_Cost. A query costs 1 read plus 1 read for each entity returned, and "returned" includes entities skipped by an offset or counted by count(). So, with 1000 entities, each of these costs 1001 reads:

```python
Example.all(keys_only=True).filter('bars =', 'spam').count()
Example.all().count(1000)
Example.all().fetch(1000)
Example.all().fetch(1000, offset=500)
```
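The rule above is simple enough to write down as a function. This is a back-of-the-envelope sketch assuming the billing model described in this answer; `estimated_query_reads` is a hypothetical helper for illustration, not part of the App Engine SDK.

```python
def estimated_query_reads(matching, limit=None, offset=0):
    """Hypothetical estimator: billed reads for one query under the rule
    '1 read per query + 1 read per entity returned', where entities
    skipped by the offset also count as returned.

    matching: number of entities that satisfy the query's filters
    limit:    fetch/count limit (None means no limit)
    offset:   number of entities skipped before results are returned
    """
    # Entities the datastore walks over: the skipped ones plus the page itself.
    scanned = matching if limit is None else min(matching, offset + limit)
    return 1 + scanned
```

With 1000 entities, `estimated_query_reads(1000, limit=1000)` and `estimated_query_reads(1000, limit=1000, offset=500)` both give 1001, matching the figures quoted above: the offset does not make the query cheaper, it only hides the skipped entities from you.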

For these, the number of reads charged is 1 plus the number of entities that match the filters:

```python
Example.all().filter('bars =', 'spam').filter('bars =', 'fu').fetch(1000)
Example.all().filter('foo >=', filtr).filter('foo <', filtr + u'\ufffd').fetch(1000)
```
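To see which entities count toward "1 plus the number of entities that match" for the prefix query, the `filtr` / `filtr + u'\ufffd'` range can be mimicked in plain Python. This is an in-memory illustration only (hypothetical `prefix_matches` helper and sample data), not a datastore call; Python compares strings by code point, so a value whose next character sorts above U+FFFD (such as an emoji) falls outside the range.

```python
def prefix_matches(values, prefix):
    """Values selected by foo >= prefix AND foo < prefix + u'\ufffd'."""
    upper = prefix + u'\ufffd'
    return [v for v in values if prefix <= v < upper]

# Hypothetical sample of 'foo' values
foos = [u'spam', u'spammer', u'spa', u'eggs', u'spamalot']
matched = prefix_matches(foos, u'spam')
# Under the rule above: 1 read for the query + 1 per matching entity
reads = 1 + len(matched)
```

Here `matched` is `['spam', 'spammer', 'spamalot']`, so the query would be billed 4 reads; `'spa'` and `'eggs'` never reach you and are not charged.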

Instead of using count(), consider storing the count in the datastore, sharded if you need to update it more than once a second: http://code.google.com/appengine/articles/sharding_counters.html
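The idea behind a sharded counter can be shown without the datastore. The sketch below is an in-memory stand-in (the `ShardedCounter` class is hypothetical, not the code from the linked article): in the real version each shard is its own entity, so concurrent increments land on different entities, and reading the total touches one entity per shard instead of every counted entity.

```python
import random


class ShardedCounter:
    """In-memory illustration of a sharded counter."""

    def __init__(self, num_shards=20):
        # One slot per shard; in the datastore each would be a separate entity.
        self.shards = [0] * num_shards

    def increment(self, amount=1):
        # Spread writes over a randomly chosen shard to avoid contention
        # on a single record.
        index = random.randrange(len(self.shards))
        self.shards[index] += amount

    def total(self):
        # The count is the sum over all shards: num_shards records to read,
        # regardless of how many items were counted.
        return sum(self.shards)
```

After 1000 calls to `increment()`, `total()` returns 1000 while reading only 20 shard values, whereas counting 1000 entities with `count()` would be billed per entity under the rule above.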

Whenever possible you should use cursors instead of an offset.
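The difference adds up quickly when paging. A rough comparison, assuming the same "1 read per query + 1 per entity returned, skipped entities included" rule; `paging_reads` is a hypothetical helper that only does the arithmetic, it does not call the datastore:

```python
def paging_reads(total, page_size, use_cursor):
    """Estimated reads to page through `total` entities, page_size at a time."""
    reads = 0
    offset = 0
    while offset < total:
        page = min(page_size, total - offset)
        if use_cursor:
            # Resuming from a cursor: only this page's entities are read.
            reads += 1 + page
        else:
            # Using offset: all previously skipped entities are charged again.
            reads += 1 + offset + page
        offset += page
    return reads
```

For 1000 entities in pages of 100, cursors cost 10 × 101 = 1010 reads, while offsets cost 1010 + 100 × (0 + 1 + ... + 9) = 5510 reads, and the gap grows quadratically with the number of pages.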

dlebech
ribrdb
  • I can't find anything in the link you posted saying it costs 1 read per entity returned; in fact, I don't see any mention of datastore reads at all. – Shay Erlichmen Nov 14 '11 at 10:07

One point I'm almost sure about:

```python
Example.all().count(10000)
```

This one uses small datastore operations (it only needs the keys, not the entities), so it would count as 1 read + at most 10,000 small operations.
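Under this answer's accounting (confirmed in the comment below it), a count or keys-only query and a full fetch are billed in different units. A sketch of the split, with hypothetical helper names and the same assumed pricing model; note this differs from the first answer, which charges count() as reads:

```python
def keys_only_cost(matching, limit=None):
    """Estimated (reads, small_ops) for count() or a keys-only query."""
    # Only keys are touched, so each one is billed as a small operation.
    keys = matching if limit is None else min(matching, limit)
    return (1, keys)


def full_fetch_cost(matching, limit=None):
    """Estimated (reads, small_ops) for fetching full entities."""
    # Each entity returned is billed as a read.
    entities = matching if limit is None else min(matching, limit)
    return (1 + entities, 0)
```

With 1000 entities, `keys_only_cost(1000, 10000)` gives `(1, 1000)` and `full_fetch_cost(1000, 10000)` gives `(1001, 0)`: the 10000 limit caps the bill but does not add to it.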

Barak
  • You're right: the expression above consumes "small operations", not "read operations". I just checked this on a GAE project. – Pavel Vlasov Mar 29 '12 at 03:02