
I need to get a count of records for a particular model on App Engine. How does one do it?

I bulk uploaded more than 4000 records but modelname.count() only shows me 1000.

Ryan A.

8 Answers


You should use Datastore Statistics:

DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("__Stat_Kind__");
query.addFilter("kind_name", FilterOperator.EQUAL, kind);
Entity entityStat = datastore.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");

Please note that the above does not work on the development Datastore, but it does work in production (once deployed).
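To show the shape of that lookup without the App Engine SDK, here is a plain-Python simulation of the same idea (the kind names and counts are made up for illustration): the statistics system stores one pre-aggregated record per kind, and the query simply filters on `kind_name` and reads its `count` property.

```python
# Plain-Python simulation of the __Stat_Kind__ lookup above (no App
# Engine SDK needed; kind names and counts are illustrative only).
# The statistics system keeps one pre-aggregated record per kind.
stat_entities = [
    {"kind_name": "Person", "count": 4212},
    {"kind_name": "Order", "count": 981},
]

def kind_count(kind):
    # Mirrors the EQUAL filter on "kind_name" and reads "count".
    for stat in stat_entities:
        if stat["kind_name"] == kind:
            return stat["count"]
    return None  # no statistics record yet (e.g. on the dev server)

print(kind_count("Person"))  # 4212
```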

I see that this is an old post, but I'm adding an answer for the benefit of others searching for the same thing.

TommyN
  • I am getting this error on the first line of your code: "Cannot instantiate the type Query". Any ideas? – steven Oct 14 '17 at 21:08
  • This only returns some of my entities, and even for those the 'count' field is wrong. Is there a way to force the stats to refresh? – N S Jan 21 '19 at 16:58

As of release 1.3.6, there is no longer a cap of 1,000 on count queries. Thus you can do the following to get a count beyond 1,000:

count = modelname.all(keys_only=True).count()

This will count all of your entities, which could be rather slow if you have a large number of entities. As a result, you should consider calling count() with some limit specified:

count = modelname.all(keys_only=True).count(some_upper_bound_suitable_for_you)
dar
  • 300ms? Where did you get that figure? Regardless, though, this demonstrates why counting objects on the fly is not a good idea. – Nick Johnson May 28 '09 at 14:56
  • My guess is he meant 30,000ms. But that really isn't the case, because if you're doing this because of the bulk uploader, you probably just run the count over the remote_api anyway, which AFAIK is not subject to the 30-second timeout. – dar May 28 '09 at 20:00
  • I added keys_only=True which is faster – Shay Erlichmen Aug 05 '10 at 09:35
  • @dar, count in remote_api is still subject to the 30s timeout. remote_api scripts themselves aren't subject to the timeout, but individual API calls still are. – ryan Jan 29 '11 at 00:21
  • @ryan I know, that's why I said it. – dar Jan 29 '11 at 13:50
  • ah, ok, I guess I misunderstood. So we agree: within remote_api, each individual count call is still subject to a 30s timeout. – ryan Jan 29 '11 at 16:59
  • also, @Shay, `count()` always does the equivalent of `keys_only=True` under the hood, so you don't need to add it explicitly. – ryan Feb 01 '11 at 05:33

This is a very old thread, but just in case it helps other people looking at it, there are 3 ways to accomplish this:

  1. Accessing the Datastore statistics
  2. Keeping a counter in the datastore
  3. Sharding counters

Each one of these methods is explained in this link.
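For option 3, here is a minimal sketch of the sharding idea in plain Python (the shard count is an arbitrary assumption; on App Engine each list slot would be its own counter entity, and reads and writes would go through the datastore):

```python
import random

NUM_SHARDS = 20            # arbitrary; more shards = less write contention
shards = [0] * NUM_SHARDS  # each slot stands in for one counter entity

def increment():
    # Writers pick a random shard, so concurrent updates rarely hit the
    # same entity and serialize on it.
    shards[random.randrange(NUM_SHARDS)] += 1

def get_count():
    # Reading the total is a sum over all shards.
    return sum(shards)

for _ in range(1000):
    increment()
print(get_count())  # 1000
```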

svpino
  • Statistics are gathered on a daily basis (or even every 48 hours), as said here: https://github.com/GoogleCloudPlatform/google-cloud-node/issues/413 Thus it may happen that the statistics count does not match the actual count. It is even said that statistics may not be available for large datasets... – frb Jan 09 '18 at 14:09
count = modelname.all(keys_only=True).count(some_upper_limit)

Just to add on to the earlier post by dar: this 'some_upper_limit' has to be specified; otherwise, the count will still be capped at a maximum of 1000.

tkc
  • Good answer. I looked a long time for this, and it also seems reasonable to cache the result with memcache if updates are not very frequent. – Niklas Rosencrantz May 18 '11 at 14:03

In GAE, a count will always make you page through the results once you have more than 1,000 objects. The easiest way to deal with this problem is to add a counter property to your model (or to a separate counters entity) and update it every time you create a new object.
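As a rough illustration of that pattern (plain Python with made-up names; on App Engine the record write and the counter bump would go in one transaction so they cannot drift apart):

```python
# Keep a running total next to the data and bump it on every create,
# instead of counting entities at read time.
records = []
counters = {"modelname": 0}

def create_record(data):
    records.append(data)
    counters["modelname"] += 1  # maintained count, O(1) to read back

for i in range(4000):
    create_record({"id": i})
print(counters["modelname"])  # 4000
```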

Federico Builes

I still hit the 1,000 limit with count(), so I adapted dar's code (mine's a bit quick and dirty):

class GetCount(webapp.RequestHandler):
    def get(self):
        query = modelname.all(keys_only=True)

        # Page through the results 1000 at a time, carrying the query
        # cursor forward between batches.
        i = 0
        while True:
            result = query.fetch(1000)
            i += len(result)
            if len(result) < 1000:
                break
            query.with_cursor(query.cursor())

        self.response.out.write('<p>Count: ' + str(i) + '</p>')
Gavin
DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query query = new Query("__Stat_Kind__");
Query.Filter eqf = new Query.FilterPredicate("kind_name",
                                Query.FilterOperator.EQUAL,
                                "SomeEntity");
query.setFilter(eqf);
Entity entityStat = ds.prepare(query).asSingleEntity();
Long totalEntities = (Long) entityStat.getProperty("count");
varun
  • Just to clarify, replace "SomeEntity" with the name of the kind you are wanting to get the count for, everything else stays the same. Also, this will only work when running deployed, entityStat will come back as null when running locally. – Craigo Aug 09 '17 at 22:09

Another solution is using a keys-only query and getting the size of the iterator. The computing time with this solution rises linearly with the number of entities:

Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
Query<Key> query = Query.newKeyQueryBuilder().setKind("MyKind").build();
int count = Iterators.size(datastore.run(query));
M. Schena