28

I would think this would be easy for an SQL-alike! What I want is the GQL equivalent of:

select count(*) from foo;

and to get back an answer something similar to:

1972 records.

And I want to do this in GQL from the "command line" in the web-based DataStore viewer. (You know, the one that shows 20 at a time and lets me see "next 20")

Anyway -- I'm sure it's brain-dead easy, I just can't seem to find the correct syntax. Any help would be appreciated.

Thanks!

Olie
  • 24,597
  • 18
  • 99
  • 131
  • 1
    if you follow my link, there is another thread with lots of discussion about writing the gql to do this. `count(*)` is not valid gql, and there's tons of explanation on that thread. As for running that gql, your local deployment has a [dev console](http://localhost:8080/_ah/admin) with an area called `Interactive Console` that lets you execute code. Start with `from google.appengine.ext import db`... – dfichter Feb 12 '11 at 22:42
  • possible duplicate of [What's the best way to count results in GQL?](http://stackoverflow.com/questions/421751/whats-the-best-way-to-count-results-in-gql) – Donal Fellows Dec 23 '12 at 19:33

5 Answers5

36

With straight Datastore Console, there is no direct way to do it, but I just figured out how to do it indirectly, with the OFFSET keyword.

So, given a table, we'll call foo, with a field called type that we want to check for values named "bar":

SELECT * FROM foo WHERE type="bar" OFFSET 1024

(We'll be doing a quick game of "warmer, colder" here, binary style)

Let's say that query returns nothing. Change OFFSET to 512, then 256, 128, 64, ... you get the idea. Same thing in reverse: Go up to 2048, 4096, 8192, 16384, etc. until you see no records, then back off.

I just did one here at work. Started with 2048, and noticed two records came up. There's 2049 in the table. In a more extreme case, (lets say there's 3300 records), you could start with 2048, notice there's a lot, go to 4096, there's none... Take the midpoint (1024 between 2048 and 4096 is 3072) next and notice you have records... From there you could add half the previous midpoint (512) to get 3584, and there's none. Whittle back down half (256) to get 3328, still none. Once more down half (128) to get 3200 and there's records. Go up half of the last val (64) and there's still records. Go up half again (32) to 3296 - still records, but so small you can easily see there's exactly 3300.

The nice thing about this vs. Datastore statistics to see how many records are in a table is you can limit it by the WHERE clause.

RodneyReid
  • 461
  • 1
  • 4
  • 3
  • Love the OFFSET suggestion, very useful! – Tom Clift Apr 15 '16 at 00:49
  • The offset suggestion and a binary search was a great idea! No more exporting to big query every time management needs some random statistic... :) – Hypo Jan 25 '18 at 09:56
  • 5
    Something to keep in mind about `OFFSET` when dealing with large collections is that (as long as you don’t use cursors) it only discards entities from the result set, so it’s not a “real” offset. That means 1) you are charged for *all* read operations including the offset and 2) the response time will slow down linearly. E.g., a query with `offset 1000000` will be quite slow and you pay all 1000000 read operations. – jotaen Jun 13 '18 at 15:24
  • That really helped. Additionally, you can keep looking at the bottom pagination. Once you get too close then it starts showing up the nearest value instead of 'of many'. – Vijender Kumar Oct 23 '19 at 07:49
15

I don't think there is any direct way to get the count of entities via GQL. However you can get the count directly from the dashbaord ;

enter image description here

More details - https://cloud.google.com/appengine/docs/python/console/managing-datastore

cs95
  • 379,657
  • 97
  • 704
  • 746
Aniket Thakur
  • 66,731
  • 38
  • 279
  • 289
13

As it's stated in other questions, it looks like there is no count aggregate function in GQL. The GQL Reference also doesn't say there is the ability to do this, though it doesn't explicitly say that it's not possible.

In the development console (running your application locally) it looks like just clicking the "List Entities" button will show you a list of all entities of a certain type, and you can see "Results 1-10 of (some number)" to get a total count in your development environment.

In production you can use the "Datastore Statistics" tab (the link right underneath the Datastore Viewer), choose "Display Statistics for: (your entity type)" and it will show you the total number of entities, however this is not the freshest view of the data (updated "at least once per day").

Since you can't run arbitrary code in production via the browser, I don't think saying "use .count() on a query" would help, but if you're using the Remote API, the .count() method is no longer capped at 1000 entries as of August, 2010, so you should be able to run print MyEntity.all().count() and get the result you want.

Community
  • 1
  • 1
JJ Geewax
  • 10,342
  • 1
  • 37
  • 49
  • `MyEntity.all().count()` won't get you above 1,000. You can force a higher limit as an argument to `count()`, but Google warns you that this may [time you out](http://code.google.com/appengine/docs/python/datastore/gqlqueryclass.html#GqlQuery_count). – dfichter Feb 13 '11 at 23:09
  • 2
    "`count()` has no maximum limit. If you don't specify a limit, the datastore continues counting until it finishes counting or times out." – JJ Geewax Feb 15 '11 at 18:37
  • As the comments in the [answer](http://stackoverflow.com/questions/264154/google-appengine-how-to-fetch-more-than-1000/3543527#3543527) you linked to explain, you need to force a limit higher than 1,000. It's easy to test out in the interactive console. Give it a try. – dfichter Feb 16 '11 at 01:06
  • Didn't know about the 'datastore statistics' view showing entity counts. Thanks! – Cuga Jul 16 '12 at 13:37
0

This is one of those surprising things that the datastore just can't do. I think the fastest way to do it would be to select __KEY__ from foo into a List, and then count the items in the list (which you can't do in the web-based viewer).

If you're happy with statistics that can be a little bit stale, you can go to the Datastore Statistics page of the admin console, which will tell you how many entities of each type there were some time ago. It seems like those stats are usually less than 10 hours old. Unfortunately, you can't query them more specifically.

Riley Lark
  • 20,660
  • 15
  • 80
  • 128
  • fetching and counting keys will never result in a count above 1,000. The thread I linked to discusses that : ) – dfichter Feb 12 '11 at 22:48
  • I'm not sure you're right about that. There may still be some limit to the return count, but I'm sure the 1000-entity cap was removed. Either way this is a slow way of counting, and pre-calculating is a faster way to go. – Riley Lark Feb 14 '11 at 19:45
  • yeah, definitely slow, and you don't want to time out. Sorry I wasn't exact, I meant that `count()` won't return above 1,000. Try out `db.GqlQuery('select __key__ from YourEntity').count()` in the interactive console and you won't get beyond 1,000. You can force `count()` above 1,000 by passing a limit to it, but I agree, that's not a great solution. – dfichter Feb 16 '11 at 01:15
0

There's no way to get a total count in GQL. Here's a way to get a count using python:

def count_models(model_class, max_fetch=1000):
  total = 0
  cursor = None
  while True:
    query = model_class.all(keys_only=True)
    if cursor: 
        query.with_cursor(cursor)
    results = query.fetch(max_fetch)
    total += len(results)
    print('still counting: ' + total)
    if (len(results) < max_fetch):
        return total
    cursor = query.cursor()

You could run this function using the remote_api_shell, or add a custom page to your admin site to run this query. Obviously, if you've got millions of rows you're going to be waiting a while. You might be able to increase max_fetch, I'm not sure what the current fetch limit is.

Calvin
  • 4,177
  • 1
  • 16
  • 17