Get record count in Azure DocumentDb

Question

It seems like 'select count(*) from c' in the SQL queries allowed by documentdb in the azure site and through the documentdb explorer (https://studiodocumentdb.codeplex.com/) is not supported. To date, the only way to get a record count that I have found is from code (see below). However, there are enough files in our collection now that this is crashing. Is there a way to get a count on how many documents in a collection that works more than my solution?

DocumentClient dc = GetDocumentDbClient();
var databaseCount = dc.CreateDatabaseQuery().ToList();
Database azureDb = dc.CreateDatabaseQuery().Where(d => d.Id == Constants.WEATHER_UPDATES_DB_NAME).ToArray().FirstOrDefault();

var collectionCount = dc.CreateDocumentCollectionQuery(azureDb.SelfLink).ToList();

DocumentCollection update = dc.CreateDocumentCollectionQuery(azureDb.SelfLink).Where(c => c.Id == "WeatherUpdates").ToArray().FirstOrDefault();

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT * FROM c").ToList();

MessageBox.Show("Databases: " + databaseCount.Count().ToString() + Environment.NewLine
                +"Collections: " + collectionCount.Count().ToString() + Environment.NewLine
                + "Documents: " + documentCount.Count().ToString() + Environment.NewLine, 
                 "Totals", MessageBoxButtons.OKCancel);

score 130 · Answer 1 · answered Apr 22 '17 at 15:26

130

This is now possible in the year 2017 of our lord.

SELECT VALUE COUNT(1) FROM c

[ 1234 ]

answered Apr 22 '17 at 15:26

micah

7,596
10
49
90

34

Be aware that in the month 7 of said year, this eats potentially thousands of RUs. It's rather slow, too. – Philipp Sumi Jul 05 '17 at 09:23
4

Note: 16,400 records in single partition consumes 535 RUs. – Youngjae Jul 24 '17 at 04:57
16

pretty absurd how many RUs it takes to scan something that should be an index. – JJS Aug 01 '17 at 18:21
I see one response about appending a counter to the latest document, but has anyone else found a way to get an aggregate count without depleting throughput usage? – kevinl Aug 25 '17 at 21:44
6

In month 9 of said year, I'm still relying on my own logic in a DAL to count create/delete events because the aggregates all appear to be scan driven vs. index driven. – Steve Cadwallader Sep 30 '17 at 13:28
2

Could you please explain why do I need to add VALUE to my query? I am used to writing "select count(*) from c" and "select count(1) from c" is very similar to that. – Parth Shah Jan 11 '18 at 18:58
3

@ParthShah - from this (https://azure.microsoft.com/en-us/blog/planet-scale-aggregates-with-azure-documentdb/) page: "(If you’re wondering about the VALUE keyword – all queries return JSON fragments back. By using VALUE, you can get the scalar value of count e.g., 100, instead of the JSON document {"$1": 100})" – Chris B. Behrens Jan 19 '18 at 20:59
1

"REQUEST CHARGE 0 RUs" according to azure portal (as of now) and very quick for me – bytedev Feb 22 '18 at 15:58
@nashwan on empty collection? – Konstantin Salavatov Apr 12 '18 at 11:26
@KonstantinSalavatov nope. Just double checked on a large(ish) collection (947497 records) and it still says "0 RUs". – bytedev Apr 12 '18 at 13:36
@nashwan on emulator? Here it takes some RUs even on small collection – Konstantin Salavatov Apr 15 '18 at 14:07
1

Works with me on a 250 gb database. Costing 0 ru/s – Simon Zeinstra Apr 20 '18 at 07:41
@SimonZeinstra is your collection a single partition or unlimited multi-partition? I am seeing a couple thousand RUs charges for a multi-partition collection with a couple of hundred thousand documents. I am wondering if you're seeing zero due to being a single partition. – Rob Reagan Nov 30 '18 at 20:58
1

@RobReagan, it was a partitioned collection. I'm not on this project anymore so cannot check again for you. – Simon Zeinstra Dec 03 '18 at 14:31
`SELECT VALUE COUNT(1) FROM c` is still charging me. My collection is partitioned. – sirdank Dec 14 '18 at 15:27
2

So azure's portal will lie and say 0 RU's. F12 and look at the calls, seeing the exact same thing in c# client. It makes one call which fails (400) then a second that fails, then it modifies the query to SELECT VALUE [{"item": COUNT(1)}] FROM c which causes a row scan. But the best part is their UI shows the RU count from the first failed query which is 0. – ghostbust555 Jul 01 '19 at 21:17
SELECT VALUE COUNT(1) FROM c costs 2.89 RUs on a 248 record db. Vs 16.51 RUs for SELECT * FROM c. So it does seem like the RU cost has been sorted. – James Oct 27 '20 at 11:38

score 30 · Answer 2 · edited Nov 14 '19 at 06:20

30

Actually works at this point:

SELECT COUNT(c.id) FROM c

edited Nov 14 '19 at 06:20

jophab

5,356
14
41
60

answered Feb 23 '17 at 18:11

Mike Katchourine

309
3
2

5

This gives unreliable value. – Youngjae Jul 19 '17 at 02:57
2

This did not work for me. You need to add VALUE in front of COUNT(c.id), like how Micah Williamson said, for this query to work. – Parth Shah Jan 11 '18 at 19:00
This doesnt appear to work when using multiple paritions – bytedev Feb 22 '18 at 16:00
It works by adding count values returned in continuation. Read the answer here - https://stackoverflow.com/questions/47875869/why-does-select-count1-from-c-change-values-each-time-i-query-it-in-cosmosdb-d/50632568#50632568 – Mayur Dhingra May 31 '18 at 20:57

score 14 · Answer 3 · answered Mar 09 '18 at 19:49

14

This is possible in the same way you write SQL query now,

SELECT VALUE COUNT(1) FROM myCollection

NOTE: COUNT(1) won't work for a huge datasets.

You can read more about supported queries from here

answered Mar 09 '18 at 19:49

Sajeetharan

216,225
63
350
396

1

Hi, please tell us how to at least get a rough estimate with huge datasets, without consuming large numbers of RUs: I'm talking tens or hundreds of millions of docs across potentially hundreds of thousands of partitions. Is there a way to do this? – pcdev Sep 11 '19 at 08:25

score 11 · Accepted Answer · edited Jul 19 '17 at 02:42

11

Until the implementation of the "count" keyword, you should do your query in a store procedure on the server. Take care to not get all columns/properties in your query if you want only a count.

Select only the id like;

  dc.CreateDocumentQuery(update.SelfLink, "SELECT c.id FROM c")

edited Jul 19 '17 at 02:42

Youngjae

24,352
18
113
198

answered Jan 05 '15 at 07:05

Papa Ours

126
2

3

Please note that `Count` is implemented now, as in the answers below – Jonesopolis Feb 08 '21 at 14:26

zmische · Answer 5 · 2017-03-21T08:37:22.373

Just to recap - here is example of Count Stored Procedure via JS with continuation support.

And here is one more tool for DocumentDb that's pretty neat: https://github.com/mingaliu/DocumentDBStudio/releases

Upd Mar 2017: In the latest DDB SDK see DDB Aggregates press release there is full support for basic aggregates, without GROUP BY though (for now). Here is GIT REpo with examples: https://github.com/arramac/azure-documentdb-dotnet/tree/master/samples/code-samples/Queries

Dogu Arslan · Answer 6 · 2018-12-19T11:15:07.530

I did a test against a partitioned Document db collection with 200K entities in a single partition. The Collection is configured with 10K RU/second.

Client side queries:

"SELECT VALUE COUNT(1) FROM c"

Time elapsed (ms): 2471 milliseconds Total Request Units consumed: 6143.35

Note: This is the fastest and cheapest option. But keep in mind that you would need to handle continuation on the client side and execute next query using the returned continuation token otherwise you may get partial result/count.

"SELECT COUNT(c.id) FROM c"

Time elapsed (ms): 2589 Total RU: 6682.43

Note: This is very close but slightly slower and more expensive.

Server side / Stored Procedure:

If you need a stored proc, there is one provided here: https://github.com/Azure/azure-cosmosdb-js-server/blob/master/samples/stored-procedures/Count.js

But beware it is problematic.. It internally reads all documents in the collection / partition just to calculate the count. As a result it is much slower and a lot more expensive!

Time elapsed (ms): 8584 milliseconds Total RU: 13419.31

I updated the stored procedure provided in above link to improve the performance. Full Updated Count.js below. The updated stored proc performs way faster and cheaper than the original and it is on par with the best performing client side query (#1 above):

Time elapsed (ms): 2534 milliseconds Total RU: 6298.36

function count(filterQuery, continuationToken) {
    var collection = getContext().getCollection();
    var maxResult = 500000; 
    var result = 0;

    var q = 'SELECT \'\' FROM root';
    if (!filterQuery) {
        filterQuery = q;
    }

    tryQuery(continuationToken);

    function tryQuery(nextContinuationToken) {
        var responseOptions = { continuation: nextContinuationToken, pageSize: maxResult };

        if (result >= maxResult || !query(responseOptions)) {
            setBody(nextContinuationToken);
        }
    }

    function query(responseOptions) {
        return (filterQuery && filterQuery.length) ?
            collection.queryDocuments(collection.getSelfLink(), filterQuery, responseOptions, onReadDocuments) :
            collection.readDocuments(collection.getSelfLink(), responseOptions, onReadDocuments);
    }

    function onReadDocuments(err, docFeed, responseOptions) {
        if (err) {
            throw 'Error while reading document: ' + err;
        }

        result += docFeed.length;

        if (responseOptions.continuation) {
            tryQuery(responseOptions.continuation);
        } else {
            setBody(null);
        }
    }

    function setBody(continuationToken) {
        var body = { count: result, continuationToken: continuationToken };
        getContext().getResponse().setBody(body);
    }
}

score 0 · Answer 7 · answered Jan 02 '15 at 22:58

0

Currently does not exist. I had a similar scenario and we ended up adding a counter to a document attribute that gets updated every time a document gets added or deleted. You could even make these two steps as part of a store procedure or a trigger if you want atomicity.

answered Jan 02 '15 at 22:58

Luis Delgado

3,644
4
34
54

BTW, this and other similar functions are under review by the docdb product group. You can go ahead and vote for that as well. http://feedback.azure.com/forums/263030-documentdb/suggestions/6333963-add-support-for-aggregate-functions-like-count-su – Luis Delgado Jan 02 '15 at 23:09
As of 29 July 2016, they've started: "We have started work on aggregate support and will update this feedback item as we have more information." – jeremy Aug 04 '16 at 17:45

score 0 · Answer 8 · edited Sep 14 '15 at 05:32

0

My code count solution is also working...once I just selected the id as a Papa Ours pointed out :) To get my original post to work, replace this line:

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT * FROM c").ToList();

with this line:

var documentCount = dc.CreateDocumentQuery(update.SelfLink, "SELECT id FROM c").ToList()

I still like the idea of the stored procedure as it will work in the documentdb studio (really cool project :)) - https://studiodocumentdb.codeplex.com/

edited Sep 14 '15 at 05:32

abatishchev

98,240
88
296
433

answered Jan 06 '15 at 15:38

ProgramTheWorld

537
1
4
6

This can cause higher RU – nrofis Dec 04 '18 at 09:21

Get record count in Azure DocumentDb

8 Answers8

Linked