
I have looked around at a few answers/questions regarding this issue but have yet to find a solution.

I have a collection with documents (simplified) as such:

{
    "id": 123,
    "stuff": "abc",
    "array": [
        {
            "id2": 456,
            "properties": [
                {
                    "id3": 789,
                    "important": true
                }
            ]
        }
    ]
}

I want to check, for each document in my collection, for each object within array, for each entry in properties, whether it has important: true, for example. Then return:

"id": 123
"id2": 456
"id3": 789

I have tried using:

client.queryDocuments(self.collection._self, querySpec).toArray(function (err, results) {
    if (err) {
        callback(err);
    } else {
        callback(null, results[0]);
    }
});

But the issue is that an array has a maximum size. If my collection has millions of documents, this would presumably be exceeded. (Javascript Increase max array size)

Or, am I misunderstanding the above question? Is it talking about the number of objects in an array (of which, each can have unlimited object character length?)

Thus I am looking for a loop-esque solution, where each document is returned, I do my analysis, then move to the next / do them in parallel.

Any insight would be greatly appreciated.

JDT

2 Answers


But the issue is that an array has a maximum size. If my collection has millions of documents, this would presumably be exceeded. (Javascript Increase max array size)

Based on my research, the longest possible array in JS can have 2^32 - 1 = 4,294,967,295 (about 4.29 billion) elements, so it is more than enough for your millions of documents. That said, you certainly can't query such a huge volume of data in a single call; that's not feasible.
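
That limit can be checked directly in Node: the cap is on the number of elements (array length must fit in an unsigned 32-bit integer), not on characters:

    // The hard cap is 2**32 - 1 elements.
    const maxLen = 2 ** 32 - 1;
    console.log(maxLen); // 4294967295

    new Array(maxLen);       // OK: length only, sparse, no per-element allocation
    // new Array(2 ** 32);   // throws RangeError: Invalid array length
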

Whether because of throughput constraints (RU settings) or query-efficiency factors, you should batch large volumes of data anyway.

Thus I am looking for a loop-esque solution, where each document is returned, I do my analysis, then move to the next / do them in parallel.

Maybe you could use the v2 JS SDK for the Cosmos DB SQL API. Please refer to the sample code:

const cosmos = require('@azure/cosmos');
const CosmosClient = cosmos.CosmosClient;

const endpoint = "https://***.documents.azure.com:443/";                 // Add your endpoint
const masterKey = "***";  // Add the masterkey of the endpoint
const client = new CosmosClient({ endpoint, auth: { masterKey } });
const databaseId = "db";
const containerId = "coll";

async function run() {
    const { container, database } = await init();
    const querySpec = {
        query: "SELECT r.id,r._ts FROM root r"
    };
    const queryOptions = {
        maxItemCount: -1
    };
    const queryIterator = await container.items.query(querySpec, queryOptions);
    while (queryIterator.hasMoreResults()) {
        const { result: results, headers } = await queryIterator.executeNext();
        if (results === undefined) {
            // no more results
            break;
        }
        console.log(results);
        console.log(headers);
        // do what you want to do with this page of results
    }
}

async function init() {
    const { database } = await client.databases.createIfNotExists({ id: databaseId });
    const { container } = await database.containers.createIfNotExists({ id: containerId });
    return { database, container };
}

run().catch(err => {
    console.error(err);
});

For more details about the continuation token, please refer to my previous case. If you have any concerns, please let me know.
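
In case it helps, the raw response headers returned by executeNext() carry the continuation token under the x-ms-continuation key (the header name the Cosmos DB REST API uses); it is absent on the last page. A minimal helper to pull it out (getContinuationToken is a hypothetical name, not part of the SDK):

    // Hypothetical helper: extract the continuation token from the
    // response headers returned by executeNext(). Returns null on the
    // final page, where the header is absent.
    function getContinuationToken(headers) {
        if (!headers) return null;
        return headers['x-ms-continuation'] || null;
    }

    // Example against a mocked headers object:
    const token = getContinuationToken({ 'x-ms-continuation': 'opaque-token' });
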

Jay Gong
  • Will try this shortly. The issue is my function apps, I'm sure, are still on v1. Is there a v1 solution? Also the 4 billion length, is that characters or objects? – JDT Sep 07 '18 at 11:55
  • @JDT the 4 billion length is objects. Do you mean you use a v1 Azure Function? – Jay Gong Sep 10 '18 at 02:26
  • Yes v1 Azure. If it is 4 billion objects then this may not be an issue, at least not for a while – JDT Sep 10 '18 at 13:45

I am using Cosmos DB SQL API Node.js library. I am unable to find the Continuation Token from this library so that I can return it to client. The idea is to get it back from the client for the next pagination request.

I have a working code which iterates multiple times to get all the documents. What changes will be required here to get the continuation token?

function queryCollectionPaging() {
    return new Promise((resolve, reject) => {
        function executeNextWithRetry(iterator, callback) {
            iterator.executeNext(function (err, results, responseHeaders) {
                if (err) {
                    return callback(err, null);
                }
                documents = documents.concat(results);
                if (iterator.hasMoreResults()) {
                    executeNextWithRetry(iterator, callback);
                } else {
                    callback();
                }
            });
        }

        let options = {
            maxItemCount: 1,
            enableCrossPartitionQuery: true
        };

        let documents = [];
        let iterator = client.queryDocuments(collectionUrl, 'SELECT r.partitionkey, r.documentid, r._ts FROM root r WHERE r.partitionkey in ("user1", "user2") ORDER BY r._ts', options);

        executeNextWithRetry(iterator, function (err, result) {
            if (err) {
                reject(err);
            } else {
                console.log(documents);
                resolve(documents);
            }
        });
    });
}
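
To hand the token back to the client instead of looping until exhaustion, one approach is to stop after a single executeNext() call and return both the page and responseHeaders['x-ms-continuation']. A sketch of that shape, demonstrated against a mock iterator (in real code the iterator would come from client.queryDocuments):

    // Fetch a single page plus its continuation token. 'iterator' is
    // anything with the executeNext(callback) shape of the v1 SDK.
    function queryOnePage(iterator) {
        return new Promise((resolve, reject) => {
            iterator.executeNext(function (err, results, responseHeaders) {
                if (err) return reject(err);
                resolve({
                    documents: results,
                    // header is absent on the last page
                    continuation: responseHeaders['x-ms-continuation'] || null
                });
            });
        });
    }

    // Mock iterator standing in for client.queryDocuments(...):
    const mockIterator = {
        executeNext(cb) {
            cb(null, [{ id: 'doc1' }], { 'x-ms-continuation': 'tok-123' });
        }
    };

    queryOnePage(mockIterator).then(page => {
        console.log(page.documents.length, page.continuation);
    });

On the next request, the client would send the token back and you would pass it to the query via the feed options (the continuation property of the options object in the v1 SDK) to resume where the previous page ended.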

Amlan