I am trying to bulk-delete documents using a stored procedure. I did not specify a partition key when creating the collection, and I want to delete without one. My query runs without errors, but nothing is deleted from the collection. Is a partition key required for bulk deletion or update?

Below is my code:

def delete_bulk(query,client,coll_link):
    sproc = {
                'id': 'deleteProcedure',
                'body': ('''
                    function bulkDeleteProcedure(query) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
    var response = getContext().getResponse();
    var responseBody = {
        deleted: 0,
        continuation: true
    };

    // Validate input.
    if (!query) throw new Error("The query is undefined or null.");

    tryQueryAndDelete();

    // Recursively runs the query w/ support for continuation tokens.
    // Calls tryDelete(documents) as soon as the query returns documents.
    function tryQueryAndDelete(continuation) {
        var requestOptions = {continuation: continuation};

        var isAccepted = collection.queryDocuments(collectionLink, query, requestOptions, function (err, retrievedDocs, responseOptions) {
            if (err) throw err;

            if (retrievedDocs.length > 0) {
                // Begin deleting documents as soon as documents are returned from the query results.
                // tryDelete() resumes querying after deleting; no need to page through continuation tokens.
                //  - this is to prioritize writes over reads given timeout constraints.
                tryDelete(retrievedDocs);
            } else if (responseOptions.continuation) {
                // Else if the query came back empty, but with a continuation token; repeat the query w/ the token.
                tryQueryAndDelete(responseOptions.continuation);
            } else {
                // Else if there are no more documents and no continuation token - we are finished deleting documents.
                responseBody.continuation = false;
                response.setBody(responseBody);
            }
        });

        // If we hit execution bounds - return continuation: true.
        if (!isAccepted) {
            response.setBody(responseBody);
        }
    }

    // Recursively deletes documents passed in as an array argument.
    // Attempts to query for more on empty array.
    function tryDelete(documents) {
        if (documents.length > 0) {
            // Delete the first document in the array.

            var isAccepted = collection.deleteDocument(documents[0], {}, function (err, responseOptions) {
                if (err) throw err;

                responseBody.deleted++;
                documents.shift();
                // Delete the next document in the array.
                tryDelete(documents);
            });

            // If we hit execution bounds - return continuation: true.
            if (!isAccepted) {
                response.setBody(responseBody);
            }


        } else {
            // If the document array is empty, query for more documents.
            tryQueryAndDelete();
        }
    }
}
                    ''')
            }

    try:
        # Create the stored procedure; if that fails (e.g. it already exists), look it up by id below.
        created_sproc = client.CreateStoredProcedure(coll_link, sproc)
        proc_link = created_sproc['_self']
    except Exception as e:
        proc_id = sproc['id']
        proc_query = "select * from r where r.id = '{0}'".format(proc_id)
        proc = list(client.QueryStoredProcedures(coll_link, proc_query))[0]
        proc_link = proc['_self'] 

    client.ExecuteStoredProcedure(proc_link, query)
    print('Deletion Done!!!')
Anurag

1 Answer

I notice that you are using the bulk-delete JavaScript from the GitHub sample code. That code works well, but there is a key point you need to know.

A partition key must always be provided when executing a stored procedure. (Please refer to this detailed case: Delete Documents from Cosmos using Query without Partition Key Specification.)

As that case states, stored procedures are best suited to write-heavy operations, not read-heavy or delete-heavy ones. You could follow the official suggestion and look at the Bulk Executor library SDK. I found a BulkDelete feature in its source code; please try it.
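
If you stay with the stored procedure, the partition key requirement could be sketched roughly as follows. This is a minimal sketch, not a tested fix: it assumes the same legacy Python client as in the question, whose `ExecuteStoredProcedure` accepts an options dict with a `'partitionKey'` entry, and it assumes the procedure returns the `deleted` / `continuation` body shown in the question. The helper name `run_bulk_delete` and its parameters are hypothetical.

```python
def run_bulk_delete(client, proc_link, query, partition_key):
    """Run the bulk-delete stored procedure against ONE logical partition,
    re-invoking it until it reports that nothing is left to delete."""
    # Assumption: the client forwards 'partitionKey' to the service,
    # so the procedure is scoped to that single partition key value.
    options = {'partitionKey': partition_key}
    total_deleted = 0
    while True:
        # The procedure stops early when it hits execution bounds and
        # signals that with continuation == True, so call it again.
        result = client.ExecuteStoredProcedure(proc_link, [query], options)
        total_deleted += result['deleted']
        if not result['continuation']:
            return total_deleted
```

Because a stored procedure runs inside a single partition, a call such as `run_bulk_delete(client, proc_link, query, 'some-key-value')` would have to be repeated once per partition key value that contains matching documents.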

Jay Gong
  • Thanks for the response. So I don't need to use a stored procedure for deletion. But when I delete without a stored procedure, the request times out, because there is too much data and there is a 5-second limit on running a query. – Anurag Mar 13 '20 at 06:42
  • @Anurag How about using the bulk executor mentioned in my answer? It can control the pagination of requests. You could give that a try. – Jay Gong Mar 13 '20 at 06:43
  • Does it support Python? – Anurag Mar 13 '20 at 07:02
  • Sorry, no. Only Java and .NET. – Jay Gong Mar 13 '20 at 07:11
  • But I am using Python for my application. That's the issue. I am unable to find good use cases in Python. – Anurag Mar 13 '20 at 07:51
  • @Anurag Well, based on my investigation, there is no official Python option for the bulk delete executor so far. If you don't mind, it seems you will have to handle the pagination of deletes yourself in your Python SDK code to avoid the time limit. Of course, you could contact the Cosmos DB team to push for a Python library, because I believe it should follow the .NET and Java versions. – Jay Gong Mar 13 '20 at 08:26
  • Okay @Jay, I will contact the MS team. – Anurag Mar 13 '20 at 08:30
  • @Anurag Sure, please post any updates here. Thank you in advance. – Jay Gong Mar 13 '20 at 08:31
  • Sure. will post here :) – Anurag Mar 13 '20 at 08:33
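
The client-side pagination suggested in the comments could look roughly like this. It is only a sketch under assumptions: it uses the azure-cosmos 3.x method names `QueryItems` and `DeleteItem` (older pydocumentdb releases call them `QueryDocuments` and `DeleteDocument`), it assumes the query returns full documents (e.g. `SELECT * FROM c ...`) so that `_self` and the partition key field are present, and `delete_in_pages`, `pk_field`, and `page_size` are hypothetical names.

```python
def delete_in_pages(client, coll_link, query, pk_field, page_size=100):
    """Query matching documents page by page and delete them one at a time,
    so no single request has to process the whole result set."""
    # Assumption: these option keys are honoured by the legacy client;
    # maxItemCount caps how many documents each query round trip returns.
    query_options = {
        'enableCrossPartitionQuery': True,
        'maxItemCount': page_size,
    }
    deleted = 0
    # The returned iterable fetches further pages lazily as we iterate.
    for doc in client.QueryItems(coll_link, query, query_options):
        # Each delete is addressed to the document's own partition key value.
        client.DeleteItem(doc['_self'], {'partitionKey': doc[pk_field]})
        deleted += 1
    return deleted
```

Each delete here is a separate request, so this trades throughput for staying under the per-request time limit; it does not replace a real bulk executor.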