
I have a scenario where most of the documents I want to delete are in a collection called "expired". I do not want to overload my servers with a long-running process that iterates over the documents and deletes them one by one; I would rather delete them in batches using xdmp:document-delete.

So my question is: how does xdmp:collection-delete work?

Does it iterate over the documents and delete them?

or

Does it do something like DROP TABLE in SQL, so that it is "instantaneous"?

I want to know what happens behind the scenes when xdmp:collection-delete runs. Could anyone lay out the flow of how this function handles documents for deletion? I want to understand the process in more depth than just an overview of what it does.

Community
Zeeshan Abbas
  • Keep in mind that dropping a table is not quite the same thing as deleting a collection of documents. – grtjn Sep 19 '16 at 12:27
  • Can you elaborate on what exactly you are after? It essentially comes down to iterating over the docs, locking them, and deleting them, all in one transaction. In certain circumstances it can take a few shortcuts, but it still needs to do all that, just as described below. – grtjn Sep 19 '16 at 14:13
  • We are looking to delete millions of documents in our database without overloading the server, as there are other processes running on it as well and we have limited CPU and memory. That is why we built our own purger, which takes a batch size and purges in small chunks. I was wondering whether this function is a better approach than our custom purger. – Zeeshan Abbas Sep 19 '16 at 14:19
  • One more detail: all those documents are in a single collection called "expired". – Zeeshan Abbas Sep 19 '16 at 14:26

2 Answers


xdmp:collection-delete() will delete all documents in the collection in a single transaction. While it's not instantaneous, it should be fast, as it just needs to set the deletion timestamp of each document.
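For comparison, here is a rough sketch of the batched alternative the question mentions, where each run deletes a bounded number of documents in its own transaction. It assumes the URI lexicon is enabled (required by cts:uris); the batch size of 1000 is an arbitrary illustrative value, not a recommendation.

```xquery
xquery version "1.0-ml";

(: Single-transaction form, as described above:
     xdmp:collection-delete("expired")
:)

(: Batched form: delete at most $batch-size documents per run.
   $batch-size is a hypothetical value chosen for illustration.
   cts:uris needs the URI lexicon enabled on the database. :)
declare variable $batch-size as xs:integer := 1000;

for $uri in cts:uris((), fn:concat("limit=", $batch-size),
                     cts:collection-query("expired"))
return xdmp:document-delete($uri)
```

Each run locks only the documents in its batch, so you would re-run it (for example from a scheduled task) until the collection is empty.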

Mads Hansen
Dave Cassel
  • In the case of a million documents, does "fast" mean seconds? Will it hold locks on the documents for a long time? (That is basically what we are trying to avoid.) – Zeeshan Abbas Sep 19 '16 at 11:33
  • There are a few criteria to meet before xdmp:collection-delete will be executed in so-called fast mode, but even in fast mode it has to take locks on what is being deleted, for ACID compliance. – grtjn Sep 19 '16 at 12:27
  • I am actually going to run this on a LIVE server, so I want to know what happens behind the scenes when xdmp:collection-delete runs. Could anyone lay out the flow of how this function handles documents for deletion? I want to understand the process in more depth than just an overview of what it does. – Zeeshan Abbas Sep 19 '16 at 12:55
  • Another way to potentially improve overall performance wrt updates is to set your app server to run in `nonblocking` MVCC mode instead of the default `contemporaneous` mode. Depending on the requirements of your application, however, this may not be feasible. Essentially, it means that an update in the commit phase will not block a read/query transaction, which will simply read from the most recent timestamp for which all transactions are known to have committed, instead of waiting for the commit to complete. – wst Sep 19 '16 at 16:12
  • Curious: if these are in a collection called 'expired', and you write your queries to ignore items in the expired collection >> cts:not-query(cts:collection-query('expired')) <<, does it matter as much if they take a bit of time to delete, given that they are already isolated from your active queries? – David Ennis -CleverLlamas.com Sep 19 '16 at 18:45
  • I'd say yes, ML still needs to lock the docs for deletion, unless you disable locking as suggested by wst. – grtjn Sep 19 '16 at 18:48

You could try using CoRB to delete the documents one by one. You can increase the number of threads to process them in parallel.

Jeet