
I am implementing a batch process that needs to execute a large number of search queries in MarkLogic to find the documents to be modified. The query I am using looks like this:

cts:search(/ch:haufe-document,
  cts:and-query((
    cts:element-range-query(fn:QName("http://idesk.haufe-lexware.com/document-meta","rootId"), "=", xs:string($root-id)),
    cts:element-range-query(fn:QName("http://contenthub.haufe-lexware.com/haufe-document","application"), "=", xs:string($APPLICATION-ID))
  ))
)

The $root-id is different for each query; the $APPLICATION-ID is a constant value. Usually these queries return a small number of documents (fewer than 10), sometimes up to 150, and they work fine. Only when many such queries are executed in a row (more than 100,000 in one batch job) do I at some point get back an error like this:

XDMP-EXPNTREECACHEFULL: cts:search(fn:collection()/ch:haufe-document, cts:and-query((cts:element-range-query(fn:QName("http://idesk.haufe-lexware.com/document-meta","rootId"), "=", "HI14429659", ("collation=http://marklogic.com/collation/"), 1), cts:element-range-query(fn:QName("http://contenthub.haufe-lexware.com/haufe-document","application"), "=", "idesk", ("collation=http://marklogic.com/collation/"), 1)), ())) -- Expanded tree cache full on host some-host.cloudapp.net uri /content/idesk/c9103265-0a44-496b-b2b1-617b0b042208/HI14429659.xml

When I execute the same query manually, it runs without problems and returns very few results (just one in most cases). The number of documents matching /ch:haufe-document is about 3 million and does not change much during processing (the documents are only modified). The database contains an additional 1.5 million metadata documents, which are added during processing.

The strange thing is that the first two batch jobs, each processing more than 600,000 documents, worked fine. But the third job failed with the error above, and since then only very small jobs (~30,000 docs) can be processed successfully.

I already tried increasing the size of the expanded tree cache, but it didn't help. I also tried an "unfiltered" search, but the error persists.

I would appreciate any hint to what the problem could be.

Update: One thing I didn't mention, because I didn't realize it might be relevant: the whole process is implemented as a REST extension, which is called from a Java application. A POST request is made containing an XML document with a list of document IDs to be processed, and this list can be very long (>100,000 entries).

Hapeka

3 Answers


The query that hits the expanded tree cache error may not itself be pulling a lot of documents. It may just be the last straw that broke the camel's back.

Resolving XDMP-EXPNTREECACHEFULL errors

When the query needs to actually retrieve elements, values, or otherwise traverse the contents of one of these fragments, the fragment is uncompressed and cached in the expanded tree cache.

Consequently, the expanded tree cache needs to be large enough to maintain a copy of every expanded XML fragment that is simultaneously needed during query processing.

The error message XDMP-EXPNTREECACHEFULL: Expanded tree cache full means that MarkLogic has run out of room in the expanded tree cache during query evaluation, and that consequently it cannot continue evaluating the complete query.

There are several options for handling this, depending on your needs and capacity.

  • If you have enough memory available to allocate, you can bump up the ETC limit and provide more memory to service those requests (see the sketch after this list).
  • If you have found some greedy and inefficient queries that are pulling a ton of docs at once, see if they can be broken out into smaller transactions.
  • If you have too many concurrent transactions processing too many docs, limit the number of appserver threads or lower the thread count on your batch jobs.
  • Configure a maximum readSize limit for auto-cancellation of transactions that exceed those limits: https://docs.marklogic.com/guide/performance/request_monitoring#id_10815
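
For the first option, here is a minimal sketch of raising the group-level cache size via the Admin API. The 8192 MB value is purely illustrative; size it to your host's RAM, and note that cache-size changes typically require a MarkLogic restart:

xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Load the current configuration, raise the expanded tree cache
   size (in MB) for the current group, and save the change. :)
let $config := admin:get-configuration()
let $config := admin:group-set-expanded-tree-cache-size($config, xdmp:group(), 8192)
return admin:save-configuration($config)
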
Mads Hansen
  • Thanks for your suggestions. One thing I do not understand about this expanded tree cache: the queries are executed sequentially (as in my case), and there is nothing else running on that test system. So if the cache gets full there should be some unused data (from previous queries) in it that can be evicted. But why does this not happen? – Hapeka May 23 '22 at 08:14
  • Separate queries executed sequentially should not blow the ETC. But if the selector for your batch is using cts:search, then that search is pulling all of those docs in a single query. Look to use cts:uris() instead, since that just pulls the URI and doesn't open the doc. Then read the doc in your batch process transactions. – Mads Hansen May 23 '22 at 11:16

Looking at those numbers, you are likely not going to solve the issue by increasing memory. You are essentially trying to inhale the entire database into memory, and more batches means more stuff in memory in parallel.

Step back and try to figure out what you are trying to accomplish. It seems that whatever is processing those results will not be able to do all the work at once, so think about returning references instead.

Here is an example to start you off that returns just the URIs. The calling code can then fetch each document at the time it is processed, keeping memory usage lower:

cts:uris((), (),
  cts:element-query(xs:QName("ch:haufe-document"),
    cts:and-query((
      cts:element-range-query(fn:QName("http://idesk.haufe-lexware.com/document-meta","rootId"), "=", xs:string($root-id)),
      cts:element-range-query(fn:QName("http://contenthub.haufe-lexware.com/haufe-document","application"), "=", xs:string($APPLICATION-ID))
    ))
  )
)

I use cts:uris() as an example starting point.
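
To make the memory benefit concrete, here is a minimal follow-up sketch, assuming $query holds the cts:element-query from above and local:process-doc is a hypothetical function containing your per-document modification (cts:uris also requires the URI lexicon to be enabled). Each document is opened and updated in its own transaction via xdmp:invoke-function, so only one expanded fragment needs to sit in the cache at a time:

(: Only URIs are pulled from the lexicon; no documents are expanded here. :)
for $uri in cts:uris((), (), $query)
return
  xdmp:invoke-function(
    (: fn:doc() expands just this one document, inside its own transaction :)
    function() { local:process-doc(fn:doc($uri)) },
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
      <update>true</update>
    </options>
  )
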

  • Thanks for your reply. But what I do not understand is, the batch processing runs sequentially, i.e. I make a query, process the results, and then I make the next query. Why does this mean that there is a lot of stuff in memory in parallel? – Hapeka May 23 '22 at 07:36
  • You give an example of cts:search(), which by definition returns documents (expanding them into memory in the process). You then also state that sometimes the result can be 100000 documents. I interpret your statements to indicate that this example would put 100000 documents in memory - not how MarkLogic is designed to be used. If you had three such processes, you would have 300000 documents in memory. The documentation specifically suggests tuning your query to keep fewer fragments in memory (fewer results, e.g. paginate). https://docs.marklogic.com/10.0/messages/XDMP-en/XDMP-EXPNTREECACHEFULL – David Ennis -CleverLlamas.com May 23 '22 at 15:32
  • I should have said more "concurrent" batches. Batching a process does not imply a serialized approach. – David Ennis -CleverLlamas.com May 23 '22 at 15:34
  • That was a misunderstanding, I think. The process might execute (sequentially) 100000 queries or more, but each query will only return up to 150 results. – Hapeka May 24 '22 at 07:39
  • Regardless of the number of results, if you are hitting an expanded tree cache issue and you already have sane limits set based on normal use cases and the resources available, then the guidance in the documentation is still: Since increasing the cache size may strain other system resources, you should first attempt to modify or tune your query. – David Ennis -CleverLlamas.com May 24 '22 at 07:43

The solution I found is this: I modified the Java application so that it does not send all the data to MarkLogic at once, but splits it up into chunks of 10,000 IDs. Now the error is gone. The downside is that the change is now made in several transactions, so the modifications become visible before everything is done. But for my use case this is acceptable.
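
For illustration, here is the same chunking idea expressed server-side in XQuery (the actual change was in the Java client): process the incoming $ids in slices of 10,000, each slice in its own transaction. local:process-ids is a hypothetical function standing in for the existing per-ID logic:

let $chunk-size := 10000
for $i in 0 to (fn:count($ids) - 1) idiv $chunk-size
(: fn:subsequence tolerates a short final chunk :)
let $chunk := fn:subsequence($ids, $i * $chunk-size + 1, $chunk-size)
return
  xdmp:invoke-function(
    function() { local:process-ids($chunk) },
    <options xmlns="xdmp:eval">
      <isolation>different-transaction</isolation>
      <update>true</update>
    </options>
  )
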

Hapeka