How do I best make an MS-Excel
compatible csv
file representing all documents in a MarkLogic directory Using XCC
Java Client and Tomcat
and Marklogic
Both are remotely located. Number of document in the directory is around 15000.

- 771
- 1
- 8
- 22

- 1,565
- 5
- 22
- 36
-
This sounds like "can you please do my job for me" type of question. – Eric Bloch Feb 22 '13 at 16:44
-
@EricBloch I fully agree. Please follow my original question on this link: http://stackoverflow.com/questions/14998342/marklogic-query-response-time-is-very-high#comment21083544_14998342 .... thanks. – Ranjan Sarma Feb 22 '13 at 17:46
1 Answers
The first part, getting all the documents in a directory, is ready for us from avoiding XDMP-EXPNTREECACHEFULL and loading document
cts:search(
collection(),
cts:directory-query('path/to/documents/', 'infinity'))
As noted in my answer there, if you need further restrictions you could cts:and-query
that cts:directory-query
with other cts:query
terms.
Next you need to turn each XML document into CSV. That's fairly simple, but you have to know how your XML is structured or have some way to infer it. For this example I will say that I always have simple child element a
, b
, c
, d
under some root element. So the query needs to produce a CSV header for those elements, followed by lines of CSV.
We probably also want to hand in the directory URI from the caller. If you were using REST this would use xdmp:get-request-field
but for XCC it is an external value.
declare variable $DIRECTORY-URI as xs:string external ;
declare function local:csv($root as element()) as xs:string
{
string-join(($root/a, $root/b, $root/c, $root/d), ',')
};
'A,B,C,D',
cts:search(
collection(),
cts:directory-query($DIRECTORY-URI, 'infinity'))/local:csv(*)
Again, making local:csv
work for your application requires some knowledge of the XML or some way to infer its structure. You might need to put some values in double-quotes, too. But this basic structure is one of the most efficient ways to attack the problem. I've avoided any XQuery FLWOR expressions, so that the results can stream.
Another approach would be to use range indexes and http://docs.marklogic.com/cts:value-tuples with a cts:query
to restrict the results, then convert the JSON to CSV. This would be even more efficient because no fragments would be fetched. But this won't work well with some XML structures, and you may not have the luxury of create a range index for every CSV field.
declare variable $DIRECTORY-URI as xs:string external ;
declare function local:csv($ja as json:array) as xs:string
{
string-join(json:array-values($ja), ',')
};
'A,B,C,D',
local:csv(
cts:value-tuples(
(cts:element-reference(xs:QName('a')),
cts:element-reference(xs:QName('b')),
cts:element-reference(xs:QName('c')),
cts:element-reference(xs:QName('d'))),
(),
cts:directory-query($DIRECTORY-URI, 'infinity')))