I have the following challenge. We have CSV files that we want to load into a MarkLogic database using mlcp. We also want to transform the rows into OBI sources during the load, so we built a transform function for that.

Now I am struggling with the transform. Without the transform, the data loads as one document per row, as expected.

csv example:

voornaam,achternaam
hugo,koopmans
thijs,van ulden

transform-ambulance.xqy:

xquery version "1.0-ml";
module namespace rws = "http://marklogic.com/rws";

import module namespace source = "http://marklogic.com/solutions/obi/source" at "/ext/obi/lib/source-lib.xqy";

(: If the input document is XML, create an OBI source from it, with the value
 : specified in the input parameter. If the input document is not
 : XML, leave it as-is.
 :)
declare function rws:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $attr-value := 
    (map:get($context, "transform_param"), "UNDEFINED")[1]
  let $the-doc := map:get($content, "value")
  return
    if (fn:empty($the-doc/element()))
    then $content
    else
      let $root := xdmp:unquote($the-doc/*)
      let $source-title := "ambulance source data"
      let $collection := 'ambulance'
      let $source-id := source:create-source($source-title, (),$root)      
      let $_ := xdmp:document-add-collections(concat("/marklogic.solutions.obi/source/", $source-id[1],".xml"), $collection)
      return (
        map:put($content, "value",
          $source-id[2]
        ), $content
      )
};

mlcp command:

mlcp.sh import \
 -host localhost \
 -port 27041 \
 -username admin \
 -password admin \
 -input_file_path ./sampledata/so-example.csv \
 -input_file_type delimited_text \
 -transform_module /transforms/transform-ambulance.xqy \
 -transform_namespace "http://marklogic.com/rws" \
 -mode local

mlcp output:

15/09/08 21:35:08 INFO contentpump.ContentPump: Hadoop library version: 2.6.0
15/09/08 21:35:08 INFO contentpump.LocalJobRunner: Content type: XML
15/09/08 21:35:08 INFO input.FileInputFormat: Total input paths to process : 1
15/09/08 21:35:10 WARN mapreduce.ContentWriter: XDMP-DOCROOTTEXT: xdmp:unquote(document{<root><voornaam>hugo</voornaam><achternaam>koopmans</achternaam></root>}) -- Invalid root text "hugokoopmans" at  line 1
15/09/08 21:35:10 WARN mapreduce.ContentWriter: XDMP-DOCROOTTEXT: xdmp:unquote(document{<root><voornaam>thijs</voornaam><achternaam>van ulden</achternaam></root>}) -- Invalid root text "thijsvan ulden" at  line 1
15/09/08 21:35:11 INFO contentpump.LocalJobRunner:  completed 100%
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: com.marklogic.contentpump.ContentPumpStats: 
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: ATTEMPTED_INPUT_RECORD_COUNT: 2
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: SKIPPED_INPUT_RECORD_COUNT: 0
15/09/08 21:35:11 INFO contentpump.LocalJobRunner: Total execution time: 2 sec

I have tried without the xdmp:unquote(), but then I hit a document-node() coercion error.

Please advise.

Mads Hansen
Hugo Koopmans

1 Answer

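OK, so the issue was that $root has to be a document-node(). xdmp:unquote() expects a string, so passing it the root element atomizes the element to its text content ("hugokoopmans"), which is not well-formed XML. That is what the XDMP-DOCROOTTEXT warning is reporting. Wrapping the element in a document constructor instead:

let $root := document {$the-doc/root}

solves the issue.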
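
To make the difference concrete, here is a minimal sketch that can be run in Query Console. The `$the-doc` value is a hand-built stand-in for what mlcp passes to the transform for the first CSV row (an assumption based on the log output above), and `$as-string` is just an illustrative name.

```xquery
xquery version "1.0-ml";

(: Stand-in for the document mlcp hands to the transform for the first row.
   Built by hand here; this is an assumption based on the mlcp log. :)
let $the-doc := document {
  <root><voornaam>hugo</voornaam><achternaam>koopmans</achternaam></root>
}

(: Atomizing the element yields its concatenated text, not its markup.
   This is the string xdmp:unquote() ends up trying to parse. :)
let $as-string := fn:string($the-doc/root)

(: The document constructor copies the element under a new document node,
   with no serialization or parsing involved. :)
let $root := document { $the-doc/root }

return (
  $as-string,                         (: "hugokoopmans" :)
  $root instance of document-node(),  (: true :)
  fn:local-name($root/*)              (: "root" :)
)
```

The first item in the result matches the "Invalid root text" reported in the mlcp output, and the second shows that `$root` now has the type that produced the coercion error when xdmp:unquote() was simply removed.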

Hugo Koopmans
  • Glad you figured it out. Please accept your answer to move it off the "unanswered" list. (Yes, StackOverflow etiquette encourages you to do that.) – Dave Cassel Sep 11 '15 at 18:29
  • I notice two small things. Firstly, `$the-doc` is already a document node, so why take the `root` element and wrap it again in a document node? I think you can use `$the-doc` directly instead of `$root`. Secondly, you update `$content` with a value returned from `source:create-source`, but I thought that function already inserts a document into the database. I think you can just return an empty sequence there. – grtjn Sep 14 '15 at 07:27
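
For reference, the transform from the question with the first suggestion from the comment above applied might look like the sketch below. It assumes, as the question's code does, that `source:create-source` accepts a document node as its third argument and returns a sequence whose first item is the source id. The return value is kept as in the question; the second suggestion (returning an empty sequence) is left out because whether mlcp accepts that is not confirmed in this thread.

```xquery
xquery version "1.0-ml";
module namespace rws = "http://marklogic.com/rws";

import module namespace source = "http://marklogic.com/solutions/obi/source"
  at "/ext/obi/lib/source-lib.xqy";

declare function rws:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $the-doc := map:get($content, "value")
  return
    (: Non-XML input has no root element, so pass it through unchanged :)
    if (fn:empty($the-doc/element()))
    then $content
    else
      let $source-title := "ambulance source data"
      let $collection := "ambulance"
      (: $the-doc is passed as-is, on the assumption that it is already a
         document node, so neither xdmp:unquote() nor a document
         constructor is needed :)
      let $source-id := source:create-source($source-title, (), $the-doc)
      let $_ := xdmp:document-add-collections(
        fn:concat("/marklogic.solutions.obi/source/", $source-id[1], ".xml"),
        $collection)
      return (
        map:put($content, "value", $source-id[2]),
        $content
      )
};
```

The mlcp command from the question would stay the same, since the module path and namespace are unchanged.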