
I want to remove a particular column from a CSV file and load the rest into a MarkLogic database using MLCP.

My CSV file contains:

URI,EmpId,Name,age,gender,salary
1/Niranjan,1,Niranjan,35,M,1000
2/Deepan,2,Deepan,25,M,2000
3/Mehul,3,Mehul,28,M,3000

I want to use the URI column as the URI of each document, but the URI column itself should be skipped/removed from the inserted document.

How can I do this?
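For illustration, the desired mapping from one CSV row to a document can be sketched in plain JavaScript. This is only a model of the intended outcome, not how MLCP works internally: `rowToDocument` is a hypothetical helper, it splits naively on commas (no quoted fields), and it keeps every value as a string.

```javascript
// Hypothetical helper: models the desired result for one CSV row.
// Naive comma split: assumes no quoted fields that contain commas.
function rowToDocument(headerLine, rowLine, uriColumn) {
  const headers = headerLine.split(',');
  const values = rowLine.split(',');
  const doc = {};
  let uri;
  headers.forEach((name, i) => {
    if (name === uriColumn) {
      uri = values[i];        // used as the document URI...
    } else {
      doc[name] = values[i];  // ...but not stored in the document body
    }
  });
  return { uri, doc };
}

const result = rowToDocument('URI,EmpId,Name,age,gender,salary',
                             '1/Niranjan,1,Niranjan,35,M,1000', 'URI');
console.log(result.uri);                 // 1/Niranjan
console.log(JSON.stringify(result.doc)); // {"EmpId":"1","Name":"Niranjan","age":"35","gender":"M","salary":"1000"}
```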



Your best bet when using MLCP outside a MarkLogic Data Hub context is an MLCP transform. You can find an explanation and a few examples here:

Transforming Content During Ingestion

If you are converting your CSV to JSON, you could use something like the following:

Save this as /strip-columns.sjs in your modules database:

```javascript
/* jshint node: true */
/* global xdmp */

exports.transform = function(content, context) {
  'use strict';

  // transform_param holds a comma-separated list of property names to strip,
  // e.g. -transform_param URI
  /* jshint camelcase: false */
  var stripColumns = (context.transform_param !== undefined) ? context.transform_param.split(/,/) : [];
  /* jshint camelcase: true */

  // Detect JSON; assumes the URI has the correct extension
  if (xdmp.uriFormat(content.uri) === 'json') {

    // Convert input to a mutable object for manipulation
    var newDoc = content.value.toObject();
    Object.keys(newDoc).forEach(function(key) {
      if (stripColumns.indexOf(key) > -1) {
        delete newDoc[key];
      }
    });

    // Convert the result back into a JSON document node
    content.value = xdmp.toJSON(newDoc);

  }

  // Return the updated content object
  return content;
};
```
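The property-stripping step itself does not depend on MarkLogic, so it can be pulled out and checked in plain Node.js. A minimal sketch, assuming a plain object stands in for `content.value.toObject()`; `removeProperties` is a hypothetical helper, and unlike the transform above it also trims spaces around the names, so `URI, age` works too.

```javascript
// Hypothetical helper: removes the properties named in a comma-separated
// parameter string. Returns a shallow copy; the input object is untouched.
function removeProperties(obj, transformParam) {
  const names = (transformParam || '')
    .split(',')
    .map((name) => name.trim())      // tolerate "URI, age" style input
    .filter((name) => name !== '');  // ignore empty entries
  const result = Object.assign({}, obj);
  names.forEach((name) => {
    delete result[name];
  });
  return result;
}

// Plain object standing in for content.value.toObject() inside MarkLogic.
const row = { URI: '1/Niranjan', EmpId: '1', Name: 'Niranjan', age: '35', gender: 'M', salary: '1000' };
console.log(JSON.stringify(Object.keys(removeProperties(row, 'URI'))));
// ["EmpId","Name","age","gender","salary"]
console.log(JSON.stringify(Object.keys(removeProperties(row, 'URI, age'))));
// ["EmpId","Name","gender","salary"]
```

Returning a copy is a stylistic choice here; deleting in place, as the transform does, is equally fine because `toObject()` already gives you a mutable copy.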

And then you'd invoke it with something like this:

```shell
mlcp.sh import -input_file_path test.csv -input_file_type delimited_text \
  -uri_id URI -document_type json \
  -output_uri_prefix / -output_uri_suffix .json \
  -output_collections data,type/csv,format/json \
  -output_permissions app-user,read \
  -transform_module /strip-columns.sjs -transform_param URI
```
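With these options, each document URI is the `-output_uri_prefix`, then the value of the `-uri_id` column, then the `-output_uri_suffix`. The `.json` suffix appears to be what the transform's extension-based `xdmp.uriFormat` check relies on. A small illustration; `buildUri` is a hypothetical helper, not part of MLCP.

```javascript
// Hypothetical helper: shows how prefix, URI column value, and suffix combine.
function buildUri(prefix, idValue, suffix) {
  return prefix + idValue + suffix;
}

['1/Niranjan', '2/Deepan', '3/Mehul'].forEach((id) => {
  console.log(buildUri('/', id, '.json'));
});
// /1/Niranjan.json
// /2/Deepan.json
// /3/Mehul.json
```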

HTH!

grtjn
  • I think this answer is applicable to a DHF context as well, as MLCP is commonly used for ingesting data into a DHF staging database. – rjrudin Dec 17 '18 at 17:56
  • DHF input flows use a Data Hub-specific transform, which you wouldn't change yourself. Instead, you'd make changes to, for instance, the content.sjs that gets invoked by the Data Hub Framework. – grtjn Dec 17 '18 at 18:07
  • Is there any MLCP transform function to ignore a column while loading the document? – Deepan Chelliah Dec 18 '18 at 06:50
  • Not out of the box, unfortunately: there is no parameter for it, nor a ready-made transform to copy and paste. Such a transform doesn't have to be difficult, though, particularly if you are generating JSON and use a server-side JavaScript transform. – grtjn Dec 18 '18 at 08:07