MarkLogic Content Pump (mlcp) is an open-source, Java-based command-line tool that provides the fastest way to import, export, and copy data to or from MarkLogic databases. It is designed for integration and automation in existing workflows and scripts.
https://developer.marklogic.com/products/mlcp
User Guide
https://docs.marklogic.com/guide/mlcp
Features
Content Pump can:
- Bulk load billions of local files
- Split and load large, aggregate XML files or delimited text
- Bulk load billions of triples or quads from RDF files
- Archive and restore database contents across environments
- Copy subsets of data between databases
- Load documents from HDFS, including Hadoop SequenceFiles
Data sources and destinations
Content Pump supports moving data between a MarkLogic database and any of the following:
- Local filesystem
- HDFS
- MarkLogic archive
- Another MarkLogic database
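The archive and database-to-database paths above correspond to mlcp's export, import, and copy commands. The script below is a minimal sketch under assumptions, not a prescribed procedure: the hosts, ports, credentials, archive path, and the "invoices" collection are all hypothetical placeholders, and mlcp.sh is assumed to be on PATH. If mlcp is not installed, the hypothetical run_mlcp helper only prints the command it would run.

```shell
#!/bin/sh
# Sketch: archive a subset of one database and restore it elsewhere, or copy
# it directly. All values below are hypothetical placeholders.
set -eu

SRC_HOST=localhost; SRC_PORT=8006   # XDBC server on the source database
DST_HOST=otherhost; DST_PORT=8006   # XDBC server on the destination database
ML_USER=admin; ML_PASS=password     # placeholder credentials
ARCHIVE_DIR=/tmp/invoices-archive   # hypothetical archive location

# Hypothetical helper: run mlcp if installed, otherwise print the command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    echo "dry run: mlcp.sh $*"
  fi
}

# Export one collection to a MarkLogic archive (content plus metadata).
archive_subset() {
  run_mlcp export \
    -host "$SRC_HOST" -port "$SRC_PORT" \
    -username "$ML_USER" -password "$ML_PASS" \
    -output_file_path "$ARCHIVE_DIR" \
    -output_type archive \
    -collection_filter invoices
}

# Restore that archive into the destination database.
restore_subset() {
  run_mlcp import \
    -host "$DST_HOST" -port "$DST_PORT" \
    -username "$ML_USER" -password "$ML_PASS" \
    -input_file_path "$ARCHIVE_DIR" \
    -input_file_type archive
}

# Alternative: copy the same collection between databases in one step.
copy_subset() {
  run_mlcp copy \
    -input_host "$SRC_HOST" -input_port "$SRC_PORT" \
    -input_username "$ML_USER" -input_password "$ML_PASS" \
    -output_host "$DST_HOST" -output_port "$DST_PORT" \
    -output_username "$ML_USER" -output_password "$ML_PASS" \
    -collection_filter invoices
}

archive_subset && restore_subset   # or: copy_subset
```

The archive route leaves a portable copy on disk that can be restored into other environments; the copy route avoids the intermediate files when both databases are reachable at once.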
Formats
Content Pump supports:
- XML, JSON, text, and binary files
- RDF encoded in RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, or TriG serialization formats
- Compressed files and archives (ZIP, GZIP)
- MarkLogic archive, which includes both content and metadata (e.g., permissions and properties)
- Delimited text (e.g., CSV) (import only)
- Temporal documents
- Hadoop SequenceFiles
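On import, the input format is selected mainly through mlcp's -input_file_type option, with -input_compressed for ZIP or GZIP input. The sketch below illustrates a few of the formats listed above; the connection values, file paths, and the <person> record element are hypothetical placeholders, mlcp.sh is assumed to be on PATH, and the helper functions are illustrative only (they print the command when mlcp is not installed).

```shell
#!/bin/sh
# Sketch: choosing the input format on import. All connection values, paths,
# and element names are hypothetical placeholders.
set -eu

# Hypothetical helper: run mlcp if installed, otherwise print the command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    echo "dry run: mlcp.sh $*"
  fi
}

# Shared connection options for an XDBC server on the target database.
mlcp_import() {
  run_mlcp import -host localhost -port 8006 \
    -username admin -password password "$@"
}

# Split one large XML file into one document per <person> element.
load_aggregate() {
  mlcp_import -input_file_path /data/people.xml \
    -input_file_type aggregates -aggregate_record_element person
}

# Turn each row of a delimited text (e.g., CSV) file into a document.
load_csv() {
  mlcp_import -input_file_path /data/people.csv -input_file_type delimited_text
}

# Load triples or quads from an RDF file (Turtle in this example).
load_rdf() {
  mlcp_import -input_file_path /data/triples.ttl -input_file_type rdf
}

# Read the entries of a ZIP file of ordinary documents during the load.
load_zip() {
  mlcp_import -input_file_path /data/docs.zip \
    -input_compressed true -input_compression_codec zip
}

load_aggregate
load_csv
load_rdf
load_zip
```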
Getting Started with MLCP
A free online training course is also available and may be helpful.
To get started moving data with mlcp, download and unpack the binaries. If you are interested in hacking on mlcp or looking at its internals, you can also download the Apache 2.0-licensed source.
To create your first import script, make sure an XDBC server is attached to your database (for example, one running on port 8006). Then run an mlcp import from the command line, substituting your own host, port, credentials, and input path.
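A minimal sketch of such a first import follows. It assumes the XDBC server on port 8006 mentioned above; the host, credentials, and /space/data input directory are hypothetical placeholders, and mlcp.sh (from the bin directory of the unpacked binaries) is assumed to be on PATH. The run_mlcp helper is illustrative only and prints the command instead of running it when mlcp is not installed.

```shell
#!/bin/sh
# Sketch of a first mlcp import. Every value below is a placeholder.
set -eu

ML_HOST=localhost
ML_PORT=8006               # XDBC server attached to the target database
ML_USER=admin              # placeholder credentials
ML_PASS=password
INPUT_DIR=/space/data      # hypothetical directory of files to load

# Hypothetical helper: run mlcp if installed, otherwise print the command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    echo "dry run: mlcp.sh $*"
  fi
}

# Load the files under INPUT_DIR, one document per file.
first_import() {
  run_mlcp import \
    -host "$ML_HOST" -port "$ML_PORT" \
    -username "$ML_USER" -password "$ML_PASS" \
    -input_file_path "$INPUT_DIR" \
    -input_file_type documents
}

first_import
```

By default, mlcp derives each document URI from the input file's path; options such as -output_uri_replace and -output_uri_prefix adjust that. On Windows, the equivalent script is mlcp.bat.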