0

I have a requirement to insert a large amount of data from MarkLogic to RDBMS using DMSDK

Below is my code sample

ArrayList<ArrayList<String>> batch = new ArrayList<ArrayList<String>>();
DatabaseClient client = DatabaseClientFactory.newClient(config.getmlHost(), config.getmlPort(), new DatabaseClientFactory.BasicAuthContext(dbConfig.getuser(), dbConfig.getpassword()));
QueryManager queryMgr = client.newQueryManager();
StructuredQueryBuilder sb = queryMgr.newStructuredQueryBuilder();
StructuredQueryDefinition criteria = sb.and(sb.collection("collection1"),sb.collection("collection2"))
DataMovementManager dmm = client.newDataMovementManager();
QueryBatcher batcher = dmm.newQueryBatcher(criteria)
        .withBatchSize(10)
        .withThreadCount(12)
        .onUrisReady(
                        new ExportListener()
                        .onDocumentReady(doc -> {
                    logger.info("URI received : " + doc.getUri());
                    try {
                        //Getting data From xml and adding it into a arraylist for batch creation
                        ArrayList<String> getDataXml = new GetDataXml().GetDatafromXml(doc.getContent(new DOMHandle()),
                                dbuilder, xPath, ColumnNames);
                        batch.add(getDataXml);

                    } catch (Exception e) {
                        logger.error("Error in the Code", e);
                    }
                })).onQueryFailure(exception -> {
                    logger.error(exception);
                });
        dmm.startJob(batcher);
        batcher.awaitCompletion();
        dmm.stopJob(batcher);
        Class.forName("Driver Name");

        //connecting to RDBMS
        Connection conn = DriverManager.getConnection(DB_URL, USER, PASS)
        PreparedStatement pstmt = conn.prepareStatement("INSERT INTO DBNAME VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?)");

        //Creating Batches PreparedStatement.addBatch()
        for(ArrayList<String> eachObject : batch) {             
            createPreparedStatement(pstmt, eachObject).addBatch();
        }

        //
        int[] result = pstmt.executeBatch();
        logger.info("Total Records Inserted " + result.length);
        oracle.closeConnect(oracleConn);

public PreparedStatement createPreparedStatement(PreparedStatement pstmt, ArrayList<String> eachObject)
            throws SQLException {
        for (int i = 0; i < eachObject.size(); i++) {
            pstmt.setString(i + 1, eachObject.get(i));
        }
        return pstmt;
    }

This code only gets data from MarkLogic and it is not inserted into the RDBMS database after 1 batch is completed,can any one point my mistak in the code. Thanks in Advance.

  • I recommend trying https://github.com/rjrudin/ml-migration-starter instead of writing your own plumbing using DMSDK and JDBC. – rjrudin May 16 '19 at 20:24

1 Answers1

1

Consider created a prepared statement before starting the job and, within the onDocumentReady() listener:

  1. extracting one or more values from the document,
  2. setting the placeholders on the prepared statement to the values, and
  3. executing the prepared statement.

The downside of the strategy of accumulating all documents in an array is that the array could use up all available memory and that throughput should be better if database operations are interleaved.

Hoping that helps,

ehennum
  • 7,295
  • 13
  • 9
  • Thanks for the reply, the steps you mentioned above will insert data in RDBMS one by one but not in batches? – Praneet Sinha May 20 '19 at 05:21
  • It is possible to insert multiple rows in a single prepared statement -- see https://stackoverflow.com/questions/1176352/pdo-prepared-inserts-multiple-rows-in-single-query -- but the best approach will usually be to insert one batch of rows at a time rather than trying to insert all of the rows in one pass. – ehennum May 28 '19 at 16:37
  • Exactly I need to insert in batches but it's not happening in DMSDK – Praneet Sinha May 30 '19 at 04:51
  • I'm not sure understand. From what you've said, the DMSDK code is retrieving the data. The issue is with the prepared statement. For each retrieved document in one batch, the prepared statement should provide one parenthetical row as described in the SO answer linked above. – ehennum May 30 '19 at 16:30