3

Is there a way to copy all items in a collection to a new collection without looping over all the items? I found a way that loops with a DBCursor:

...
DB db = mongoTemplate.getDb();
DBCursor cursor = db.getCollection("xxx").find();

//loop all items in collection
while (cursor.hasNext()) {
   BasicDBObject b = (BasicDBObject) cursor.next();
   // copy to new collection 
   service.createNewCollection(b);
}
...

Can you suggest a way to do the copy in Java without looping over all the items?
(Not in the mongo shell; I need a Java implementation.) Thanks.

tstorms
prilia

5 Answers

6

In MongoDB 2.6, the $out aggregation operator was added which writes the results of the aggregation to a collection. This provides a simple way to do a server-side copy of all the items in a collection to another collection in the same database using the Java driver (I used Java driver version 2.12.0):

// set up pipeline
List<DBObject> ops = new ArrayList<DBObject>();
ops.add(new BasicDBObject("$out", "target")); // writes to collection "target"

// run it
MongoClient client = new MongoClient("host");
DBCollection source = client.getDB("db").getCollection("source");
source.aggregate(ops);

The one-liner version:

source.aggregate(Arrays.asList((DBObject)new BasicDBObject("$out", "target")));

According to the docs, for large datasets (>100MB) you may want to use the allowDiskUse option (Aggregation Memory Restrictions), although I didn't run into that limit when I ran it on a >2GB collection, so it may not apply to this particular pipeline, at least in 2.6.0.

kellogg.lee
2

I followed the advice in "Better way to move MongoDB Collection to another Collection" and inserted arrays of objects instead of single documents. This reduced my copy time from 45 minutes to 2 minutes. Here's the Java code.

        final int OBJECT_BUFFER_SIZE = 2000;
        int rowNumber = 0;
        List<DBObject> objects;
        final int totalRows = cursor.size();
        logger.debug("Mongo query result size: " + totalRows);
        // Loop design based on this:
        // https://stackoverflow.com/questions/18525348/better-way-to-move-mongodb-collection-to-another-collection/20889762#20889762
        // Use multiple threads to improve
        do {
            logger.debug(String.format("Mongo buffer starts row %d - %d copy into %s", rowNumber,
                    (rowNumber + OBJECT_BUFFER_SIZE) - 1, dB2.getStringValue()));
            cursor = db.getCollection(collectionName.getStringValue()).find(qo)
                    .sort(new BasicDBObject("$natural", 1)).skip(rowNumber).limit(OBJECT_BUFFER_SIZE);
            objects = cursor.toArray();
            try {
                if (objects.size() > 0) {
                    db2.getCollection(collectionName.getStringValue()).insert(objects);
                }
            } catch (final BSONException e) {
                logger.warn(String.format(
                        "Mongodb copy %s %s: mongodb error. A row between %d - %d will be skipped.",
                        dB1.getStringValue(), collectionName.getStringValue(), rowNumber,
                        rowNumber + OBJECT_BUFFER_SIZE));
                logger.error(e);
            }
            rowNumber = rowNumber + objects.size();
        } while (rowNumber < totalRows);

The buffer size appears to be important. A size of 10,000 worked fine; however, for a variety of other reasons I selected a smaller size.
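To make the loop arithmetic above easier to check, here is a minimal sketch that only computes the skip/limit windows the loop would walk through for a given row count and buffer size. The class and method names (PageWindows, Window, windows) are made up for illustration and are not part of the MongoDB driver; it does not touch the database at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the MongoDB driver.
class PageWindows {
    /** One skip/limit pair, matching the arguments passed to skip() and limit(). */
    static final class Window {
        final int skip;
        final int limit;

        Window(int skip, int limit) {
            this.skip = skip;
            this.limit = limit;
        }
    }

    /** Splits totalRows into consecutive windows of at most bufferSize rows. */
    static List<Window> windows(int totalRows, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<Window> result = new ArrayList<>();
        for (int skip = 0; skip < totalRows; skip += bufferSize) {
            // The last window may be shorter than bufferSize.
            result.add(new Window(skip, Math.min(bufferSize, totalRows - skip)));
        }
        return result;
    }
}
```

For example, windows(4500, 2000) yields three windows: (0, 2000), (2000, 2000) and (4000, 500).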

Community
B. Robinson
0

You could use Google Guava to do this. To get a Set from an iterator, you can use Sets#newHashSet(Iterator).
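For readers without Guava on the classpath, a rough standard-library equivalent might look like the sketch below. The IteratorSets class and toSet method are hypothetical names. Note that either approach still visits every element and holds all of them in memory, so the iteration is hidden rather than avoided.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical helper showing what draining an iterator into a set amounts to.
class IteratorSets {
    /** Drains the iterator into a new HashSet; equal elements collapse into one. */
    static <T> Set<T> toSet(Iterator<? extends T> it) {
        Set<T> set = new HashSet<>();
        it.forEachRemaining(set::add);
        return set;
    }
}
```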

tstorms
  • I think you should be able to figure that out yourself. DBCursor has a method that returns its iterator. – tstorms Apr 23 '13 at 17:32
0

My idea is to send the cloneCollection admin command from the Java Driver. Below is a partial example.

DB db = mongo.getDB("admin");
DBObject cmd = new BasicDBObject();
cmd.put("cloneCollection", "users.profiles");//the collection to clone

//add the code here to build the rest of the required fields as JSON string 

CommandResult result = db.command(cmd);

I remember using the driver's JSON.parse(...) utility to build the command structure from a JSON string. Try that, as it is much simpler.

NOTE: I haven't tried this, but I'm confident it will work.

Aravind Yarram
0

I think using the aggregation operator described by kellogg.lee is the best method if the target collection is in the same database.

To copy to a collection in another database running on a different mongod instance, you can use one of the following methods:

First Method:

List<Document> documentList = sourceCollection.find().into(new ArrayList<Document>());
targetCollection.insertMany(documentList);

However, this method might cause an OutOfMemoryError if the source collection is huge.

Second Method:

sourceCollection.find().batchSize(1000).forEach((Block<? super Document>) document -> targetCollection.insertOne(document));

This method is safer than the first one because it does not keep a local list of all the documents, and the batch size can be chosen according to memory requirements. However, it might be slower than the first one, since each document is inserted individually.
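A possible middle ground between the two methods is to buffer a fixed number of documents and insert each buffer in one call. The sketch below shows only the buffering logic, written against plain java.util types so it can be read and run without a database; BatchCopy and copyInBatches are hypothetical names, not driver API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the MongoDB driver.
class BatchCopy {
    /**
     * Reads items from source and hands them to sink in lists of at most
     * batchSize elements. Returns the number of items handed over.
     */
    static <T> long copyInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long copied = 0;
        List<T> buffer = new ArrayList<>(batchSize);
        while (source.hasNext()) {
            buffer.add(source.next());
            if (buffer.size() == batchSize) {
                sink.accept(buffer);
                copied += buffer.size();
                // Start a fresh list so the sink is free to keep the old one.
                buffer = new ArrayList<>(batchSize);
            }
        }
        if (!buffer.isEmpty()) {
            // Flush the final, possibly shorter, batch.
            sink.accept(buffer);
            copied += buffer.size();
        }
        return copied;
    }
}
```

With the driver types used above, the call would presumably be something like copyInBatches(sourceCollection.find().iterator(), 1000, targetCollection::insertMany). I have not run that against a live server, and the cursor returned by iterator() should be closed afterwards.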

Kasifibs