I'm trying to dump only the _id field of a collection.
With mongoexport it finishes in around 2 minutes:
time mongoexport -h localhost -d db1 -c collec1 -f _id -o u.text --csv
connected to: localhost
exported 68675826 records
real 2m20.970s
With Java it takes about 30 minutes:
java -cp mongo-test-assembly-0.1.jar com.poshmark.Test
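package com.poshmark;

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

class Test {
    public static void main(String[] args) {
        MongoClient mongoClient = new MongoClient("localhost");
        MongoDatabase database = mongoClient.getDatabase("db1");
        MongoCollection<Document> collection = database.getCollection("collec1");
        // Project only _id and print every document to stdout.
        MongoCursor<Document> iterator = collection.find().projection(new Document("_id", 1)).iterator();
        while (iterator.hasNext()) {
            System.out.println(iterator.next().toString());
        }
    }
}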
CPU usage on the box is low, and network latency shouldn't be a factor since both tests run on the same box.
Update:
I used Files.newBufferedWriter instead of System.out.println, but ended up with the same performance.
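A minimal sketch of what such a buffered variant could look like, for anyone who wants to reproduce it. The class and method names (IdDump, dump) are illustrative and not taken from the original test. It accepts any Iterator, and since MongoCursor<Document> implements Iterator<Document>, the cursor from the code above could be passed in directly.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

class IdDump {
    // Hypothetical helper: writes one element per line through a
    // BufferedWriter and returns how many elements were written.
    static long dump(Iterator<?> items, Path out) throws IOException {
        long count = 0;
        try (BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            while (items.hasNext()) {
                writer.write(String.valueOf(items.next()));
                writer.newLine();
                count++;
            }
        }
        return count;
    }
}
```

Calling it with the cursor would look like IdDump.dump(iterator, Paths.get("u.text")). If the runtime stays the same with this in place, the output path is unlikely to be the bottleneck.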
Looking at db.currentOp() makes me think that MongoDB is hitting disk, since the operation shows a high numYields:
{
"inprog" : [
{
"desc" : "conn8636699",
"threadId" : "0x79a70c0",
"connectionId" : 8636699,
"opid" : 1625079940,
"active" : true,
"secs_running" : 12,
"microsecs_running" : NumberLong(12008522),
"op" : "getmore",
"ns" : "users.users",
"query" : {
"_id" : {
"$exists" : true
}
},
"client" : "10.1.166.219:60324",
"numYields" : 10848,
"locks" : {
},
"waitingForLock" : false,
"lockStats" : {
"Global" : {
"acquireCount" : {
"r" : NumberLong(21696)
},
"acquireWaitCount" : {
"r" : NumberLong(26)
},
"timeAcquiringMicros" : {
"r" : NumberLong(28783)
}
},
"MMAPV1Journal" : {
"acquireCount" : {
"r" : NumberLong(10848)
},
"acquireWaitCount" : {
"r" : NumberLong(5)
},
"timeAcquiringMicros" : {
"r" : NumberLong(40870)
}
},
"Database" : {
"acquireCount" : {
"r" : NumberLong(10848)
}
},
"Collection" : {
"acquireCount" : {
"R" : NumberLong(10848)
}
}
}
}
]
}
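As a rough sanity check on the numbers above, the yield rate and the share of time spent waiting on locks can be worked out from microsecs_running, numYields, and the two timeAcquiringMicros values. This is only back-of-the-envelope arithmetic on this one snapshot; the class and method names are made up for illustration.

```java
class CurrentOpStats {
    // Yields per second of wall-clock time for the running operation.
    static double yieldsPerSecond(long numYields, long microsRunning) {
        return numYields / (microsRunning / 1_000_000.0);
    }

    // Fraction of the running time spent waiting to acquire locks.
    static double lockWaitFraction(long waitMicros, long microsRunning) {
        return (double) waitMicros / microsRunning;
    }

    public static void main(String[] args) {
        long microsRunning = 12_008_522L;       // microsecs_running
        long numYields = 10_848L;               // numYields
        long waitMicros = 28_783L + 40_870L;    // Global + MMAPV1Journal timeAcquiringMicros

        System.out.printf("yields/sec: %.0f%n", yieldsPerSecond(numYields, microsRunning));
        System.out.printf("lock wait:  %.2f%% of running time%n",
                100 * lockWaitFraction(waitMicros, microsRunning));
    }
}
```

That works out to roughly 900 yields per second, with well under 1% of the running time spent acquiring locks, so lock contention itself does not look like the cause in this snapshot.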