
I have an application (in Scala, though this question is probably driver-agnostic) that collates inserts into batches of a thousand and then does a bulk insert. Because of the way the data is received and processed, it's possible that we might have duplicate ids; we just want to ignore these.

According to the documentation for bulkWrite (https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/#bulkwrite-example-unordered-bulk-write), using the ordered: false option should allow all inserts to run without the whole write failing because of an error:

Since this was an unordered operation, the writes remaining in the queue were processed despite the exception.
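My reading of that is that a batch like the following should still insert the non-duplicate documents. Here's a minimal sketch of the behaviour I expect (the connection details and collection name are placeholders):

import org.mongodb.scala.model.{BulkWriteOptions, InsertOneModel}
import org.mongodb.scala.{Document, MongoClient}

// Expectation per the docs: with ordered(false) the duplicate _id below is
// reported as an error, but the other two documents are still inserted.
val collection = MongoClient().getDatabase("test").getCollection("bulkTest")
val models = Seq(
  new InsertOneModel(Document("_id" -> 1)),
  new InsertOneModel(Document("_id" -> 1)), // duplicate key
  new InsertOneModel(Document("_id" -> 2))
)
collection.bulkWrite(models, new BulkWriteOptions().ordered(false)).toFuture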

However, my error log contains entries like this:

I COMMAND [conn696515] command db.Matches command: insert { insert: "Matches", ordered: false, documents: 1000 } ninserted:4 keysInserted:52 exception: E11000 duplicate key error collection: db.Matches index: _id_ dup key: { : "3000758-3000343-3845342998431744-5-1646-----10-1" } code:11000 numYields:0 reslen:183245 locks:{ Global: { acquireCount: { r: 1017, w: 1017 } }, Database: { acquireCount: { w: 1017 } }, Collection: { acquireCount: { w: 1013 } }, Metadata: { acquireCount: { w: 4 } }, oplog: { acquireCount: { w: 4 } } } protocol:op_query 1003ms

This suggests to me that only 4 of my 1000 documents were actually inserted.

Does this mean that:

  1. 996 of my inserts were duplicates (this is very unlikely)?
  2. The ordered:false option doesn't allow the inserts to continue?
  3. The documentation literally means a single write (as in only one) can fail, and on a second failure it errors out?
  4. I'm reading the log output wrong and I am actually getting all these writes?
  5. The above only works on mongos? I'm using mongod.

EDIT: Here's the code that runs the query (note: as far as I can tell, it's the driver that batches it; we send tens of thousands of documents to this method).

import org.mongodb.scala.bson.BsonString
import org.mongodb.scala.model.{BulkWriteOptions, InsertOneModel}
import org.mongodb.scala.{Document, MongoCollection, MongoDatabase}

import scala.collection.immutable.HashMap
import scala.concurrent.Future

def insertMatchesIgnoreDuplicates(
  matches: HashMap[BsonString, Document],
  database: MongoDatabase
): Future[_] = {
  val matchesCollection: MongoCollection[Document] =
    database.getCollection("Matches")
  // One InsertOneModel per document, all submitted as a single bulk write
  val inserts = matches.values.map(doc => new InsertOneModel(doc))
  // ordered(false) tells the server to keep going past individual errors
  val bulkWriteOptions = new BulkWriteOptions().ordered(false)
  matchesCollection.bulkWrite(inserts.toSeq, bulkWriteOptions).toFuture
}
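To actually see what comes back, I attach a callback instead of firing and forgetting. Roughly (a sketch; matches and database are the same values passed to the method above):

import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

insertMatchesIgnoreDuplicates(matches, database).onComplete {
  case Success(result) => println(s"bulk write succeeded: $result")
  case Failure(e)      => println(s"bulk write failed: $e")
}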

And when catching the output, here's a successful result:

AcknowledgedBulkWriteResult{insertedCount=15107, matchedCount=0, removedCount=0, modifiedCount=0, upserts=[]}

and an unsuccessful one:

com.mongodb.MongoBulkWriteException: Bulk write operation error on server 10.131.58.143:27017. Write errors: [BulkWriteError{index=52621, code=11000, message='E11000 duplicate key error collection: betbrain_v1.Matches index: _id_ dup key: { : "3000177-3000012-3858875929331712-5---2.5---14-1" }', details={ }}].

  • Show the actual statement and be sure to trap the error. The error response will show the write result, which will include an array of any errors that occurred and the position within the submitted batch at which each error occurred. That is what happens with an unordered bulk write, so if you don't actually get that response and there is a single error, then you did not submit as unordered. So show the code submitting and the error response "from the function" and not the logs as you are presently showing. – Neil Lunn May 17 '18 at 21:36
  • @NeilLunn I've added those in an edit – Nathan Edwards May 18 '18 at 10:15
  • 1
    So, not really up on my Scala at the moment, but surely the Future is then returned with content which is the `WriteResult` as you would view it in the mongo shell ( a similar structure at least ). This is what you need to be looking at instead of the log file. That's what I told you in the comment. So are you keeping that returned value where you can inspect it or are you just discarding it as if you were expecting a "void" response from the function? Because it returns a structure with all the useful information you are looking for. – Neil Lunn May 18 '18 at 10:19
  • 1
    Point being that if this is anything like the other language drivers, then you get an "exception" of sorts returned. Standard console serialization of said exception gives you semi-useless information like you included in the post. Actually looking "inside" that structure reveals the actual "useful" information, such as where the list of errors are. – Neil Lunn May 18 '18 at 10:21
  • 1
    Don't have a direct language example to show right now but this python one shows what I'm talking about [How to Ignore Duplicate Key Errors Safely Using insert_many](https://stackoverflow.com/a/44838740/2313887) – Neil Lunn May 18 '18 at 10:25
  • @NeilLunn I've changed the code; it takes a while to reboot and hit an error. Most of the code was written with a fire-and-forget philosophy, so that Future was just left to resolve. Only when I tried to map the result in order to log it did the exception actually show up. – Nathan Edwards May 18 '18 at 12:05
  • Cool. You should write in an answer how you got the write result object and inspected it to see the list of errors. You likely won't be the only one to come across this. – Neil Lunn May 18 '18 at 12:09

1 Answer


The Future is failing here, so I think you need to recover it.

Something like

matchesCollection
  .bulkWrite(inserts.toSeq, bulkWriteOptions)
  .toFuture
  .recover { case e: MongoBulkWriteException => e.getWriteResult }

This converts it to a successful Future whose value is still a BulkWriteResult. You can then inspect that result and decide what to do with the failures.
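For example, something like this (a sketch; the println logging is illustrative, and JavaConverters is needed because getWriteErrors returns a Java list):

import com.mongodb.MongoBulkWriteException
import scala.collection.JavaConverters._
import scala.concurrent.ExecutionContext.Implicits.global

matchesCollection
  .bulkWrite(inserts.toSeq, bulkWriteOptions)
  .toFuture
  .recover {
    case e: MongoBulkWriteException =>
      // Each failed write records its position in the submitted batch,
      // the error code (11000 for duplicate keys) and a message.
      e.getWriteErrors.asScala.foreach { err =>
        println(s"index=${err.getIndex} code=${err.getCode} msg=${err.getMessage}")
      }
      e.getWriteResult // recover with the partial result
  }
  .map(result => println(s"inserted ${result.getInsertedCount} documents"))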

NOTE: this doesn't work inside transactions (which happened to be my use case :( ). From the MongoDB docs (https://docs.mongodb.com/manual/reference/method/db.collection.bulkWrite/):

Inside a transaction, the first error in a bulk write causes the entire bulk write to fail and aborts the transaction, even if the bulk write is unordered.
