
When I try to use the Java API to insert many documents into MongoDB 4.0 (a replica set), it throws duplicate key errors. The amount of data is not large, only about 300,000 documents, inserted over 3-5 seconds.

First I searched the official documentation and website. It says the ObjectId counter

Runs up to 256^3 per process (16777216)
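
To sanity-check that claim, here is a minimal sketch (the class name `ObjectIdCheck` is just for illustration) that generates 300,000 ids client-side with the driver's `org.bson.types.ObjectId` and counts collisions:

    import org.bson.types.ObjectId;

    import java.util.HashSet;
    import java.util.Set;

    public class ObjectIdCheck {
        public static void main(String[] args) {
            // The 3-byte counter in an ObjectId allows 256^3 = 16,777,216
            // distinct values, far more than the 300,000 documents here.
            Set<ObjectId> ids = new HashSet<ObjectId>();
            for (int i = 0; i < 300000; i++) {
                ids.add(new ObjectId());
            }
            System.out.println("unique ids: " + ids.size()); // expect 300000
        }
    }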

The data source comes from RocketMQ. Here is my consumer code:

consumer.subscribe(GetPropertyUtils.getTestTopic(), "*", new MessageListener() {
    long insertnums = 0L;
    List<Document> documentList = new ArrayList<Document>();

    @Override
    public Action consume(Message message, ConsumeContext context) {
        insertnums++;
        consumerlogger.info(" now total size is " + insertnums);

        String body = new String(message.getBody());
        Document document = Document.parse(body);
        documentList.add(document);
        // insert in bulk once the batch reaches 1000 documents
        if (documentList.size() >= 1000) {
            try {
                MongoInsert.insertData(documentList);
                Thread.sleep(1000);
            } catch (Exception e) {
                consumerlogger.error("insert sleep  3000");
            }

            documentList.clear();
        }
        return Action.CommitMessage;
    }
});

Then `insertData` writes the batch into MongoDB:

public static void insertData(List<Document> documents) {
    try {
        MongoInsertlogger.info("prepare to insert");
        //collection.insertMany(documents, new InsertManyOptions().ordered(false));

        List<WriteModel<Document>> requests = new ArrayList<WriteModel<Document>>();
        for (Document doc : documents) {
            InsertOneModel<Document> iom = new InsertOneModel<Document>(doc);
            requests.add(iom);
        }
        BulkWriteResult bulkWriteResult = collection.bulkWrite(requests, new BulkWriteOptions().ordered(false));
        System.out.println(bulkWriteResult.toString());
    } catch (Exception e) {
        MongoInsertlogger.error("insert failed, caused by " + e);
        System.out.println(e);
    }
}
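
For completeness, the snippet above assumes a static `collection` field. A minimal sketch of how it could be initialized with the legacy 3.x API is shown below; the hosts and replica set name are placeholders, and the database and collection names are taken from the error message:

    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientURI;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    public class MongoInsert {
        // Placeholder connection string for the replica set.
        private static final MongoClient mongoClient = new MongoClient(
                new MongoClientURI("mongodb://host1:27017,host2:27017,host3:27017/?replicaSet=rs0"));

        private static final MongoCollection<Document> collection = mongoClient
                .getDatabase("yyj2")
                .getCollection("accpay");

        // insertData(...) as shown above
    }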

But the error shows:

  BulkWriteError{index=811, code=11000, message='E11000 duplicate key error collection: yyj2.accpay index: _id_ dup key: { : ObjectId('5bea843604de38d61ff4d1fd') }', details={ }}, BulkWriteError{index=812, code=11000, message='E11000 duplicate key error collection: yyj2.accpay index: _id_ dup key: { : ObjectId('5bea843604de38d61ff4d1fe') }', details={ }}, BulkWriteError{index=813, code=11000, message='E11000 duplicate key error collection: yyj2.accpay index: _id_ dup key: { : ObjectId('5bea843604de38d61ff4d1ff') }', details={ }}, BulkWriteError{index=814, code=11000, message='E11000 duplicate key error collection: yyj2.accpay index: _id_ dup key: { : ObjectId('5bea843604de38d61ff4d200') }', details={ }}, BulkWriteError{index=815, code=11000, message='E11000 duplicate key error collection: ......

Why does this happen with so little data? The `_id` values are ObjectIds created by MongoDB itself, and the data volume is far below what the counter supports. I am using mongo-java-driver 3.7.1.
Thanks in advance!

  • The error is pretty explicit, and you seem to be holding on to data already inserted. One thing I notice here is that you are not resetting `documentList` to an empty array after you insert. You also probably mean a "modulo of 1000" instead of "whenever there is more than 1000", otherwise you are inserting on **every** iteration once the list grows to that size. – Neil Lunn Nov 13 '18 at 08:22
  • @NeilLunn when the insert is over I use `documentList.clear();` to clear the data. – HbnKing Nov 13 '18 at 08:40
  • Did not see that. Are you absolutely certain the data going into `Document.parse(body)` does not have an `_id` key? Since you appear to be processing on "stream events" ( in simple terms ) then the lack of any "backpressure" here to "halt" the stream processing in between the "insert" and the "clear" can also be a bit suspect here. If you have this running in multiple threads or similar parallel processing then it's highly likely something else is still adding to that "list" before you actually clear it as well. I'd add some more logging, including the "parse" results and see what comes out. – Neil Lunn Nov 13 '18 at 08:54
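
Following up on the concurrency concern raised in the comments, here is a minimal sketch of a hypothetical helper (not the asker's actual code) that snapshots and clears the batch under a lock, so messages added by other listener threads cannot land in a batch that is already being written:

    private final List<Document> documentList = new ArrayList<Document>();

    // Hypothetical helper, called from the consumer callback.
    private synchronized void addAndMaybeFlush(Document document) {
        documentList.add(document);
        if (documentList.size() >= 1000) {
            // Take a private copy and clear *before* the slow insert,
            // so concurrent adds cannot be lost or written twice.
            List<Document> batch = new ArrayList<Document>(documentList);
            documentList.clear();
            MongoInsert.insertData(batch);
        }
    }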

1 Answer


You get this error when the document already exists in your database, as defined by a duplicate of the primary key (here the `_id_` index, i.e. the `_id` field).
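
If the duplicates come from the same message being delivered, and therefore parsed and batched, more than once, one defensive option is to make the bulk write tolerant of an existing `_id`, for example by replacing on `_id` with an upsert instead of a plain insert. A minimal sketch, assuming driver 3.7.x; the class and method names are just for illustration:

    import com.mongodb.client.MongoCollection;
    import com.mongodb.client.model.BulkWriteOptions;
    import com.mongodb.client.model.Filters;
    import com.mongodb.client.model.ReplaceOneModel;
    import com.mongodb.client.model.UpdateOptions;
    import com.mongodb.client.model.WriteModel;
    import org.bson.Document;

    import java.util.ArrayList;
    import java.util.List;

    public class UpsertExample {
        // Replace-with-upsert keyed on _id, so a redelivered document
        // overwrites itself instead of raising E11000.
        static void upsertData(MongoCollection<Document> collection, List<Document> documents) {
            List<WriteModel<Document>> requests = new ArrayList<WriteModel<Document>>();
            for (Document doc : documents) {
                requests.add(new ReplaceOneModel<Document>(
                        Filters.eq("_id", doc.get("_id")),
                        doc,
                        new UpdateOptions().upsert(true)));
            }
            collection.bulkWrite(requests, new BulkWriteOptions().ordered(false));
        }
    }

Note that upserting changes the semantics from insert-only to last-write-wins for a given `_id`, which is usually acceptable when the duplicates are exact redeliveries of the same document.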
