
I have a very simple collection with no indexes except `_id`. Each document contains a single array field, `mies`. The collection currently holds around 100 million documents, and I can see anomalies like the following in the profiler:

```
{
  "op": "update",
  "ns": "DB.links",
  "command": {
    "q": {
      "_id": "f1b54f37-7f92-4e75-9ec6-5329349ce792_eb370c8a-6f33-4989-aa59-a26e1c9df46c"
    },
    "u": {
      "$push": {
        "mies": {
          "$each": [
            {
              "$oid": "5e39d07bec34b8000e7f86b7"
            }
          ]
        }
      }
    },
    "multi": true,
    "upsert": true
  },
  "keysExamined": 0,
  "docsExamined": 0,
  "nMatched": 0,
  "nModified": 0,
  "upsert": true,
  "keysInserted": 1,
  "numYield": 0,
  "locks": {
    "Global": {
      "acquireCount": {
        "r": 2,
        "w": 2
      }
    },
    "Database": {
      "acquireCount": {
        "w": 2
      },
      "acquireWaitCount": {
        "w": 1
      },
      "timeAcquiringMicros": {
        "w": 19486143
      }
    },
    "Collection": {
      "acquireCount": {
        "w": 1
      }
    },
    "oplog": {
      "acquireCount": {
        "w": 1
      }
    }
  },
  "millis": 19490,
  "planSummary": "IDHACK",
  "execStats": {
    "stage": "UPDATE",
    "nReturned": 0,
    "executionTimeMillisEstimate": 0,
    "works": 2,
    "advanced": 0,
    "needTime": 1,
    "needYield": 0,
    "saveState": 0,
    "restoreState": 0,
    "isEOF": 1,
    "invalidates": 0,
    "nMatched": 0,
    "nWouldModify": 0,
    "nInvalidateSkips": 0,
    "wouldInsert": true,
    "fastmodinsert": false,
    "inputStage": {
      "stage": "IDHACK",
      "nReturned": 0,
      "executionTimeMillisEstimate": 0,
      "works": 1,
      "advanced": 0,
      "needTime": 0,
      "needYield": 0,
      "saveState": 0,
      "restoreState": 0,
      "isEOF": 1,
      "invalidates": 0,
      "keysExamined": 0,
      "docsExamined": 0
    }
  }
}
```
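
For reference, the profiled command corresponds roughly to this shell call (a sketch reconstructed from the profiler entry above; the application's actual driver call may differ):

```
// Reconstructed from the profiler entry above: an upsert that pushes one
// ObjectId onto the "mies" array of a single document matched by _id.
db.links.update(
  { _id: "f1b54f37-7f92-4e75-9ec6-5329349ce792_eb370c8a-6f33-4989-aa59-a26e1c9df46c" },
  { $push: { mies: { $each: [ ObjectId("5e39d07bec34b8000e7f86b7") ] } } },
  { multi: true, upsert: true }
)
```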

As you can see, a simple upsert with a single `$push` to the array took 19 seconds. I believe most of the time was spent here:

      "timeAcquiringMicros": {
        "w": 19486143
      }

What should I check? How can I improve the performance?
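
For reference, server-side lock and cache statistics can be pulled with commands along these lines (a sketch; `db.links` and the `DB` database name are taken from the profiler output above, and the cache fields assume the default WiredTiger storage engine):

```
// Lock statistics accumulated since server start:
db.serverStatus().locks

// Cache pressure: compare "bytes currently in the cache"
// with "maximum bytes configured":
db.serverStatus().wiredTiger.cache

// Index sizes for this collection (bytes); do they fit in the cache?
db.links.stats().indexSizes

// Operations currently blocked waiting for a lock on this namespace:
db.currentOp({ ns: "DB.links", waitingForLock: true })
```

A wait as large as the 19.5 s above usually means some other operation held, or was queued ahead for, the database write lock, so catching the culprit with `currentOp` while it is happening is often the most direct check.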

silent-box
  • Since you're only querying on `_id`, which is indexed by default, the query itself is fine. What's more concerning is the 100 million documents: why not consider sharding (see the sketch after these comments), or archiving old records if they aren't needed for most reads? – whoami - fakeFaceTrueSoul Feb 04 '20 at 20:39
  • Yes, but due to business logic the data has to be accumulated for a period of time. I can see that the index size for hot collections exceeds MongoDB's cache size. Is that the reason for the slow lock acquisition? – silent-box Feb 04 '20 at 20:42
  • I would say at least separate that collection from the others and maintain it alone in a different cluster if possible (if you don't want to shard and want all documents on the same cluster); that way you can size the cluster (RAM + disk) for that collection and also get high I/O. Check this documentation: https://docs.mongodb.com/manual/tutorial/ensure-indexes-fit-ram/ – whoami - fakeFaceTrueSoul Feb 04 '20 at 20:53
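
A minimal sketch of the sharding suggestion from the comments above (illustrative only; a hashed `_id` shard key is an assumption, and choosing a shard key needs care):

```
// Hashed sharding on _id spreads writes across shards.
// "DB.links" is taken from the profiler output; using _id as the shard key
// here is only an illustration, not a recommendation.
sh.enableSharding("DB")
sh.shardCollection("DB.links", { _id: "hashed" })
```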

2 Answers


While this doesn't directly provide a solution to your problem, I didn't see anyone else mention this as a potential cause. One of the reasons your operation is slow is how MongoDB handles it internally, specifically its replication design. If you don't care for the explanation, just skip to the bottom.

Quoting a description of MongoDB's replication design:

Every operation that modifies the data on the primary is stored in a special capped collection called oplog (short for operations log). The oplog is asynchronously synchronized to all the secondaries which replay the operations to eventually be consistent with the primary.

For the oplog to be safely replayable, its entries need to be idempotent. Since `$push` is not, MongoDB converts the operation into a `$set` that replaces the entire array.

The bottom line is that whenever you push something to an array, the entire array is replaced as far as replication goes.
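
One way to check how such a push is actually recorded is to look at the most recent update entry in the oplog for this namespace. A rough sketch, assuming a replica-set member and read access to the `local` database:

```
// Inspect the most recent update oplog entry for this namespace.
var oplog = db.getSiblingDB("local").oplog.rs;
oplog.find({ ns: "DB.links", op: "u" })
     .sort({ $natural: -1 })
     .limit(1)
     .pretty()
```

The exact format of update oplog entries has changed across server versions, so it is worth checking what your version actually records for a `$push`.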

laventnc

You cannot

MongoDB uses a B-tree for index operations such as insert, search, and delete:

| Algorithm | Average  | Worst case |
|-----------|----------|------------|
| Space     | O(n)     | O(n)       |
| Search    | O(log n) | O(log n)   |
| Insert    | O(log n) | O(log n)   |
| Delete    | O(log n) | O(log n)   |

MongoDB has to walk down the ordered index to find a value. Since `_id` is the unique default index, MongoDB needs 27 iterations to find a document in the worst case (2^27 = 134,217,728 documents [~19 sec], 2^28 = 268,435,456 [~21 sec], etc.).
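
A quick shell check of the arithmetic above (illustrative only):

```
// Depth of a binary search over ~100 million ordered keys:
Math.ceil(Math.log2(100 * 1000 * 1000))   // 27
Math.pow(2, 27)                           // 134217728
```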

You can improve search operations by creating a compound index, but that will penalize inserts, since MongoDB then has to update both the `_id` index and the compound index.

Valijon
  • Which compound index are you talking about? And how is the search related to the 19-second lock acquisition? – silent-box Feb 04 '20 at 21:15
  • Imagine you have only 100 documents and MongoDB needs 1 sec per iteration. Without an index, MongoDB scans 1, 2, 3, ... 100 (100 secs), also known as `O(n)`. But with a B-tree you need only 6 tries (6 secs) (discussed [here](https://puzzling.stackexchange.com/questions/3074/6-tries-to-guess-a-number-between-1-100)). Compound indexes are discussed [here](https://stackoverflow.com/questions/47893613/mongodb-find-performance-single-compound-index-vs-two-single-field-indexes) – Valijon Feb 04 '20 at 21:31
  • I know what a B-tree is, and what a compound index is. Unfortunately it has nothing to do with the problem – silent-box Feb 05 '20 at 15:20
  • @silent-box I've answered your main question, `How can I improve the performance?`, and explained why it's not possible – Valijon Feb 05 '20 at 15:27