
I am running mongodump to export a database with all of its records. No query parameters are passed to the mongodump command.

The MongoDB version is 4.4.3.

However, the mongod logs show that it is still performing a COLLSCAN:

{
  "t": {
    "$date": "2021-02-09T17:29:37.421+00:00"
  },
  "s": "I",
  "c": "COMMAND",
  "id": 51803,
  "ctx": "conn529",
  "msg": "Slow query",
  "attr": {
    "type": "command",
    "ns": "notifications.event",
    "appName": "mongodump",
    "command": {
      "getMore": 4159295894400030341,
      "collection": "event",
      "lsid": {
        "id": {
          "$uuid": "fd3284b9-86e8-4c8a-a3b4-a1787308a4ec"
        }
      },
      "$clusterTime": {
        "clusterTime": {
          "$timestamp": {
            "t": 1612891775,
            "i": 52
          }
        },
        "signature": {
          "hash": {
            "$binary": {
              "base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=",
              "subType": "0"
            }
          },
          "keyId": 0
        }
      },
      "$db": "notifications",
      "$readPreference": {
        "mode": "primaryPreferred"
      }
    },
    "originatingCommand": {
      "find": "event",
      "filter": {},
      "lsid": {
        "id": {
          "$uuid": "fd3284b9-86e8-4c8a-a3b4-a1787308a4ec"
        }
      },
      "$clusterTime": {
        "clusterTime": {
          "$timestamp": {
            "t": 1612891718,
            "i": 4
          }
        },
        "signature": {
          "hash": {
            "$binary": {
              "base64": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=",
              "subType": "0"
            }
          },
          "keyId": 0
        }
      },
      "$db": "notifications",
      "$readPreference": {
        "mode": "primaryPreferred"
      }
    },
    "planSummary": "COLLSCAN",
    "cursorid": 4159295894400030341,
    "keysExamined": 0,
    "docsExamined": 72989,
    "numYields": 93,
    "nreturned": 72989,
    "reslen": 16777333,
    "locks": {
      "ReplicationStateTransition": {
        "acquireCount": {
          "w": 94
        }
      },
      "Global": {
        "acquireCount": {
          "r": 94
        }
      },
      "Database": {
        "acquireCount": {
          "r": 94
        }
      },
      "Collection": {
        "acquireCount": {
          "r": 94
        }
      },
      "Mutex": {
        "acquireCount": {
          "r": 1
        }
      }
    },
    "storage": {
      "data": {
        "bytesRead": 17559327,
        "timeReadingMicros": 1336714
      }
    },
    "protocol": "op_msg",
    "durationMillis": 1443
  }
}

Is there any way I can avoid the COLLSCAN? Why does it show up at all?

prasad_
swateek
  • Because you are dumping the whole collection? I mean, you are fetching all documents anyway. Does it matter whether an index is used or not? – Alex Blex Feb 10 '21 at 09:52
  • I agree, @AlexBlex, I am fetching all documents, but the data is much larger in production; this is from a test setup, hence the worry about COLLSCAN. – swateek Feb 10 '21 at 12:50
  • Dump from a secondary. To rephrase D. SM's answer: a COLLSCAN is a concern when you discard most of the documents it loads. It is fine when every scanned document ends up on the client; you cannot deliver documents without reading them. – Alex Blex Feb 10 '21 at 13:59

1 Answer

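A collection scan is the most efficient way to retrieve an entire collection: every document has to be read anyway, and using an index would only add the cost of reading index keys on top of fetching every document.

A COLLSCAN is not inherently bad. It is only a problem when the query needs a small number of documents (e.g., one), because then most of the scanned documents are read and thrown away.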
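To make this concrete, here is a minimal Python sketch (standard library only) that reads the slow-query entry from the question, trimmed to the fields it needs, and compares nreturned with docsExamined. The helper names scan_efficiency and is_wasteful_collscan, the 0.5 threshold, and the "point lookup" entry are all hypothetical illustrations, not anything MongoDB provides.

```python
import json

# Slow-query entry from the question, trimmed to the fields this sketch uses.
LOG_LINE = """
{"msg": "Slow query",
 "attr": {"ns": "notifications.event", "appName": "mongodump",
          "planSummary": "COLLSCAN", "keysExamined": 0,
          "docsExamined": 72989, "nreturned": 72989, "durationMillis": 1443}}
"""


def scan_efficiency(entry: dict) -> float:
    """Hypothetical helper: fraction of examined documents returned to the client."""
    attr = entry["attr"]
    examined = attr.get("docsExamined", 0)
    if examined == 0:
        return 1.0  # nothing was scanned, so nothing was thrown away
    return attr.get("nreturned", 0) / examined


def is_wasteful_collscan(entry: dict, threshold: float = 0.5) -> bool:
    """Hypothetical helper: flag a COLLSCAN only if most scanned documents were discarded.

    The 0.5 threshold is an arbitrary illustration, not a MongoDB setting.
    """
    return (entry["attr"].get("planSummary") == "COLLSCAN"
            and scan_efficiency(entry) < threshold)


dump_entry = json.loads(LOG_LINE)
print(scan_efficiency(dump_entry), is_wasteful_collscan(dump_entry))  # 1.0 False

# Hypothetical contrast: a query that scans the same collection to return one document.
point_lookup = {"attr": {"planSummary": "COLLSCAN", "docsExamined": 72989, "nreturned": 1}}
print(is_wasteful_collscan(point_lookup))  # True
```

The mongodump entry scores 1.0 because every scanned document was sent to the client, which is the case where a COLLSCAN costs nothing extra. The hypothetical point lookup is the case where an index would actually help.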

D. SM
  • This is interesting. Could you point me to documentation that says this? I definitely don't need one document; I need all of them. – swateek Feb 10 '21 at 12:42
  • This is basic database design that should be covered in any decent database textbook. – D. SM Feb 10 '21 at 16:45