There has been considerable confusion over space reclamation in MongoDB, and some recommended practices are downright dangerous in certain deployment types. More details below:
TL;DR
`repairDatabase` attempts to salvage data from a standalone MongoDB deployment that is trying to recover from disk corruption. If it recovers space, that is purely a side effect. Recovering space should never be the primary consideration for running `repairDatabase`.
Recover space in a standalone node
WiredTiger: For a standalone node with WiredTiger, running `compact` will release space to the OS, with one caveat: the `compact` command on WiredTiger in MongoDB 3.0.x was affected by the bug SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to that version, `compact` on WiredTiger could silently fail.
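For example, to compact a single collection from the mongo shell (a minimal sketch; `mycoll` is a placeholder collection name):

```js
// Run this against the database that holds the collection.
// compact blocks operations while it runs, so schedule it
// during a maintenance window.
db.runCommand({ compact: "mycoll" })
```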
MMAPv1: Due to the way MMAPv1 works, there is no safe and supported method to recover space using the MMAPv1 storage engine. `compact` in MMAPv1 will defragment the data files, potentially making more room for new documents, but it will not release space back to the OS.
You may be able to run `repairDatabase` if you fully understand the consequences of this potentially dangerous command (see below), since `repairDatabase` essentially rewrites the whole database, discarding corrupt documents along the way. As a side effect, this creates new MMAPv1 data files without any fragmentation and releases space back to the OS.
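If you accept those risks on a standalone node, the command can be issued from the mongo shell; a minimal sketch:

```js
// Requires free disk space equal to the current data set size
// plus roughly 2 GB, since the database is rewritten in full.
// Never run this on a replica set member (see below).
db.repairDatabase()
```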
For a less adventurous method, running `mongodump` and `mongorestore` may also be possible in an MMAPv1 deployment, subject to the size of your deployment; see the sketch below.
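A minimal sketch of that approach (host, paths, and service commands are placeholders for your environment; stop writes to the node while this runs):

```sh
# 1. Dump all databases to a directory.
mongodump --host localhost:27017 --out /backups/dump

# 2. Stop mongod and move the old data files aside, so that
#    fresh, unfragmented files are created on restart:
#    systemctl stop mongod
#    mv /var/lib/mongodb /var/lib/mongodb.old
#    mkdir /var/lib/mongodb && chown mongodb: /var/lib/mongodb
#    systemctl start mongod

# 3. Restore the dump into the fresh data files.
mongorestore --host localhost:27017 /backups/dump
```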
Recover space in a replica set
For replica set configurations, the best and safest method to recover space, for both WiredTiger and MMAPv1, is to perform an initial sync.
If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries before finally stepping down the primary and performing an initial sync on it (see the sketch below). A rolling initial sync is the safest method of performing replica set maintenance, and as a bonus it involves no downtime.
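A minimal sketch of one iteration, assuming a systemd-managed mongod and a dbpath of /var/lib/mongodb (both placeholders):

```sh
# On one member at a time:
systemctl stop mongod        # stop the member
rm -rf /var/lib/mongodb/*    # remove its data files (keep the directory)
systemctl start mongod       # on restart, the empty member performs an
                             # initial sync from the rest of the set

# Wait until rs.status() shows the member back in SECONDARY state
# before moving on. For the primary, run rs.stepDown() in the mongo
# shell first, then repeat the steps above on it.
```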
Please note that the feasibility of a rolling initial sync also depends on the size of your deployment. For extremely large deployments it may not be feasible to do an initial sync at all, and thus your options are somewhat more limited. If WiredTiger is used, you may be able to take one secondary out of the set, start it as a standalone, run `compact` on it, and rejoin it to the set, as sketched below.
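A minimal sketch of that approach (paths and the port are placeholders; the maintenance must finish before the member falls off the end of the primary's oplog, or it will need a full resync anyway):

```sh
# Stop the secondary, then restart it as a standalone by omitting
# --replSet and using a non-standard port, so clients and the other
# members do not connect to it.
systemctl stop mongod
mongod --dbpath /var/lib/mongodb --port 27217

# In a mongo shell connected to port 27217, compact each collection:
#   db.runCommand({ compact: "mycoll" })

# Shut the standalone down cleanly, then restart the member with
# its normal replica set configuration so it rejoins the set.
mongod --dbpath /var/lib/mongodb --port 27217 --shutdown
systemctl start mongod
```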
Regarding `repairDatabase`
Please don't run `repairDatabase` on replica set nodes. This is very dangerous, as mentioned on the repairDatabase documentation page and described in more detail below.
The name `repairDatabase` is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there's disk corruption on a standalone node, which could lead to corrupt documents.
The `repairDatabase` command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.
In MMAPv1 deployments, this rebuilding of the database files releases space to the OS as a side effect. Releasing space to the OS was never the purpose.
Consequences of `repairDatabase` on a replica set
In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run `repairDatabase` on a replica set node, there is a chance that the node contains undetected corruption, and `repairDatabase` will dutifully remove the corrupt documents for you.
Predictably, this leaves that node with a different dataset from the rest of the set. If an update then happens to hit one of the removed documents, the whole set could crash.
To make matters worse, this situation could stay dormant for a long time, only to strike suddenly for no apparent reason.