There has been considerable confusion over space reclamation in MongoDB, and some recommended practices are downright dangerous in certain deployment types. More details below:
TL;DR
`repairDatabase` attempts to salvage data from a standalone MongoDB deployment that is trying to recover from disk corruption. If it recovers space, that is purely a side effect. Recovering space should never be the primary consideration for running `repairDatabase`.
Recover space in a standalone node
WiredTiger: For a standalone node with WiredTiger, running `compact` will release space to the OS, with one caveat: the `compact` command on WiredTiger in MongoDB 3.0.x was affected by the bug SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to that version, `compact` on WiredTiger could silently fail.
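For example, to compact a single collection from the mongo shell (a minimal sketch; `mycoll` is a placeholder collection name):

```js
// Run this against the database that holds the collection.
// compact blocks operations while it runs, so schedule it
// during a maintenance window.
db.runCommand({ compact: "mycoll" })
```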
MMAPv1: Due to the way MMAPv1 works, there is no safe and supported method to recover space using the MMAPv1 storage engine. `compact` in MMAPv1 will defragment the data files, potentially making more room for new documents, but it will not release space back to the OS.
You may be able to run `repairDatabase` if you fully understand the consequences of this potentially dangerous command (see below), since `repairDatabase` essentially rewrites the whole database, discarding corrupt documents along the way. As a side effect, this creates new MMAPv1 data files without any fragmentation and releases space back to the OS.
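If you accept those risks on a standalone node, the command can be issued from the mongo shell; a minimal sketch:

```js
// Requires free disk space equal to the current data set size
// plus roughly 2 GB, since the database is rewritten in full.
// Never run this on a replica set member (see below).
db.repairDatabase()
```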
For a less adventurous method, running `mongodump` and `mongorestore` may also be possible in an MMAPv1 deployment, subject to the size of your deployment; see the sketch below.
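A minimal sketch of that approach (host, paths, and service commands are placeholders for your environment; stop writes to the node while this runs):

```sh
# 1. Dump all databases to a directory.
mongodump --host localhost:27017 --out /backups/dump

# 2. Stop mongod and move the old data files aside, so that
#    fresh, unfragmented files are created on restart:
#    systemctl stop mongod
#    mv /var/lib/mongodb /var/lib/mongodb.old
#    mkdir /var/lib/mongodb && chown mongodb: /var/lib/mongodb
#    systemctl start mongod

# 3. Restore the dump into the fresh data files.
mongorestore --host localhost:27017 /backups/dump
```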
Recover space in a replica set
For replica set configurations, the best and safest method to recover space, for both WiredTiger and MMAPv1, is to perform an initial sync.
If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries before finally stepping down the primary and performing an initial sync on it (see the sketch below). A rolling initial sync is the safest method of performing replica set maintenance, and as a bonus it involves no downtime.
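A minimal sketch of one iteration, assuming a systemd-managed mongod and a dbpath of /var/lib/mongodb (both placeholders):

```sh
# On one member at a time:
systemctl stop mongod        # stop the member
rm -rf /var/lib/mongodb/*    # remove its data files (keep the directory)
systemctl start mongod       # on restart, the empty member performs an
                             # initial sync from the rest of the set

# Wait until rs.status() shows the member back in SECONDARY state
# before moving on. For the primary, run rs.stepDown() in the mongo
# shell first, then repeat the steps above on it.
```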
Please note that the feasibility of a rolling initial sync also depends on the size of your deployment. For extremely large deployments it may not be feasible to do an initial sync at all, and thus your options are somewhat more limited. If WiredTiger is used, you may be able to take one secondary out of the set, start it as a standalone, run `compact` on it, and rejoin it to the set, as sketched below.
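A minimal sketch of that approach (paths and the port are placeholders; the maintenance must finish before the member falls off the end of the primary's oplog, or it will need a full resync anyway):

```sh
# Stop the secondary, then restart it as a standalone by omitting
# --replSet and using a non-standard port, so clients and the other
# members do not connect to it.
systemctl stop mongod
mongod --dbpath /var/lib/mongodb --port 27217

# In a mongo shell connected to port 27217, compact each collection:
#   db.runCommand({ compact: "mycoll" })

# Shut the standalone down cleanly, then restart the member with
# its normal replica set configuration so it rejoins the set.
mongod --dbpath /var/lib/mongodb --port 27217 --shutdown
systemctl start mongod
```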
Regarding `repairDatabase`
Please don't run `repairDatabase` on replica set nodes. This is very dangerous, as mentioned on the repairDatabase documentation page and described in more detail below.
The name `repairDatabase` is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there's disk corruption on a standalone node, which could lead to corrupt documents.
The `repairDatabase` command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.
In MMAPv1 deployments, this rebuilding of the database files releases space to the OS as a side effect. Releasing space to the OS was never the purpose.
Consequences of `repairDatabase` on a replica set
In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run `repairDatabase` on a replica set node, there is a chance that the node contains undetected corruption, and `repairDatabase` will dutifully remove the corrupt documents for you.
Predictably, this leaves that node with a different dataset from the rest of the set. If an update then happens to hit one of the removed documents, the whole set could crash.
To make matters worse, this situation could stay dormant for a long time, only to strike suddenly for no apparent reason.