
I have a couple of questions about backing up the remote database on my TokuMX server, which is running in production (no sharding and no replication). The only constraint is that I can't stop the running TokuMX instance.

  1. What's the best way to make a hot backup of a running TokuMX server (other than TokuMX Hot Backup in the Enterprise edition)?

  2. A question about the backup approach suggested for MongoDB:

    [backup-host]# mongodump --host mongodb-host --port 27017 --db mongodevdb --username mongouser --password mongopwd
    
    • Is this command the preferred way to make hot backups?
    • Which port should I use when I issue this command?
    • Is it a good approach to run this command from cron every day?
    • Are there any pitfalls with this command?
Stennie
Erik

1 Answer

Disclaimer: I work at Tokutek, I'm an engineer working on TokuMX.

There is no "best" way to back up TokuMX. Each application is different, so it's best to understand all the options and make your own decision.

The backup options for TokuMX are these:

  1. Enterprise hot backup.
  2. Filesystem-level snapshot (LVM, EBS, xfs_freeze) to copy out everything in the dbpath and logDir.
  3. Using mongodump.
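Option 2 can be sketched with an LVM snapshot. This is a minimal, hedged sketch, not Tokutek's procedure: the volume group vg0, logical volume tokumx, the 10G snapshot size, the mount point and the destination are all hypothetical. It assumes dbpath and logDir both sit on that one logical volume, so a single snapshot captures both at the same instant, and that the filesystem is ext4. With DRY_RUN=1 it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Sketch: back up TokuMX from an LVM snapshot without stopping the server.
# ASSUMPTIONS (all hypothetical, adjust for your system):
#   - dbpath AND logDir are both on the one logical volume /dev/vg0/tokumx,
#     so a single snapshot captures both at the same instant;
#   - the volume group has room for a 10G copy-on-write snapshot;
#   - the filesystem is ext4 (on XFS, mount with -o ro,nouuid instead);
#   - /mnt/tokumx-snap and /var/lib/backup already exist.
set -euo pipefail

VG=${VG:-vg0}                 # hypothetical volume group
LV=${LV:-tokumx}              # hypothetical LV holding dbpath + logDir
SNAP_NAME=${SNAP_NAME:-tokumx-snap}
SNAP_SIZE=${SNAP_SIZE:-10G}   # room for blocks that change during the copy
MNT=${MNT:-/mnt/tokumx-snap}
DEST=${DEST:-/var/lib/backup}
DRY_RUN=${DRY_RUN:-0}         # DRY_RUN=1 prints commands instead of running them

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

snapshot_backup() {
  local archive
  archive="$DEST/tokumx-snap-$(date +%Y%m%d).tar.gz"
  # The snapshot fixes the whole volume at one point in time, including
  # files written by TokuMX background threads (which fsyncLock cannot stop).
  run lvcreate --snapshot --size "$SNAP_SIZE" --name "$SNAP_NAME" "/dev/$VG/$LV"
  run mount -o ro "/dev/$VG/$SNAP_NAME" "$MNT"
  # Copy everything on the snapshot (dbpath and logDir) into one archive.
  run tar -czf "$archive" -C "$MNT" .
  run umount "$MNT"
  # Remove the snapshot so it stops consuming copy-on-write space.
  run lvremove -f "/dev/$VG/$SNAP_NAME"
}

# Only act when called as "tokumx-snapshot.sh backup" (hypothetical name).
if [[ "${1:-}" == backup ]]; then
  snapshot_backup
fi
```

The SNAP_SIZE reservation is the extra space mentioned below: if more than that much data changes on the origin volume while tar is copying, LVM invalidates the snapshot. The sketch also has no error cleanup; a production version would add a trap that unmounts and removes the snapshot if tar fails.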

Please note that fsyncLock does not work, as background threads will still write to the filesystem even if client threads aren't doing anything. Relying on fsyncLock alone can give you a corrupt backup.

Filesystem snapshots and enterprise hot backup both have the advantage that you're copying serialized, compressed data, so you're avoiding the cost of querying all the collections and transferring uncompressed BSON data over the wire. Additionally, those options won't destroy the information in the cachetable about what data is most important, whereas mongodump will cause everything to be paged in, possibly evicting data that's useful for your application.

Enterprise hot backup has the additional advantages over filesystem-level snapshots that it is less expensive (you don't need to reserve extra space like you would for a snapshot), it can be throttled to meet I/O quotas, and the resulting state of the backup is the state at the time when the backup completes, rather than when it starts. So if it takes 12 hours to copy data out for the backup, a filesystem-level snapshotted backup will be 12 hours behind the equivalent backup taken with the hot backup plugin.

For simple uses, mongodump may be the best option if you aren't concerned about performance, cache invalidation, network bandwidth, or recency. It is also the only option that supports backing up a single database or collection.

For mongodump, usage is the same as for MongoDB. Use the host and port your server is listening on; the default port is 27017, and if your server uses the default you don't need to specify --port at all.

You can definitely run it every day with cron; I suggest something like this:

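    SHELL=/bin/bash
    0 0 * * * /usr/bin/mongodump --host <host> -o "/var/lib/backup/tokumx-backup-$(date +\%Y\%m\%d)"

Note that % must be escaped as \% in a crontab entry: cron treats an unescaped % as a newline and passes everything after it to the command as standard input.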

The main pitfalls of mongodump are just that it is more expensive and it destroys the information in the cachetable that says what data is important. It also won't get a perfectly consistent snapshot across multiple collections like hot backup and filesystem-level snapshot backups will. A mongodump may contain the effects of some writes in one collection and not contain the effects of earlier writes in a different collection.

You'll probably also want to define a scheme for expiring old backups.
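One way to combine the dated dump with expiry is a small wrapper script. This is a minimal sketch, not part of TokuMX: the script name, BACKUP_ROOT, the 7-day KEEP_DAYS window and the localhost:27017 defaults are hypothetical. It assumes dumps are named tokumx-backup-YYYYMMDD as in the cron line above, and judges age by directory modification time via find -mtime.

```bash
#!/usr/bin/env bash
# Sketch: dated nightly mongodump plus expiry of old dumps.
# ASSUMPTIONS (hypothetical): dumps live in /var/lib/backup as
# tokumx-backup-YYYYMMDD directories, and 7 days of history is enough.
set -euo pipefail

BACKUP_ROOT=${BACKUP_ROOT:-/var/lib/backup}
KEEP_DAYS=${KEEP_DAYS:-7}
MONGO_HOST=${MONGO_HOST:-localhost}
MONGO_PORT=${MONGO_PORT:-27017}   # default port, same as MongoDB

# Delete tokumx-backup-* directories directly under BACKUP_ROOT whose age,
# in whole days (find -mtime +N), is greater than KEEP_DAYS.
prune_old_backups() {
  find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -name 'tokumx-backup-*' \
    -mtime +"$KEEP_DAYS" -exec rm -rf {} +
}

run_backup() {
  local out
  out="$BACKUP_ROOT/tokumx-backup-$(date +%Y%m%d)"
  # Add --username/--password here if authentication is enabled.
  mongodump --host "$MONGO_HOST" --port "$MONGO_PORT" -o "$out"
  # Reached only if mongodump succeeded (set -e), so a failed dump
  # never triggers deletion of the older backups.
  prune_old_backups
}

# Only act when called as "tokumx-backup.sh backup" (hypothetical name).
if [[ "${1:-}" == backup ]]; then
  run_backup
fi
```

Calling it from cron as 0 0 * * * /usr/local/bin/tokumx-backup.sh backup (hypothetical path) also sidesteps the % escaping problem, because the date format lives inside the script rather than in the crontab.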

leif
  • Thanks for the reply. You said "Please note that fsyncLock does not work, as background threads will still write to the filesystem even if client threads aren't doing anything. Using fsyncLock only can give you a corrupt backup". But the following post suggests using fsyncLock: http://www.codeproject.com/Tips/547759/Automating-backup-for-MongoDB-using-CRON-and-S-CMD – Erik Mar 24 '14 at 17:26
  • And what do you think about this snippet https://github.com/micahwedemeyer/automongobackup/blob/master/src/automongobackup.sh ? – Erik Mar 24 '14 at 17:31
  • I said fsyncLock does not work specifically because that is a very important difference between MongoDB and TokuMX. That link does not apply to TokuMX. I have not read that snippet carefully but if it calls mongodump in a sane way it's probably alright. – leif Mar 25 '14 at 05:42
  • Thanks for the response again. So I take it the only approach is to stop the TokuMX service, run mongodump, and start it again. And my last question: can I use TokuMX Hot Backup for free in a non-commercial project? – Erik Mar 25 '14 at 11:32
  • You can run mongodump on a running server, you don't need to take it down. Hot backup is enterprise only, you need a license beyond the 30 day evaluation period. – leif Mar 25 '14 at 18:29
  • But you said that if I run mongodump on a running server, I may not get a perfectly consistent snapshot across multiple collections? – Erik Mar 25 '14 at 18:38
  • That's true. So yes, if you want that, stop the server. – leif Mar 25 '14 at 18:51
  • Is fsyncLock support on the TokuMX roadmap? It seems like the best approach when differential snapshots are supported, for example in GCE, without having to stop the server, of course. – odedfos Oct 20 '14 at 08:56