
We're looking at CouchDB for a CMS-ish application. What are some common patterns, best practices, and workflow advice for backing up our production database?

I'm particularly interested in the process of cloning the database for use in development and testing.

Is it sufficient to just copy the files on disk out from under a live running instance? Can you clone database data between two live running instances?

Advice and descriptions of the techniques you use would be greatly appreciated.

Kyle Burton

6 Answers

Another thing to be aware of is that you can copy the database files out from under a live instance. Since the database may be quite large, it can be quicker to copy it out of band from your test/production machine to another machine.

Depending on the write load, it may be advisable to trigger a replication after the copy to pick up any writes that were in progress while the files were being copied. Replicating those few documents will still be much quicker than replicating the entire database.

For reference see: http://wiki.apache.org/couchdb/FilesystemBackups
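
A minimal sketch of that copy-then-catch-up sequence, assuming hypothetical host names prod and backup and a database named mydb:

    # Copy the raw database file from under the live instance; the file is
    # append-only, so the copy is a consistent (if slightly stale) snapshot.
    scp /var/lib/couchdb/mydb.couch backup:/var/lib/couchdb/mydb.couch

    # One-shot replication to pick up writes that landed during the copy:
    curl -X POST http://backup:5984/_replicate \
         -H 'Content-Type: application/json' \
         -d '{"source": "http://prod:5984/mydb", "target": "mydb"}'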

Paul J. Davis
    "you can copy files out from under a live database" - This is excellent advice, I was looking to duplicate a database and found I can duplicate and rename a .couch file in Finder to accomplish this. – DigitalDesignDj Jan 15 '13 at 21:42

CouchDB supports replication, so just replicate to another instance of CouchDB and back up from there, avoiding any disturbance to the instance you write changes to.

https://docs.couchdb.org/en/latest/maintenance/backups.html

You literally send a POST request to your CouchDB instance telling it where to replicate to, and it Works(tm).
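
For instance, a continuous replication to a dedicated backup instance takes a single request (host and database names are hypothetical):

    # Posted to the backup instance: pull from production continuously,
    # creating the target database if it does not exist yet.
    curl -X POST http://backup:5984/_replicate \
         -H 'Content-Type: application/json' \
         -d '{"source": "http://prod:5984/mydb",
              "target": "mydb",
              "create_target": true,
              "continuous": true}'

You can then back up from the backup instance at your leisure; production never sees the backup I/O.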

EDIT: You can just cp out the .couch files in the data directory from under the running database as long as you can accept the I/O hit.

Marc Gear

I'd like to second Paul's suggestion: Just cp your database files from under the live server if you can take the I/O-load hit. If you run a replicated copy anyway, you can safely copy from that too, without impacting your master's performance.
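
In sketch form, assuming a replica host named backup and the stock data path:

    # The replica absorbs the backup I/O; the master never notices.
    ssh backup 'cp /var/lib/couchdb/mydb.couch /backups/mydb-$(date +%F).couch'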

Jan Lehnardt

CouchDB also works very nicely with filesystem snapshots offered by modern filesystems like ZFS. Since the database file is always in a consistent state, you can snapshot it at any time without weakening the integrity guarantees CouchDB provides.

This incurs almost no I/O overhead. If you have, for example, accidentally deleted a document from the database, you can move the snapshot to another machine and extract the missing data there. You might even be able to replicate it back to the production database, though I have never tried that.

But always make sure you use exactly the same CouchDB version when moving database files around. The on-disk format is still evolving in incompatible ways.
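
A minimal sketch, assuming the data directory lives on a hypothetical ZFS dataset named tank/couchdb:

    # Snapshots are instantaneous and cost almost no I/O.
    SNAP=tank/couchdb@backup-$(date +%Y%m%d)
    sudo zfs snapshot "$SNAP"

    # To dig a deleted document back out, clone the snapshot read-write
    # and point a second CouchDB instance at the clone:
    sudo zfs clone "$SNAP" tank/couchdb-restore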

max

CouchDB replication is horrible. I generally do tar, which is much better. A consolidated script follows the steps.

  1. Stop the CouchDB service on the source host.
  2. tar.gz the data files. On my Ubuntu servers these typically live in /var/lib/couchdb (sometimes in a subdirectory based on the CouchDB version). If you aren't sure where the files are, check your CouchDB config files, or run ps -A w to see the full command that started CouchDB. Make sure the archive includes the subdirectories whose names start with a dot.
  3. Restart the CouchDB service on the source host.
  4. scp the tar.gz file to the destination host and unpack it in a temporary location there.
  5. chown the files to the user and group that own the existing files in the database directory on the destination; this is likely couchdb:couchdb. This step is important, as messing up the file permissions is the only way I've managed to break this process so far.
  6. Stop CouchDB on the destination host.
  7. cp the files into the destination directory. Again, on my hosts this has been /var/lib/couchdb.
  8. Double-check the file permissions in their new home.
  9. Restart CouchDB on the destination host.
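
A sketch of the same procedure as a script, assuming the Ubuntu paths above and a hypothetical destination host named dest:

    # --- on the source host ---
    sudo service couchdb stop
    sudo tar -czf /tmp/couchdb-backup.tar.gz -C /var/lib couchdb   # dot-dirs included
    sudo service couchdb start
    scp /tmp/couchdb-backup.tar.gz dest:/tmp/

    # --- on the destination host ---
    sudo service couchdb stop
    sudo tar -xzf /tmp/couchdb-backup.tar.gz -C /tmp
    sudo cp -a /tmp/couchdb/. /var/lib/couchdb/
    sudo chown -R couchdb:couchdb /var/lib/couchdb   # the step that matters most
    sudo service couchdb start
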
coffeequant
    Replication is just about the only thing that CouchDB is _really_ good at - that was the whole point behind its revision-based document design. I would seriously question why you're using it if you're not replicating. Also, you don't need to stop CouchDB to copy the files (ref: http://wiki.apache.org/couchdb/FilesystemBackups) – slang Apr 09 '15 at 01:09
    Haha, no - I'm not a CouchDB dev - I just use it in some internal analytics systems at VICE. And 20GB shouldn't be a problem - if you got a crash, I'd report that to Apache as a bug. – slang Apr 10 '15 at 14:09
    I think there are some cases when this is a valid answer, for example when you do a fresh install of something that uses CouchDB, or with new replication nodes of very big databases when HTTP requests are an unnecessary overload. There is no silver bullet. By the way, on Centos 6.6 the databases files are in `/usr/local/var/lib/couchdb`. – evalarezo Jun 02 '15 at 17:04

I do it via PowerShell and the PSCouchDB module, using the Export-CouchDBDatabase command.

This exports an entire database to a JSON file, which you can re-import via the corresponding import command (see the PSCouchDB documentation).

Example:

    Export-CouchDBDatabase -Database test -Authorization "admin:password"

This exports a JSON file into the current directory: test_05-28-2021_17_01_00.json