
We're looking at CouchDB for a CMS-ish application. What are some common patterns, best practices, and workflow advice for backing up our production database?

I'm particularly interested in the process of cloning the database for use in development and testing.

Is it sufficient to just copy the files on disk out from under a live running instance? Can you clone database data between two live running instances?

Advice and descriptions of the techniques you use would be greatly appreciated.

Kyle Burton

6 Answers

Another thing to be aware of is that you can copy the database files out from under a live instance. Since the database may be quite large, it can be quicker to copy it out of band from your test/production machine to another machine.

Depending on the write load, it may be advisable to trigger a replication after the copy to pick up any writes that were in progress while the files were being copied. Replicating those few documents will still be much quicker than replicating the entire database.

For reference see: http://wiki.apache.org/couchdb/FilesystemBackups
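
A minimal sketch of that copy-then-catch-up sequence, assuming hypothetical host names prod and backup and a database named mydb:

    # Copy the raw database file from under the live instance; the file is
    # append-only, so the copy is a consistent (if slightly stale) snapshot.
    scp /var/lib/couchdb/mydb.couch backup:/var/lib/couchdb/mydb.couch

    # One-shot replication to pick up writes that landed during the copy:
    curl -X POST http://backup:5984/_replicate \
         -H 'Content-Type: application/json' \
         -d '{"source": "http://prod:5984/mydb", "target": "mydb"}'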

Paul J. Davis
    "you can copy files out from under a live database" - This is excellent advice, I was looking to duplicate a database and found I can duplicate and rename a .couch file in Finder to accomplish this. – DigitalDesignDj Jan 15 '13 at 21:42

CouchDB supports replication, so just replicate to another instance of CouchDB and back up from there, avoiding any disturbance to the instance you write changes to.

https://docs.couchdb.org/en/latest/maintenance/backups.html

You literally send a POST request to your CouchDB instance telling it where to replicate to, and it Works(tm).
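
For instance, a continuous replication to a dedicated backup instance takes a single request (host and database names are hypothetical):

    # Posted to the backup instance: pull from production continuously,
    # creating the target database if it does not exist yet.
    curl -X POST http://backup:5984/_replicate \
         -H 'Content-Type: application/json' \
         -d '{"source": "http://prod:5984/mydb",
              "target": "mydb",
              "create_target": true,
              "continuous": true}'

You can then back up from the backup instance at your leisure; production never sees the backup I/O.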

EDIT: You can just cp out the .couch files in the data directory from under the running database as long as you can accept the I/O hit.

Marc Gear

I'd like to second Paul's suggestion: Just cp your database files from under the live server if you can take the I/O-load hit. If you run a replicated copy anyway, you can safely copy from that too, without impacting your master's performance.
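
In sketch form, assuming a replica host named backup and the stock data path:

    # The replica absorbs the backup I/O; the master never notices.
    ssh backup 'cp /var/lib/couchdb/mydb.couch /backups/mydb-$(date +%F).couch'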

Jan Lehnardt

CouchDB also works very nicely with filesystem snapshots offered by modern filesystems like ZFS. Since the database file is always in a consistent state, you can snapshot it at any time without weakening the integrity guarantees CouchDB provides.

This incurs almost no I/O overhead. If you have, for example, accidentally deleted a document from the database, you can move the snapshot to another machine and extract the missing data there. You might even be able to replicate it back to the production database, though I have never tried that.

But always make sure you use exactly the same CouchDB version when moving database files around. The on-disk format is still evolving in incompatible ways.
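
A minimal sketch, assuming the data directory lives on a hypothetical ZFS dataset named tank/couchdb:

    # Snapshots are instantaneous and cost almost no I/O.
    SNAP=tank/couchdb@backup-$(date +%Y%m%d)
    sudo zfs snapshot "$SNAP"

    # To dig a deleted document back out, clone the snapshot read-write
    # and point a second CouchDB instance at the clone:
    sudo zfs clone "$SNAP" tank/couchdb-restore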

max

CouchDB replication is horrible. I generally do tar, which is much better. A consolidated script follows the steps.

  1. Stop the CouchDB service on the source host.
  2. tar.gz the data files. On my Ubuntu servers these typically live in /var/lib/couchdb (sometimes in a subdirectory based on the CouchDB version). If you aren't sure where the files are, check your CouchDB config files, or run ps -A w to see the full command that started CouchDB. Make sure the archive includes the subdirectories whose names start with a dot.
  3. Restart the CouchDB service on the source host.
  4. scp the tar.gz file to the destination host and unpack it in a temporary location there.
  5. chown the files to the user and group that own the existing files in the database directory on the destination; this is likely couchdb:couchdb. This step is important, as messing up the file permissions is the only way I've managed to break this process so far.
  6. Stop CouchDB on the destination host.
  7. cp the files into the destination directory. Again, on my hosts this has been /var/lib/couchdb.
  8. Double-check the file permissions in their new home.
  9. Restart CouchDB on the destination host.
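
A sketch of the same procedure as a script, assuming the Ubuntu paths above and a hypothetical destination host named dest:

    # --- on the source host ---
    sudo service couchdb stop
    sudo tar -czf /tmp/couchdb-backup.tar.gz -C /var/lib couchdb   # dot-dirs included
    sudo service couchdb start
    scp /tmp/couchdb-backup.tar.gz dest:/tmp/

    # --- on the destination host ---
    sudo service couchdb stop
    sudo tar -xzf /tmp/couchdb-backup.tar.gz -C /tmp
    sudo cp -a /tmp/couchdb/. /var/lib/couchdb/
    sudo chown -R couchdb:couchdb /var/lib/couchdb   # the step that matters most
    sudo service couchdb start
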
coffeequant
    Replication is just about the only thing that CouchDB is _really_ good at - that was the whole point behind its revision-based document design. I would seriously question why you're using it if you're not replicating. Also, you don't need to stop CouchDB to copy the files (ref: http://wiki.apache.org/couchdb/FilesystemBackups) – slang Apr 09 '15 at 01:09
    Haha, no - I'm not a CouchDB dev - I just use it in some internal analytics systems at VICE. And 20GB shouldn't be a problem - if you got a crash, I'd report that to Apache as a bug. – slang Apr 10 '15 at 14:09
    I think there are some cases when this is a valid answer, for example when you do a fresh install of something that uses CouchDB, or with new replication nodes of very big databases when HTTP requests are an unnecessary overload. There is no silver bullet. By the way, on Centos 6.6 the databases files are in `/usr/local/var/lib/couchdb`. – evalarezo Jun 02 '15 at 17:04

I do it via PowerShell and the PSCouchDB module, using the Export-CouchDBDatabase command.

This exports an entire database to a JSON file, which you can re-import via the corresponding import command (see the PSCouchDB documentation).

Example:

    Export-CouchDBDatabase -Database test -Authorization "admin:password"

This exports a JSON file into the current directory: test_05-28-2021_17_01_00.json