I want to back up the database at high frequency, but the cost of a full backup is not acceptable. It seems SQLite does not have an API to do incremental backups directly, but I found the Data Change Notification Callbacks, which seem helpful: https://www.sqlite.org/c3ref/update_hook.html

The parameters of the callback are the type of operation, the name of the database, the name of the table, and the rowid. However, I don't know whether it is possible to generate backup information (an SQL statement, for example) for the affected row from these parameters alone, without knowing the details of the table. In other words, is there a common method to generate the backup for such an operation that works for tables with different structures?

I know there are some cases where the callback won't be invoked, and I think that is acceptable as long as I also do a full backup periodically.
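To make the idea concrete, here is a rough sketch of what I imagine the hook handler generating (`changeLogStatement` is a name I made up; the actual hook registration via `sqlite3_update_hook` is C API and omitted here):

```
// Sketch only: the update hook hands us just (operation, database name,
// table name, rowid); the question is whether that is enough to log the
// change generically, for any table.
function changeLogStatement(op, table, rowid) {
  if (op === 'DELETE') {
    // A delete can be replayed without knowing the table's columns.
    return `DELETE FROM "${table}" WHERE rowid = ${rowid};`;
  }
  // For INSERT/UPDATE we would still need the row's current values,
  // but "SELECT *" can read them without knowing the schema up front.
  return `SELECT * FROM "${table}" WHERE rowid = ${rowid};`;
}
```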
-
SQLite has a backup API: http://sqlite.org/backup.html – Colonel Thirty Two Mar 19 '15 at 23:44
-
@ColonelThirtyTwo And that does full backups. – CL. Mar 20 '15 at 08:08
-
Yes. And it costs too much to do full backups frequently. – iuradz Mar 20 '15 at 18:18
-
Possible duplicate of [How can i create in incremental backups of mysql databases](http://stackoverflow.com/questions/17637809/how-can-i-create-in-incremental-backups-of-mysql-databases) – Paul Sweatte Oct 03 '15 at 02:33
-
Possible duplicate of [How do I do incremental backups for SQLite?](http://stackoverflow.com/questions/5749130/how-do-i-do-incremental-backups-for-sqlite/) – Bernardo Ramos Jan 15 '17 at 19:40
-
Possible duplicate of [Is there a way to update SQLITE database using deltas?](http://stackoverflow.com/questions/11754604/is-there-a-way-to-update-sqlite-database-using-deltas) – Paul Sweatte Jan 21 '17 at 01:08
2 Answers
I've been thinking about this idea. It's not officially endorsed by SQLite, but in theory it sounds very plausible.
The SQLite backup API essentially lets you take a working snapshot of the live database file. Similarly, VACUUM INTO lets you write a compacted copy of the database into a new backup file. https://www.sqlite.org/lang_vacuum.html#vacuuminto
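Both can be driven from application code; a minimal sketch, assuming the better-sqlite3 Node driver (the driver choice is my assumption, not part of the original answer):

```
import Database from 'better-sqlite3';

const db = new Database('live.db');

// VACUUM INTO writes a compacted copy to a new file
// (it fails if the target file already exists).
db.exec("VACUUM INTO 'backups/full.db'");

// The online backup API copies the database page by page while the
// source stays usable; better-sqlite3 exposes it as db.backup(), which
// returns a promise (this assumes an ES module, for top-level await).
await db.backup('backups/full-api.db');
```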
This is just good ol' full backup, though; we want an incremental backup (kinda like git).
Suppose we want to back up the database every hour, and it's a 1 GB database with relatively few writes; storing 24 GB per day sounds like overkill.
We can take advantage of the SQLite file format, which is essentially num_pages pages of page_size bytes each, with a fixed 100-byte header occupying the start of the first page. SQLite always writes along page boundaries, and page_size and num_pages are stored in the 100-byte header. See the storage spec: https://www.sqlite.org/fileformat.html
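Reading that geometry takes only the first 100 bytes. A minimal sketch in Node (offsets per the file format spec):

```
import { openSync, readSync, closeSync } from 'node:fs';

// Read page_size and num_pages from the 100-byte database header:
//   bytes 16-17: page size, big-endian (the value 1 means 65536)
//   bytes 28-31: database size in pages, big-endian
function readDbGeometry(path) {
  const fd = openSync(path, 'r');
  const header = Buffer.alloc(100);
  readSync(fd, header, 0, 100, 0);
  closeSync(fd);
  const raw = header.readUInt16BE(16);
  return {
    pageSize: raw === 1 ? 65536 : raw,
    numPages: header.readUInt32BE(28),
  };
}
```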
So what we can do is create a reference file, which is simply a list of hashes. Say we use SHA-256 (which is what newer versions of git use), so each snapshot is a file (e.g. backups/2020-02-22-19-12-00.txt) containing:
```
sha256(header)
sha256(page1)
sha256(page2)
sha256(page3)
```
And we store the corresponding pages as individual files, like git does in an objects directory, e.g. objects/ab/cdef12343..
The first two characters of the hash are used as the directory name so we don't end up with too many files in a single directory.
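Putting the pieces together, a minimal backup sketch (the names backupSnapshot and putObject are made up, and this assumes nothing writes to the database file while we read it; see the comments below this answer about copying a live database):

```
import { createHash } from 'node:crypto';
import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { dirname, join } from 'node:path';

// Store one chunk under objects/ab/cdef... and return its hex digest.
// Identical pages across snapshots are stored exactly once.
function putObject(objectsDir, chunk) {
  const hash = createHash('sha256').update(chunk).digest('hex');
  const dir = join(objectsDir, hash.slice(0, 2));
  const file = join(dir, hash.slice(2));
  if (!existsSync(file)) {
    mkdirSync(dir, { recursive: true });
    writeFileSync(file, chunk);
  }
  return hash;
}

function backupSnapshot(dbPath, snapshotPath, objectsDir = 'objects') {
  const data = readFileSync(dbPath);
  const raw = data.readUInt16BE(16);
  const pageSize = raw === 1 ? 65536 : raw;
  const numPages = data.readUInt32BE(28);

  const hashes = [];
  // The 100-byte header is also the start of page 1, so this first
  // object is redundant for restore, but it mirrors the hash list
  // above and lets a restorer sanity-check the geometry cheaply.
  hashes.push(putObject(objectsDir, data.subarray(0, 100)));
  for (let i = 0; i < numPages; i++) {
    hashes.push(putObject(objectsDir, data.subarray(i * pageSize, (i + 1) * pageSize)));
  }
  mkdirSync(dirname(snapshotPath), { recursive: true });
  writeFileSync(snapshotPath, hashes.join('\n') + '\n');
}
```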
Alternatively, you can just upload the page files to any of the cloud storage providers, e.g. GCS, S3, Azure Blobs, or DO Spaces. This could give you multi-region backups.
Since we're not storing duplicate copies of pages, the total size of all the backups is pretty small compared to database_size * num_backups.
You can even use the hashes file as a way to sync/restore a SQLite file. This is how Dropbox/rsync sync files. The hashes file tells us which pages changed, so we only download the changed objects and patch those ranges in the file.
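A matching restore sketch under the same assumptions: skip the header object (those bytes are already at the start of page 1) and concatenate the page objects in snapshot order:

```
import { readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

function restoreSnapshot(snapshotPath, targetPath, objectsDir = 'objects') {
  const lines = readFileSync(snapshotPath, 'utf8').trim().split('\n');
  // Line 1 is the header hash; the page objects already contain those bytes.
  const pages = lines.slice(1).map((hash) =>
    readFileSync(join(objectsDir, hash.slice(0, 2), hash.slice(2)))
  );
  writeFileSync(targetPath, Buffer.concat(pages));
}
```

To sync instead of restore, compare the new hash list with the one you already have and fetch only the objects you're missing.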

-
This approach looks interesting. Does anyone think there is a way to make it work while the database is in use? The above solution will not work if the pages change while we are hashing them. – Davi Doro Nov 18 '21 at 18:00
I have created a Node.js package, sqlite3-incremental-backup, based on the approach by @user3893988. You can use it as a module as follows:
```
import { backup, restore } from './sqlite3-incremental-backup';

const srcFile = 'source.db';
const targetFile = 'target.db';

// Can be any arbitrary name, but it MUST be unique for each snapshot,
// otherwise the previous snapshot will be lost.
const snapshotName = 'snapshot1.txt';

backup(srcFile, snapshotName);      // backup
restore(snapshotName, targetFile);  // restoration
```
The Python module is coming soon.

-
Are you aware that just copying an SQLite database file that is currently in use (in production) can lead to a corrupt database file, as described here: https://sqlite.org/forum/forumpost/58a4ae848fad4a3b – Toxiro Sep 22 '22 at 11:58
-
Also, that is one reason why `VACUUM INTO` and the `backup API` have been implemented. A simple file copy of an SQLite database just cannot be trusted. – Toxiro Sep 22 '22 at 12:00
-
Thinking about that again, you could probably combine both solutions: first make a full backup locally, then diff it against the last backup and only store the diffs remotely. This could be done with diff tools even at a sub-page level. For huge SQLite databases you can use the backup API to back up the database in multiple steps while letting users access the database in between. – Toxiro Sep 22 '22 at 12:20
-
But then you might just use https://litestream.io and have a live replica that you can use to restore at any point in time. – Toxiro Sep 24 '22 at 07:23