I am writing a backup script for a sqlite database that changes only occasionally. Here's what I have now:

from bz2 import BZ2File
from datetime import datetime
from os.path import dirname, abspath, join
from hashlib import sha512
def backup_target_database(target):
    backup_dir = dirname(abspath(target))
    hash_file = join(backup_dir, 'last_hash')
    new_hash = sha512(open(target, 'rb').read()).digest()
    if new_hash != open(hash_file, 'rb').read():
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_file = join(backup_dir, datetime.now().strftime(fmt))
        BZ2File(snapshot_file, 'wb').write(open(target, 'rb').read())
        open(hash_file, 'wb').write(new_hash)

Currently the database weighs just shy of 20MB, so it's not that taxing when this runs and reads the whole file into memory (and does so twice when changes are detected), but I don't want to wait until this becomes a problem.

What is the proper way to do this sort of (to use Bash scripting terminology) stream piping?

Liz Av
  • related to the title: [Redirect stdout to a file in Python?](http://stackoverflow.com/a/22434262/4279). You should change it to: "backup sqlite database in Python". See [How to backup sqlite database?](http://stackoverflow.com/q/25675314/4279) – jfs Jul 15 '15 at 16:20
  • Your task is unrelated to "stream piping" in any way. To avoid memory error, it is enough to read in chunks as [@devunt's answer shows](http://stackoverflow.com/a/31434471/4279) – jfs Jul 15 '15 at 16:28
  • @J.F.Sebastian 1. I plan to eventually use this for other things besides SQLite databases. Because the database is known to change far less often than backup runs (hence the hash check), I don't care about concurrency. 2. "Stream piping" is the way to do this in Bashscript; if that analogy isn't useful for you, it's useful for other people. – Liz Av Jul 15 '15 at 17:16
  • nothing in the algorithm suggests "stream piping". If you disagree, please provide a bash command that backs up a sqlite database using "stream piping" (if you can't, then any command that illustrates your understanding of the term "stream piping" will do). Here's what the word [pipe](http://wiki.bash-hackers.org/howto/redirection_tutorial#pipes) usually means – jfs Jul 15 '15 at 17:25
  • In mysql I did `mysqldump $DATABASE | bzip2 > $BACKUP` ...do you have anything helpful to say? – Liz Av Jul 15 '15 at 17:35
  • yes, `mysqldump` pipeline is completely unrelated to "compute sha512; copy file" case. – jfs Jul 15 '15 at 17:39
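For readers who arrived looking for the Python counterpart of the `mysqldump $DATABASE | bzip2 > $BACKUP` pipeline mentioned in the comments, here is a minimal sketch. It assumes the producer is an external command that writes to stdout; the function name `pipe_command_to_bz2` and its parameters are made up for illustration.

```python
import bz2
import shutil
import subprocess


def pipe_command_to_bz2(command, backup_path, chunk_size=64 * 1024):
    """Run `command` and stream its stdout into a bz2-compressed file.

    Hypothetical helper: data is copied in chunks of `chunk_size` bytes,
    so the full output is never held in memory.
    """
    with subprocess.Popen(command, stdout=subprocess.PIPE) as proc:
        with bz2.open(backup_path, 'wb') as backup:
            shutil.copyfileobj(proc.stdout, backup, chunk_size)
    # Leaving the Popen context waits for the process, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
```

With that helper, the shell pipeline would look roughly like `pipe_command_to_bz2(['mysqldump', database], backup_path)`, where `database` and `backup_path` are placeholders.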

1 Answer

First, your code reads the target file twice.

You can also use `shutil.copyfileobj` and the hash object's `update()` method to process the file in chunks, so memory use stays constant regardless of file size.

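from bz2 import BZ2File
from datetime import datetime
from hashlib import sha512
from os.path import dirname, abspath, exists, join
from shutil import copyfileobj

def backup_target_database(target_path):
    backup_dir = dirname(abspath(target_path))
    hash_path = join(backup_dir, 'last_hash')
    # On the first run there is no stored hash yet, so treat the file as changed.
    old_hash = None
    if exists(hash_path):
        with open(hash_path, 'rb') as hash_file:
            old_hash = hash_file.read()
    # Hash the file in small chunks so it is never fully loaded into memory.
    hasher = sha512()
    with open(target_path, 'rb') as target:
        while True:
            data = target.read(1024)
            if not data:
                break
            hasher.update(data)
    new_hash = hasher.digest()
    if new_hash != old_hash:
        fmt = '%Y%m%d-%H%M.sqlite3.bz2'
        snapshot_path = join(backup_dir, datetime.now().strftime(fmt))
        # copyfileobj streams from the source into the compressed file in chunks.
        with open(target_path, 'rb') as target:
            with BZ2File(snapshot_path, 'wb', compresslevel=9) as snapshot:
                copyfileobj(target, snapshot)
        # Store the new hash so an unchanged file is skipped on the next run.
        with open(hash_path, 'wb') as hash_file:
            hash_file.write(new_hash)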

(Note: I didn't test this code. If you run into problems, please let me know.)
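The chunked hashing loop can also be factored out into a small helper. This is only a sketch: the name `file_sha512` and the 64 KiB default chunk size are arbitrary choices, not something from the code above.

```python
from hashlib import sha512


def file_sha512(path, chunk_size=64 * 1024):
    """Return the SHA-512 digest of a file, reading it chunk by chunk."""
    hasher = sha512()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.digest()
```

Inside `backup_target_database`, the hashing block would then reduce to `new_hash = file_sha512(target_path)`.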

devunt
  • you could move hash-computing code into a separate function, [example](http://stackoverflow.com/a/7829658/4279) – jfs Jul 15 '15 at 16:31
  • it is not safe, to copy sqlite database using `copyfileobj()` if it is being written to, see [How to backup sqlite database?](http://stackoverflow.com/q/25675314/4279) – jfs Jul 15 '15 at 16:32
  • Worked as a charm, thank you! See the result at https://github.com/ekevoo/hfbr/blob/master/hfbrw.py#L36 – Liz Av Jul 17 '15 at 18:40
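On the concern about copying a database that is being written to: one possible alternative, assuming Python 3.7 or later, is the standard library's `sqlite3.Connection.backup()`, which copies through SQLite's online backup API instead of reading the raw file. The function name `snapshot_sqlite` below is made up for this sketch.

```python
import sqlite3


def snapshot_sqlite(source_path, snapshot_path):
    """Copy a SQLite database via the online backup API (Python 3.7+)."""
    source = sqlite3.connect(source_path)
    dest = sqlite3.connect(snapshot_path)
    try:
        # backup() copies every page of `source` into `dest`.
        source.backup(dest)
    finally:
        dest.close()
        source.close()
```

The snapshot this produces is an uncompressed database file; it could then be hashed and compressed with the chunked approach from the answer.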