I'm not sure whether this question belongs in the Database Administrators section or here, so please advise if I got it wrong.

I have a Django-based website that doesn't change much. I back it up with `python manage.py dumpdata --all --indent=2 > backup.json` and reload the data with `loaddata` if I need to redeploy or the DB gets corrupted. (I'm aware of the integrity errors that occur when `auth` and `contenttypes` aren't excluded.)
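For reference, the variant with exclusions looks roughly like this (the exclusion targets are the usual suspects, `contenttypes` and `auth.permission`):

```
# Dump everything except the apps that commonly cause IntegrityErrors on reload
python manage.py dumpdata --exclude auth.permission --exclude contenttypes \
    --indent=2 > backup.json

# Reload on redeploy or after the DB gets corrupted
python manage.py loaddata backup.json
```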

Since I'm using PostgreSQL on the backend, is it "best practice" or wiser for me to use `pg_dump` instead, and then `pg_restore` if something goes wrong or if I need to redeploy?

As I understand it, `dumpdata` dumps all data associated with the selected apps (and/or models), while `pg_dump` performs a full dump of the DB. Are these effectively the same thing, or is there a fundamental difference I've missed (mind you, I have zero DBA experience)?

Which option do I go for and why?

– Keenan Lawrence

1 Answer

It is both best practice and wiser to use `pg_dump` instead of `dumpdata`.

There are many reasons for this.

  • `pg_dump` is faster, and its output is more compact (particularly with the `-Fc` option) than `dumpdata`'s (see the sketch after this list).

  • Importing the data back into the DB with `pg_restore` will also be faster than Django's `loaddata`.

  • `pg_restore` is available on any PostgreSQL installation, whereas Django and its dependencies have to be installed first.

  • Last but not least, the integrity errors you spoke of will not happen with `pg_dump`/`pg_restore`.
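
A minimal dump-and-restore cycle along these lines might look as follows (the database name `mydb` and the file name are placeholders, and `--if-exists` needs PostgreSQL 9.4+):

```
# Dump in the compact custom format (-Fc), which pg_restore can read
pg_dump -Fc mydb > mydb.dump

# Restore into the database, dropping existing objects first
pg_restore --clean --if-exists -d mydb mydb.dump
```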

Generally, `pg_dump` is used to dump the entire database; however, the `-t` option allows you to dump one or a few tables at a time.
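
For example, to dump a single table as plain SQL (the table name here is hypothetical; Django names tables `<app>_<model>` by default):

```
# Dump only one table instead of the whole database
pg_dump -t myapp_article mydb > myapp_article.sql
```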

– e4c5
  • Noted. I've already written a script that backs up with `pg_dump`, running hourly. Since my DB is really tiny (~30MB), I hadn't noticed a major performance difference. Thank you for your answer. – Keenan Lawrence Jun 30 '16 at 10:49
  • 2
    You are welcome, if you are dumping regularly, you might want to consider using WAL archiving. https://www.postgresql.org/docs/9.2/static/continuous-archiving.html – e4c5 Jun 30 '16 at 10:51
  • Thanks for the link. I did not know that PostgreSQL had a continuous backup system. At the moment, I use `pg_dump` and then commit and push the changes to a private GitHub repo, ensuring incremental changes. I also have the ability to restore data at any point in time, with a worst-case scenario of an hour's loss of work. Should I still consider using WAL? Is it worth the effort (time-wise) to implement? It looks a bit complex in comparison to `pg_dump` and `git`. Any advice is appreciated. – Keenan Lawrence Jun 30 '16 at 11:08
  • Yes, it's a bit complex to set up initially in comparison to `pg_dump` + git, but the advantage is that it has no impact at all on performance (dumpdata > pg_dump > WAL). The other point is that you are adding about 600MB each day to your git repo, so your bills would keep adding up. – e4c5 Jun 30 '16 at 11:26
  • True, scalability needs to be taken into account, so performance matters. Thanks for the advice. (P.S. Git commits only store the deltas, so it's more on the order of ~1MB per day. But valid point, I'll take it into account.) – Keenan Lawrence Jun 30 '16 at 11:35
  • Only the deltas for text, isn't it? I think for binaries it's the whole file. – e4c5 Jun 30 '16 at 12:10
  • Thanks for pointing that out. After a day of running the script hourly, my hosting company reported that I've used 140MB of upload, including site traffic. I'm a bit unsure about this situation now. Is it perhaps because, when I inspect the `pg_dump` file, everything is in plain text? (It's a bunch of SQL statements and values, all human-readable.) – Keenan Lawrence Jul 01 '16 at 12:25
  • Yes, but the diff is still going to be rather large, because for every row that is updated, git will store the old record prefixed with a - and the new record prefixed with a +. – e4c5 Jul 01 '16 at 12:52
  • Ah, I see! Thank you for that, and for your patience. I'll implement WAL archiving and give it a go. Thanks for the advice – Keenan Lawrence Jul 01 '16 at 12:53
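
The WAL archiving discussed in these comments comes down to a few `postgresql.conf` settings, roughly like this (a sketch based on the linked documentation; the archive directory is a placeholder):

```
# postgresql.conf
wal_level = archive          # 'replica' on PostgreSQL 9.6 and later
archive_mode = on
# Copy each completed WAL segment to the archive, refusing to overwrite
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'
```

Combined with a base backup (e.g. via `pg_basebackup`), the archived segments allow point-in-time recovery rather than hourly snapshots.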