1

I plan to do migration from PostgreSQL 9.3.4 to 9.5 on the same machine by bash. But one important issue is to make sure the two databases (9.3.4 and 9.5) are identical.

identical means nearly everything is the same, including administration data and user data, such as table, schema, data, users, and so on. After migration, postgresql-9.3.5 will be shutdown. only 9.5 will be used.

Checksum may help. But how to do it in bash? Are there any other ways or ideas? Thanks.

BAE
  • 8,550
  • 22
  • 88
  • 171
  • identical means nearly everything is the same, including administration data and user data, such as table, schema, data, users, and so on. After migration, postgresql-9.3.5 will be shutdown. only 9.5 will be used. – BAE Jan 20 '16 at 01:51

1 Answers1

3

Postgresql checksums are intended as a way to catch disk corruption. The system works by computing a checksum for each physical page in the database, and writing it with the page. When the page is later read back in, the checksum can be verified; if it doesn't match, the underlying file has been corrupted.

Between two different instances of the same data, the checksums would only match if the two files have precisely the same bytes in the same order, something which is unlikely if you are upgrading since the new version probably has internal tables with different layouts. Even were that not the case, there is a certain amount of non-determinism in the low-level data format, and it is extremely unlikely that two instances of the same database (even with the same version) would be byte-for-byte identical.

Your best bet is probably to dump all the data in some deterministic format, and sort it unambiguously. Then compute an md5sum on that (or some other hash if you prefer).

Also see this similar question: How to check difference between two databases in PostgreSQL?

Community
  • 1
  • 1
rici
  • 234,347
  • 28
  • 237
  • 341
  • Thanks, in this case, how can i make sure the migration is free of errors – BAE Jan 20 '16 at 05:21
  • @BAE: Well, it's always a risk :) If you have replication set up, you could take a snapshot of the database, and then keep one replica in the old version until you are certain that all is well with the migration. Between your regular backups, the replica and the snapshot, you would hopefully have enough to be able to restore anything you lost, more or less as it was within a reasonable amount of time. But I don't mean to make it sound too fraught. Postgresql upgrades "usually" go pretty smoothly. I'd still keep a replica in the old version handy for a couple of weeks, if you've got replicas. – rici Jan 20 '16 at 05:46
  • @BAE: Also, you'll get more expert advice on the subject on the [DBA stackexchange](http://dba.stackexchange.com/) – rici Jan 20 '16 at 06:07
  • thanks for your suggestions. The problem is that we do not have a replication cluster and we can not upgrade directly. That is why we need to do the migration. – BAE Jan 20 '16 at 15:27
  • @BAE: I understand, and I strongly recommend you ask a question on the DBA site (after having searched for existing questions.) As always, the more information you provide in the question (eg. your comment) and the more clear your requirements, the more likely you are to get a good answer. – rici Jan 20 '16 at 15:39
  • Also, avoid XY questions which assume a particular solution to an unstated problem. The body of your question here is much clearer than your title, for example: someone with knowledge about "verifying upgrades" could easily skip over a question talking only about checksums which does not even mention the underlying problem. – rici Jan 20 '16 at 15:46
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/101219/discussion-between-bae-and-rici). – BAE Jan 20 '16 at 20:07