
I am using svnX (0.9.13) on Mac OS X Lion (10.7.2 11C74) and seem to have what I believe is a corrupted SVN repository. I have searched the site for similar questions and found a couple, yet none describe how to recover when you cannot complete a checkout from the repository. I do not have an up-to-date working directory either.

The specific error is:

svn: Checksum mismatch while reading representation:
expected: [hash]
actual: [different hash]

If the alert is dismissed (the only option), the checkout continues to the end. At first glance, most of the files seem to be there, but when I run the application, it is clear there is a mishmash of versions. The repository lives on a USB flash drive, which could be a source of corruption. I am the only user who accesses these files; they had not been touched for over a week and were in a working state.

Any suggestions on how to proceed would be appreciated.

Noren
  • Restore from backup? SVN stores revisions in an efficient binary-diff format, I suspect that'd make it very susceptible to corruption... In any case, the first thing you should do is make a complete copy of the repository, and work on recovering the *copy*. – derobert Jan 12 '12 at 20:28
  • USB flash drive? I hope you back it up regularly (it might even get lost if not corrupted). – crashmstr Jan 12 '12 at 20:33
  • @derobert Restoration from backup is possible, but a few commits will be lost, so I'm trying to do that only as a last resort. – Noren Jan 12 '12 at 20:38
  • @crashmstr Yes, the drive is backed up along with the development computer weekly, however, I worked a bit during the holiday break when I couldn't connect to the backup hardware. I guess this is my penance. – Noren Jan 12 '12 at 20:43
  • @Noren: If you have checked out copies, you may lose some history, but you won't lose the most-recent revisions (you can just immediately check in your most-recent copies). Also, when I google that 'svn: Checksum mismatch…' error message, I get several discussions on fixing it—have you tried any of that? What does svnadmin verify say? – derobert Jan 12 '12 at 21:25
  • Your only hope is to do a `svnadmin` [dump and load](http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.migrate.svnadmin) and see if that helps take care of the corruption issue. – David W. Jan 13 '12 at 02:23
  • @DavidW. I just tried 'svnadmin dump' and it stopped well short of the current revision with the error "corrupt node revision" and "missing id field in node-rev". – Noren Jan 13 '12 at 15:36
  • @derobert Yes, I tried [two](http://andrew.hedges.name/blog/2009/01/25/how-to-recover-from-checksum-mismatch-errors-in-svn) [possible](http://glob.bushi.net.nz/glob/2007/02/14/subversion-checksum-mismatch-easy-workaround/) solutions before posting. I think the best solution now is to create a new repository with my current revision and lose the history. – Noren Jan 13 '12 at 16:02
  • @Noren - In svnadmin `dump`, you can specify a revision range via the `-r` parameter. Try skipping the bad revision and see if that helps. If you have a backup, you might be able to do a dump from the backup, then start the dump of your current one to the revision after your backup. Then, combine the two and see what happens. There's not much else you can do. It's like what happens if your hard drive crashes. You try to save what you can and hope for the best. – David W. Jan 13 '12 at 16:02
  • @DavidW. Using several `svnadmin dump` ranges, I was able to preserve some history and the most current revision. Please make an answer post so I can close the thread. Thanks! – Noren Jan 13 '12 at 16:22

2 Answers


When you have a corrupt repository, your only real chance of saving the information is to do a dump and load. If you're lucky, the dump and load will sometimes correct the corruption by itself.

If not, you can use the -r <from>:<to> parameter on the dump to skip over the bad revisions. You can create several dump files and load them into a single repository, leaving out the bad revision numbers. I've noticed that, by default, each dump file starts with a complete copy of the repository as of its first revision, and the dump/load process is usually smart enough not to double up changes. Adding --incremental to the later dumps makes them record only the changes since the previous revision.

In fact, I believe you can even put several dumps into a single dump file without too many problems. The following should skip over revisions 1001 and 1204 which are bad revisions:

$ svnadmin dump -r1:1000 my_repos > dumpfile.txt
$ svnadmin dump --incremental -r1002:1203 my_repos >> dumpfile.txt
$ svnadmin dump --incremental -r1205:HEAD my_repos >> dumpfile.txt
$ svnadmin load my_repos2 < dumpfile.txt
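
When more than a couple of revisions are bad, working out the ranges by hand gets tedious. The following is a minimal sketch of how the ranges could be generated automatically. `dump_ranges` and `dump_skipping` are hypothetical helper names, not part of Subversion, and the sketch assumes you already know which revision numbers are bad (for example from the errors `svnadmin dump` reported).

```sh
#!/bin/sh
# Hypothetical helpers (not part of Subversion).

# dump_ranges HEAD BAD...
# Print the FROM:TO ranges that cover revisions 1..HEAD while
# leaving out every BAD revision number.
dump_ranges() {
    head=$1; shift
    start=1
    # Sort numerically so the ranges come out in ascending order.
    for bad in $(printf '%s\n' "$@" | sort -n | uniq); do
        if [ "$bad" -gt "$head" ]; then
            break
        fi
        if [ "$bad" -ge "$start" ]; then
            if [ "$bad" -gt "$start" ]; then
                echo "$start:$((bad - 1))"
            fi
            start=$((bad + 1))
        fi
    done
    if [ "$start" -le "$head" ]; then
        echo "$start:$head"
    fi
}

# dump_skipping REPO OUTFILE HEAD BAD...
# Dump every good range into OUTFILE. The first range is a normal
# dump; later ranges use --incremental, as in the commands above.
dump_skipping() {
    repo=$1; out=$2; shift 2
    first=yes
    : > "$out"
    for range in $(dump_ranges "$@"); do
        if [ "$first" = yes ]; then
            svnadmin dump -r "$range" "$repo" >> "$out" || return 1
            first=no
        else
            svnadmin dump --incremental -r "$range" "$repo" >> "$out" || return 1
        fi
    done
}
```

Assuming the repository from the example above, this could be run as `dump_skipping my_repos dumpfile.txt "$(svnlook youngest my_repos)" 1001 1204`, where `svnlook youngest` supplies the numeric HEAD revision. Work on a copy of the repository, not the original.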

There are several Subversion backup scripts that back up the repository by taking dumps of only the newest revisions. For example, the first time you run one, it dumps everything from the first revision to the last revision (say revision 1000). The next day it dumps revision 1001 to the last revision (say 1003), and the day after that, revision 1004 to the last revision.

To restore, you have to load all the dumps in order, but the backup times are supposed to be shorter than doing a full dump each time.
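
As a rough illustration of that scheme (a sketch, not any particular published backup script), the idea is to remember the last revision that was backed up and dump only what came after it. The function names, the `last_rev` state file, and the dump file naming are all assumptions made for this example.

```sh
#!/bin/sh
# Hypothetical helpers illustrating an incremental dump backup.

# next_backup_range LAST YOUNGEST
# Print the range to dump next, or nothing if already up to date.
next_backup_range() {
    last=$1; youngest=$2
    if [ "$youngest" -gt "$last" ]; then
        echo "$((last + 1)):$youngest"
    fi
}

# backup_incremental REPO BACKUP_DIR
# Dump the revisions added since the previous run into a new file
# and record the newest revision that has been backed up.
backup_incremental() {
    repo=$1; backup_dir=$2
    state="$backup_dir/last_rev"
    prev=0
    if [ -f "$state" ]; then
        prev=$(cat "$state")
    fi
    newest=$(svnlook youngest "$repo") || return 1
    range=$(next_backup_range "$prev" "$newest")
    if [ -z "$range" ]; then
        return 0    # nothing new to back up
    fi
    # File name such as dump-1001-1003 (assumed naming scheme).
    file="$backup_dir/dump-$((prev + 1))-$newest"
    svnadmin dump --incremental -r "$range" "$repo" > "$file" || return 1
    echo "$newest" > "$state"
}
```

To restore from such a set, the dump files would be loaded oldest first into an empty repository with `svnadmin load`.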

You can also do a hotcopy, but I don't find doing a hotcopy that much faster than doing a dump, and there could be issues if you have to move your repository to a different machine.

Albin Sunnanbo
David W.
  • The above way worked for me, with one exception: I had to use the "--incremental" switch for svnadmin to make sure that the paths are correct. – eazy Nov 08 '12 at 15:44
  • You sir, just saved me :-). I just had to dump revisions one by one because HEAD seemed to be unrecognized, or at least dump produced weird errors. – Nux Sep 05 '13 at 20:56
  • Sometimes properties are corrupted; then use the --bypass-prop-validation option in the svnadmin load command. – Znik Feb 27 '14 at 09:11
  • And if I get an error during load: "* editing path ... svnadmin: E160013: File not found: transaction '2-2', path ..."? My problem is with some earlier revisions (3 and 5; now I am at 65)... – unlikely Jan 31 '16 at 08:18

You should do a dump and load as David W. suggested. However, there are some gotchas that I encountered, so I would like to post a complete solution.

Corruption typically occurs in single files on some revisions. We don't need to discard an entire revision just because some file had a checksum mismatch.

First, we will try disabling checksum verification by removing the lines that start with Text-content-md5 from the dump stream:

svnadmin dump my_repo | sed '/^Text-content-md5/d' | svnadmin load second_repo

The incremental approach enables us to fix errors and continue where we left off. If an error occurs during the dump and load, look for the last --- Committed revision X >>> --- message, pass X+1 as the starting revision to the -r parameter, and try again. This saves considerable time.

svnadmin dump --incremental -r1:100000 my_repo | sed '/^Text-content-md5/d' | svnadmin load second_repo
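
Finding X by eye in a long log is error-prone, so here is a small sketch that pulls it out of a saved load log. `next_start_rev` is a hypothetical helper, and it assumes the output of `svnadmin load` was redirected to a file and contains lines of the form "Committed revision X >>>" as described above. If the revision numbers in the new repository have drifted from the original ones, the log wording may differ and this sketch will not handle it.

```sh
#!/bin/sh
# Hypothetical helper (not part of Subversion).

# next_start_rev LOGFILE
# Print X+1, where X is the last "Committed revision X >>>" found
# in LOGFILE. Prints 1 if nothing was committed yet.
next_start_rev() {
    last=$(sed -n 's/.*Committed revision \([0-9][0-9]*\) >>>.*/\1/p' "$1" | tail -n 1)
    if [ -z "$last" ]; then
        echo 1
    else
        echo $((last + 1))
    fi
}
```

For example, after running the pipeline above with `> load.log` appended to the `svnadmin load` command, `next_start_rev load.log` prints the revision to use as the start of the next -r range.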

Or just load from the dumpfile:

sed '/^Text-content-md5/d' dumpfile.txt | svnadmin load second_repo

If that is not enough, and you get a 'Premature end of content data in dumpstream' error or something similar, you should exclude the affected file completely from the dump with svndumpfilter:

svnadmin dump --incremental -r1:100000 my_repo | svndumpfilter exclude myproject/lib/thirdparty-all.jar | sed '/^Text-content-md5/d' | svnadmin load second_repo

The command above excludes the file myproject/lib/thirdparty-all.jar from the dump.

Extra information:

  • You can append --bypass-prop-validation to the svnadmin load command. This works if the corruption is minor.
  • Fix the "Dump stream contains a malformed header (with no ':')" error by appending
    | grep --binary-files=text -v '^* Dumped revision'
    to the pipe chain (before svnadmin load).

Hope this post is useful to some people.

bekce