
I have used the BFG Repo-Cleaner to remove a large file from a git repository:

java -jar ../bfg-1.11.8.jar --delete-folders escrow application.git
cd application.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
cd ..
mkdir clone
cd clone
git clone file:///home/damian/temp/TCLIPG-4370/test/application.git

I used this script (http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/) to check my repository before and after running the BFG Repo-Cleaner. It shows that the escrow directory has been removed, and the cleaned repository is smaller on disk than the original.

Everything looks OK, but how can I verify that all my commits are otherwise the same? Would I have to write a script that uses git for-each-ref to compare the commits with the same name in the two repositories, to verify that the BFG has worked correctly?

Any suggestions would be greatly appreciated.

3 Answers


You could get an independent opinion from Eric S. Raymond's repodiffer (part of his reposurgeon project): http://www.catb.org/~esr/reposurgeon/repodiffer.html

You use it like this:

$ repodiffer old-repo-copy.git new-repo-copy.git

The script may take a while to run, but it will tell you precisely what has changed between those two repos. Small sample of output:

...
1a54b66 -> 9b11d44: same differences as for 5c572dc -> 6e8307c.
changed: e00a601 -> 30a42c8 in tree.
L only:
  frontend/assets/big.mp4
R only:
  frontend/assets/big.mp4.REMOVED.git-id
...

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley

Quick and dirty technique, presuming only one version of the large file ever existed.

First, print the blob SHA for the large file:

git hash-object <large-file>

Then, in the cleaned repository, look up the SHA from the previous step:

git cat-file -p <large-file-sha>

If that fails, the blob is no longer in the repository, so no commit can be referencing it.
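The two steps above can be wrapped in a small helper. This is only a sketch: the function name blob_in_repo and its arguments are made up for illustration, and it uses git cat-file -e (which only tests for existence) instead of -p so that the file contents are not printed. Note that an object can still exist in a repository without being reachable from any commit if the repository has not been garbage-collected yet.

```shell
# blob_in_repo <repo-dir> <file>   (hypothetical helper name)
# Prints "present <sha>" if a blob with the same content as <file>
# exists in the object database of <repo-dir>, otherwise "absent <sha>".
blob_in_repo() {
    repo=$1
    file=$2
    # hash-object only computes the blob ID; it does not write anything.
    sha=$(git hash-object "$file") || return 2
    # cat-file -e exits with 0 if the object exists and is valid.
    if git -C "$repo" cat-file -e "$sha" 2>/dev/null; then
        echo "present $sha"
    else
        echo "absent $sha"
    fi
}
```

For example, blob_in_repo clone/application /path/to/a/copy/of/the/large/file should report "absent" if the cleanup worked (both paths here are placeholders).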

If you really want to verify that all your commits are the same (where 'the same' really means 'different', since you are removing the large file), you would need to write a script that runs diff-tree on each original commit and its rewritten counterpart. You wouldn't use for-each-ref for that; you'd use rev-list, and you'd need a mechanism to map each old SHA to its new SHA, which you might not have with the BFG tool. You could just verify the branch tips as you describe, though, which might be good enough.
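A rough sketch of the branch-tip check might look like the following. The function name compare_tips is invented for this example, and it assumes the two repositories use the same branch and tag names. It compares the full file listing of each ref in both repositories; blob IDs of untouched files do not change when history is rewritten, so only files that were removed or altered should show up.

```shell
# compare_tips <old-repo> <new-repo>   (hypothetical helper name)
# For every branch and tag in <old-repo>, diff the recursive file listing
# (mode, type, blob ID, path) against the same ref in <new-repo>.
compare_tips() {
    old=$1
    new=$2
    tmpdir=$(mktemp -d) || return 2
    git -C "$old" for-each-ref --format='%(refname)' refs/heads refs/tags |
    while read -r ref; do
        echo "== $ref"
        # Skip refs that do not exist in the cleaned repository.
        if ! git -C "$new" rev-parse -q --verify "$ref^{tree}" >/dev/null; then
            echo "missing in $new"
            continue
        fi
        git -C "$old" ls-tree -r "$ref" > "$tmpdir/old.txt"
        git -C "$new" ls-tree -r "$ref" > "$tmpdir/new.txt"
        # Lines starting with "<" exist only in the old repository,
        # lines starting with ">" only in the new one.
        diff "$tmpdir/old.txt" "$tmpdir/new.txt" || true
    done
    rm -rf "$tmpdir"
}
```

If the only lines reported are paths under the removed directory, the tips match apart from the intended removal. This checks the tips only, not every commit in the history.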

Andrew C
    The BFG produces an object-id mapping file (old new), which might be helpful: https://github.com/rtyley/bfg-repo-cleaner/issues/18#issuecomment-17973838 – Roberto Tyley Sep 17 '14 at 06:27
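Given such a mapping file, a per-commit comparison could be sketched as below. This is an assumption-laden example: the function name diff_mapped_commits is invented, the file is assumed to hold one "old new" pair of commit IDs per line, and both the old and the new objects are assumed to be available in the repository that is passed in (for instance, a copy that has not yet been pruned).

```shell
# diff_mapped_commits <repo> <map-file>   (hypothetical helper name)
# Reads "old new" ID pairs and lists the paths that differ between
# each pair, one status letter and path per line.
diff_mapped_commits() {
    repo=$1
    map=$2
    while read -r old new; do
        # Skip blank or incomplete lines.
        if [ -z "$old" ] || [ -z "$new" ]; then
            continue
        fi
        echo "== $old -> $new"
        git -C "$repo" diff-tree -r --name-status "$old" "$new" ||
            echo "cannot compare $old and $new"
    done < "$map"
}
```

Every path reported should belong to the content you asked the BFG to remove; anything else deserves a closer look.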

What I did was make a normal clone, run the BFG command I needed on it, and then inspect the files with a graphical Git tool (Sublime Merge is great!).

Once I was satisfied, I applied the same changes to a mirror clone and pushed from there, since Git won't let you push the unreferenced commit objects from the normal clone. Explanation here: "Is there a way in git to push the reflog?"

Devyzr