
I import an SVN repository using Subgit, an excellent tool that is fast and supports custom SVN layouts. Subgit stores the git commit -> svn revision mapping in git notes. Every commit has its revision number in a note, which you can see with git log.

After the SVN->git import I use BFG Repo-Cleaner to strip binaries such as jars and dlls from the old project repository. BFG does not update the git notes to point at the rewritten commits, but fortunately it leaves an object-id-map.old-new.txt file.

I use this file to copy notes from old commits to new ones:

cat object-id-map.old-new.txt | git notes copy --stdin

After copying the notes I remove them from the old objects:

cat object-id-map.old-new.txt | cut -d' ' -f 1 | git notes remove --stdin --ignore-missing
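
A minimal sketch of the two commands above on a throwaway repository, to show what the move does to a single note. The two empty commits, the note text "r42" and the one-line map file are stand-ins made up for illustration; the map file layout ("<old-id> <new-id>" per line) is assumed to match what BFG writes.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the sketch never touches real data.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"

# Two commits stand in for a pre-rewrite commit and its rewritten counterpart.
git commit -q --allow-empty -m "old commit"
old=$(git rev-parse HEAD)
git commit -q --allow-empty -m "new commit"
new=$(git rev-parse HEAD)

# Hypothetical note; the real notes are written by the SVN import.
git notes add -m "r42" "$old"

# One "<old-id> <new-id>" pair per line (assumed map file layout).
printf '%s %s\n' "$old" "$new" > object-id-map.old-new.txt

# Copy each note from the old object to the new one, then drop the old notes.
git notes copy --stdin < object-id-map.old-new.txt
cut -d' ' -f1 object-id-map.old-new.txt | git notes remove --stdin --ignore-missing

moved_note=$(git notes show "$new")
echo "note on new commit: $moved_note"
```

Reading the file with a redirect instead of cat is only a style choice; the behaviour is the same as the piped form.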

The problem is that after fixing the git notes, the repository becomes twice as large (even if I clone without --bare). Why?

Example: I imported a repo from SVN with Subgit and got a 400 MB .git. Then I applied BFG and got a 40 MB bare repository. I want to restore the git notes by moving them (copying and removing) with the 2 commands above, but the repo then grows from 40 MB to 80 MB. I tried git notes prune and git reflog expire --expire=now --all && git gc --prune=now --aggressive, as recommended by BFG, but the size is still 80 MB.

UPD: I can't reproduce the 40 MB repo now. It is 80 MB after the BFG cleanup and 86 MB after copying the notes.
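
To compare sizes reproducibly after each step, the packed size can be read from git count-objects rather than from the directory size. The sketch below runs the cleanup sequence quoted above on a throwaway repository and prints the result; the repository contents are placeholders.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a single small commit (placeholder content).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
echo "hello" > file.txt
git add file.txt
git commit -q -m "initial"

# The cleanup sequence from the question.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

# `git count-objects -v` reports size-pack in KiB and count as the number of loose objects.
pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "packed: ${pack_kib} KiB, loose objects: ${loose}"
```

Running the last three lines before and after the notes move gives two numbers that can be compared directly.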

Kirill
  • Probably it is notes history: `git log refs/notes/commits`. – user4003407 May 05 '17 at 17:15
  • Could you try these 2 commands one after another? `git -c gc.autoDetach=0 -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc --prune --aggressive` and `git -c gc.autoDetach=0 -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now prune` – Dmitry Pavlenko May 06 '17 at 10:33
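
The notes-history suggestion in the first comment can be checked directly: refs/notes/commits is an ordinary ref with its own commit history, and each notes operation adds a commit to it, so earlier notes trees stay reachable. The sketch below, on a throwaway repository with made-up note text, counts that history and then replaces it with a single parentless commit carrying the same tree. Squashing is one possible way to drop the history, not something the original posts prescribe.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git commit -q --allow-empty -m "some commit"

# Each notes operation adds one commit to refs/notes/commits.
git notes add -m "first"
git notes add -f -m "second"
before=$(git rev-list --count refs/notes/commits)

# Build a parentless commit with the current notes tree and point the ref at it.
tree=$(git rev-parse 'refs/notes/commits^{tree}')
squashed=$(git commit-tree -m "Squash notes history" "$tree")
git update-ref refs/notes/commits "$squashed"
after=$(git rev-list --count refs/notes/commits)

echo "notes commits before: $before, after: $after"
```

The old notes commits only stop taking space once the reflogs are expired and git gc --prune=now has run, since refs/notes/ updates are recorded in the reflog by default in a non-bare repository.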

1 Answer

Three developments in Git and the tools around it (since 2017) should help with this issue:

  • One: a tool like github/git-sizer will give you an idea of what is taking so much space.
  • Two: git filter-repo (that I mentioned here) now replaces BFG or git filter-branch. Install it first (python3 -m pip install --user git-filter-repo).
    It will leave less data behind after cleaning out the jars/binaries you do not want.
git filter-repo --strip-blobs-bigger-than 10M
  • Three: objects that lost references can be pruned away, even when they have notes attached to them (these notes then become dangling, and can in turn be pruned with "git notes prune").
    This has been clarified in the documentation with Git 2.31 (Q1 2021).

See commit fa9ab02 (10 Feb 2021) by Martin von Zweigbergk (martinvonz).
(Merged by Junio C Hamano -- gitster -- in commit d590ae5, 25 Feb 2021)

docs: clarify that refs/notes/ do not keep the attached objects alive

Signed-off-by: Martin von Zweigbergk

git help gc contains this snippet:

"[...] it will keep [..] objects referenced by the index,
remote-tracking branches, notes saved by git notes under refs/notes/"

I had interpreted that as saying that the objects that notes were attached to are kept, but that is not the case.
Let's clarify the documentation by moving out the part about git notes to a separate sentence.

git gc now includes in its man page:

objects referenced by the index, remote-tracking branches, reflogs (which may reference commits in branches that were later amended or rewound), and anything else in the refs/* namespace.

Note that a note (of the kind created by 'git notes') attached to an object does not contribute in keeping the object alive.
If you are expecting some objects to be deleted and they aren't, check all of those locations and decide whether it makes sense in your case to remove those references.
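
The documented behaviour can be observed with a small experiment. The sketch below, on a throwaway repository, attaches a note to a commit that no ref points to, runs the gc sequence, and then checks that the commit is gone while its note is still listed until git notes prune runs. The file name, commit messages and note text are placeholders.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
echo "hello" > file.txt
git add file.txt
git commit -q -m "kept"

# A commit that no ref points to, with a note attached to it.
orphan=$(git commit-tree -m "orphan" "$(git rev-parse 'HEAD^{tree}')")
git notes add -m "note on orphan" "$orphan"

git reflog expire --expire=now --all
git gc -q --prune=now

# The note did not keep the orphan commit alive.
if git cat-file -e "$orphan" 2>/dev/null; then orphan_alive=yes; else orphan_alive=no; fi

# The dangling note is still listed until `git notes prune` removes it.
notes_before=$(git notes list | wc -l | tr -d ' ')
git notes prune
notes_after=$(git notes list | wc -l | tr -d ' ')

echo "orphan alive: $orphan_alive, notes before prune: $notes_before, after: $notes_after"
```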

VonC