
First, what I did in git:

  E:\zeus>git filter-branch -f --tree-filter "rm -rf ZeusSRC_Hardware_RPi_image_Raspberry Pi_außen_20.05.2019.zip" --prune-empty -- --all
  Rewrite fa2be75c64ca78a296c8f78fc363beebecbf92a1 (1526/1526) (2745 seconds passed, remaining 0 predicted)
  Ref 'refs/heads/Kunden' was rewritten
  Ref 'refs/heads/Sensor' was rewritten
  Ref 'refs/heads/Wetter' was rewritten
  Ref 'refs/heads/ZEUS-5' was rewritten
  Ref 'refs/heads/Zeus_Bug-13' was rewritten
  WARNING: Ref 'refs/heads/master' is unchanged
  Ref 'refs/remotes/origin/ADW' was rewritten
  WARNING: Ref 'refs/remotes/origin/master' is unchanged
  Ref 'refs/remotes/origin/Kunden' was rewritten
  Ref 'refs/remotes/origin/Metzger' was rewritten
  WARNING: Ref 'refs/remotes/origin/Mond' is unchanged
  Ref 'refs/remotes/origin/Sensor' was rewritten
  WARNING: Ref 'refs/remotes/origin/Sonne' is unchanged
  Ref 'refs/remotes/origin/Wetter' was rewritten
  WARNING: Ref 'refs/remotes/origin/ZEUS-2' is unchanged
  WARNING: Ref 'refs/remotes/origin/ZEUS-3' is unchanged
  Ref 'refs/remotes/origin/ZEUS-5' was rewritten
  Ref 'refs/remotes/origin/ZEUS_BUG-12' was rewritten
  WARNING: Ref 'refs/remotes/origin/ZEUS_BUG-4' is unchanged
  Ref 'refs/remotes/origin/ZEUS_BUG-6' was rewritten
  WARNING: Ref 'refs/remotes/origin/ZEUS_BUG-8' is unchanged
  Ref 'refs/remotes/origin/ZEUS_BUG-9' was rewritten
  Ref 'refs/remotes/origin/Zeus_Bug-13' was rewritten
  WARNING: Ref 'refs/remotes/origin/master' is unchanged
  WARNING: Ref 'refs/remotes/origin/metzger' is unchanged
  WARNING: Ref 'refs/remotes/origin/tempAddFirstCode' is unchanged
  Ref 'refs/stash' was rewritten

Second, what this was SUPPOSED to do: I have an ISO in my repo, about 4 GB in size. I removed it, ran git add ., then committed and pushed, but of course the repo size didn't change, because the objects that belonged to the earlier commits are still there. So what I first want to know:

According to the output of git above, can I assume that the file affected by the git command is now REMOVED from history?

Because as far as I understand, the removal of a file from history is a mandatory prerequisite for what I want to do next: Use the garbage collector to remove the related object from my repo "manually".

I already stumbled over several posts on SO which dealt with this problem, for example I found this rather popular script for git:

 git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc

which might require running the following first:

      git remote rm origin
      rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
      git for-each-ref --format="%(refname)" refs/original/ | xargs -n1 --no-run-if-empty git update-ref -d

from here: "How to remove unreferenced blobs from my git repo". However, these commands didn't really work for me :/

Hicksfeld

1 Answer

What's potentially missing

  1. If you have any tags, you may need to redo the filter-branch (start over from a good copy of the repository) and include:

    --tag-name-filter cat
    

    in the git filter-branch options.

  2. If you have keep files on packs, they may prevent the removal of the large objects. (If you do have these, you probably know about it.)
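As a rough illustration of both points, the sketch below builds a throwaway repository, runs the rewrite with --tag-name-filter cat, and then looks for pack keep files. The names big.zip and v2.1 are placeholders, not taken from the question, and FILTER_BRANCH_SQUELCH_WARNING only silences the notice that newer Git versions print before filter-branch starts.

```shell
#!/bin/sh
set -eu

# Throwaway repository; big.zip and v2.1 are hypothetical stand-ins.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "source" > code.txt
git add code.txt
git commit -q -m "A: add code"

echo "pretend this is 4 GB" > big.zip
git add big.zip
git commit -q -m "D: add big file"
git tag v2.1            # a tag that keeps commit D alive

# Newer Git versions print a notice and pause unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Same kind of tree filter as in the question, plus --tag-name-filter cat
# so that tags are rewritten under their existing names.
git filter-branch -f --tree-filter 'rm -f big.zip' \
    --prune-empty --tag-name-filter cat -- --all

# Commits reachable from branches or tags that still touch big.zip.
remaining=$(git log --oneline --branches --tags -- big.zip | wc -l | tr -d ' ')
echo "commits still touching big.zip: $remaining"

# Pack keep files, if any, sit next to the packs they protect.
keep_files=$(find .git/objects/pack -name '*.keep')
echo "keep files: ${keep_files:-none}"
```

The check deliberately looks only at branches and tags, because the refs/original/* backups still point at the old commits at this stage.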

Discussion

According to the output of git above, can I assume that the file affected by the git command is now REMOVED from history?

It's more accurate to say instead that there is a new history in which the file was never added, which is present in addition to the existing history in which the file was added. Each "was rewritten" reference points to a commit at the tip of the new history. Each "is unchanged" reference points to a commit at the tip of existing, unchanged history, which is OK because that existing unchanged history never had the file in it. For instance, imagine the following highly simplified diagram (with just two branches):

A--B--C   <-- master
       \
        D--E   <-- Kunden

where the files ZeusSRC_Hardware_RPi_image_Raspberry and Pi_außen_20.05.2019.zip exist in commit D. So git filter-branch extracted commit D, removed the two files, and made a new commit we'll call D' that no longer has the files:

        D'-E'
       /
A--B--C   <-- master
       \
        D--E   <-- Kunden

These two files may or may not exist in E as well, but creating a new commit D' means that Git must also create a new E', and of course the new E' also has them removed if they existed in E.

Now that the new history is built, Git must discard the existing refs/heads/Kunden and put in a new refs/heads/Kunden (the Kunden branch) pointing to commit E'. The existing refs/heads/master is OK so it can be left alone:

        D'-E'   <-- Kunden
       /
A--B--C   <-- master
       \
        D--E   [original Kunden]

The refs/original/refs/heads/Kunden name that git filter-branch leaves behind retains commit E, but so do various less-visible reflog entries. The first kind, the refs/original/* names, is what this command is about:

git for-each-ref --format="%(refname)" refs/original/ | xargs -n1 --no-run-if-empty git update-ref -d

as it will delete each such name.
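Where xargs does not accept the GNU-style --no-run-if-empty option, a plain shell loop does the same job. The following is a minimal sketch in a throwaway repository; big.zip is a hypothetical file name, and the variables before and left exist only to show the effect.

```shell
#!/bin/sh
set -eu

# Throwaway repository with one commit that adds a (pretend) large file.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "source" > code.txt
git add code.txt
git commit -q -m "A: add code"
echo "pretend this is 4 GB" > big.zip
git add big.zip
git commit -q -m "D: add big file"

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --tree-filter 'rm -f big.zip' --prune-empty -- --all

# Backup names that filter-branch left behind.
before=$(git for-each-ref --format='%(refname)' refs/original/ | wc -l | tr -d ' ')
echo "backup refs before cleanup: $before"

# Delete each backup ref; the loop does nothing when the list is empty.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

left=$(git for-each-ref --format='%(refname)' refs/original/ | wc -l | tr -d ' ')
echo "backup refs after cleanup: $left"
```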

This:

 git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc

was intended to take care of the second kind, the reflogs, and also to include the final git gc step. The advice in the git filter-branch documentation uses two separate commands instead:

git reflog expire --expire=now --all
git gc --prune=now

Offhand, I'd expect the one git gc with configuration items specified to work, but if it does not, see the filter-branch documentation.

Once all references to commit E are gone, and assuming there are no references to commit D, git gc will discard objects D and E. If these objects were packed, Git will build a new pack file that omits them and write them out as loose objects instead. Loose-object pruning then discards them once the object-prune delay has passed (immediately with --prune=now), and the old pack file will be garbage-collected unless it is saved with a pack keep file.
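One way to check that the large object is really gone is to record its object ID before the rewrite and ask Git for it again after the cleanup. The sketch below runs the whole sequence in a throwaway repository; big.zip is a hypothetical file name, and the blob and status variables are only there for the check.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "source" > code.txt
git add code.txt
git commit -q -m "A: add code"
echo "pretend this is 4 GB" > big.zip
git add big.zip
git commit -q -m "D: add big file"

# Object ID of the large file's contents, recorded before rewriting.
blob=$(git rev-parse HEAD:big.zip)

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --tree-filter 'rm -f big.zip' --prune-empty -- --all

# 1. Drop the refs/original/* backups.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

# 2. Expire every reflog entry, then collect and prune right away.
git reflog expire --expire=now --all
git gc -q --prune=now

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$blob" 2>/dev/null; then
    status="still present"
else
    status="gone"
fi
echo "blob $blob is $status"

# Human-readable summary of loose and packed object sizes.
git count-objects -vH
```

If the blob is still present in a real repository, something still refers to the old commits, for example a tag, a stash, or another reflog.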

Suppose that you have a tag name such as v2.1. Suppose further that the tag name points to commit D:

        D'-E'   <-- Kunden
       /
A--B--C   <-- master
       \
        D--E
         .
          ...... <-- tag: v2.1

Since refs/tags/v2.1 was not rewritten, tag v2.1 continues to retain commit D, which continues to retain the large file. When you ran your git filter-branch, it built a mapping that held the fact that new commit D' was the correct replacement for D. If you still had the mapping, you could use that to forcibly move the tag v2.1 so that it points to commit D' instead. Unfortunately, when git filter-branch finishes, it removes the mapping, thinking it is done with it, after doing all the "rewrite" operations on the various names.
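If only a few tags are affected, they can also be moved by hand with git tag -f once you know which new commit replaces the old one. The sketch below is a minimal illustration in a throwaway repository: big.zip and v2.1 are placeholders, and the replacement commit is trivially the rewritten branch tip here, which will not generally be true in a real history.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.email "demo@example.com"
git config user.name "Demo"

echo "source" > code.txt
git add code.txt
git commit -q -m "A: add code"

# Commit D changes code.txt as well, so its rewritten copy is not empty.
echo "more source" >> code.txt
echo "pretend this is 4 GB" > big.zip
git add code.txt big.zip
git commit -q -m "D: more code plus big file"
git tag v2.1

export FILTER_BRANCH_SQUELCH_WARNING=1
# Rewrite branches only, so the tag keeps pointing at the old commit D.
git filter-branch -f --tree-filter 'rm -f big.zip' --prune-empty -- --branches

old=$(git rev-parse v2.1)
echo "v2.1 before: $old (still contains big.zip)"

# Force the tag onto the replacement commit (here: the new branch tip).
git tag -f v2.1 HEAD

files=$(git ls-tree -r --name-only v2.1)
echo "files in v2.1 after the move: $files"
```

For an annotated tag, git tag -f would need the -a and -m options again, since the sketch only handles a lightweight tag.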

(Side note: there is no need to run git remote rm origin. As you can see above, a number of remote-tracking names were rewritten, including, e.g., refs/remotes/origin/Kunden. This does mean that you won't be able to update the other Git repository over at origin without using git push -f, and if you don't update it, git fetch will bring the large files back.)
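The effect on pushing can be reproduced locally. In this sketch a bare repository in a temporary directory stands in for origin; the directory layout, big.zip, and the variable names are all assumptions made for the demonstration.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git init -q --bare "$work/origin.git"   # stand-in for the real origin
git init -q "$work/local"
cd "$work/local"
git config user.email "demo@example.com"
git config user.name "Demo"
branch=$(git symbolic-ref --short HEAD)

echo "source" > code.txt
git add code.txt
git commit -q -m "A: add code"
echo "pretend this is 4 GB" > big.zip
git add big.zip
git commit -q -m "D: add big file"

git remote add origin "$work/origin.git"
git push -q origin "$branch"

export FILTER_BRANCH_SQUELCH_WARNING=1
git filter-branch -f --tree-filter 'rm -f big.zip' --prune-empty -- --all

# An ordinary push is refused: the new tip does not descend from the old one.
if git push -q origin "$branch" 2>/dev/null; then
    plain="accepted"
else
    plain="rejected"
fi
echo "plain push: $plain"

# A forced push replaces the branch on the other side.
git push -q -f origin "$branch"

remote_tip=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
local_tip=$(git rev-parse HEAD)
echo "remote tip: $remote_tip"
echo "local tip:  $local_tip"
```

Forcing the push only moves the branch name on the other side. Whether and when that repository discards the old objects depends on its own garbage collection.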

Last, this:

Ref 'refs/stash' was rewritten

means that your existing saved stash has probably been damaged and can no longer be applied. (Filter-branch does not realize that stashes are deliberately a little weird and treats them as if they're normal merges, and will sometimes break them.)

torek