I recently made a push to an upstream branch that accidentally added in a large file we do not want in the repository. I want to remove it.

It's the most recent push that added this file, and it's not on the master branch. It's just a very recently-created branch with only one commit.

I have looked over previous SO questions on this among other sources, and am unclear on what I can do:

  • This answer implies that just using filter-branch with --tree-filter doesn't actually free up the space in the repository, which is the entire reason I want to remove the large file.
  • The above answer also suggests remove-blob, which I cannot access. I am stuck with vanilla git operations, not extensions or otherwise.
  • Other sites like this one suggest that if it's already pushed, filter-branch may not even work? I'm also not sure how to use --force-with-lease here.
  • It's also unclear from a lot of these answers if I need to be specifying some kind of a path to a file, or if it will just wipe any instance of that file from any location in the repo.

Would the following potentially fix this: cherry-picking the changes from this commit onto a new branch as unstaged changes, deleting the file, committing and pushing the new branch upstream, then deleting the old branch locally and pushing that deletion up?

Tyler Shellberg
  • 1. remove commit. 2. remove commit from reflog / clear reflog. 3. let git gc work really hard. – Joachim Sauer Nov 17 '20 at 23:00
  • How do I remove the commit if it's already been pushed? How do I remove it from the reflog or clear the reflog? How do I get GC to do whatever it needs to do? – Tyler Shellberg Nov 17 '20 at 23:01
  • Does this answer your question? [How to permanently remove few commits from remote branch](https://stackoverflow.com/questions/3293531/how-to-permanently-remove-few-commits-from-remote-branch) – mkrieger1 Nov 17 '20 at 23:02
  • GC will eventually remove the blob on its own if it's no longer referenced by any commit. – mkrieger1 Nov 17 '20 at 23:03
  • @mkrieger1 I don't want to remove the whole commit, ideally. It had several useful changes, just the one accidental file in that commit I want to get rid of. – Tyler Shellberg Nov 17 '20 at 23:05
  • Then you need to create a new commit which has the same changes but does not contain the big file. – mkrieger1 Nov 17 '20 at 23:13
  • See https://stackoverflow.com/questions/40503417/how-to-add-a-file-to-the-last-commit-in-git (instead of adding a new file, remove the big file) – mkrieger1 Nov 17 '20 at 23:15
  • @mkrieger1 Wouldn't amending the last commit and pushing that up to the server fail to free up the space taken by the file, because adding it and removing it are two distinct changes? Or, because it's an amendment, would the GC eventually clean it up? Maybe I'm misunderstanding. – Tyler Shellberg Nov 17 '20 at 23:19
  • Amending the commit creates a new, altered, commit and removes the reference to the original commit (i.e. it *replaces* the original commit). – mkrieger1 Nov 17 '20 at 23:21

1 Answer

The concern is how to free up disk space. That could mean locally, on the remote, and/or in other clones of the remote.

In general to free disk space on any repo, you need to (1) remove all references that can reach the file you wish to delete, and (2) cause (or wait for) garbage collection to run.

Locally you definitely have enough control to do that. Since it's a single-commit branch, amending the commit should be enough to remove the branch's ability to reach the file. You also have to do away with reflog entries (both the branch's log and the HEAD reflog) that "know about" the file (see `git reflog expire`; https://git-scm.com/docs/git-reflog).
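As a rough sketch of the amend-and-expire step, the script below builds a throwaway repository so it can be run safely. The branch name `feature` and the file names `notes.txt` and `big.bin` are made up for the demo; in a real repository only the three commands after the setup would apply.

```shell
#!/bin/sh
set -e

# Throwaway repository: one base commit plus a single-commit branch.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"
git commit -q --allow-empty -m "base"
git checkout -q -b feature                 # hypothetical branch name
echo "useful change" > notes.txt
head -c 1048576 /dev/urandom > big.bin     # stand-in for the accidental large file
git add notes.txt big.bin
git commit -q -m "useful changes plus accidental large file"
old_commit=$(git rev-parse HEAD)

# Drop the file and rewrite the commit in place; the other changes are kept.
git rm -q big.bin
git commit -q --amend --no-edit
new_commit=$(git rev-parse HEAD)

# Expire all reflog entries (HEAD and branch logs) so none still points at old_commit.
git reflog expire --expire=now --all

git ls-tree -r --name-only HEAD            # lists notes.txt only
```

Note that `--expire=now --all` discards the reflogs of every ref, which also throws away the usual safety net for recovering other work, so it is worth being sure nothing else is needed from them first.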

Generally, you'd also have to clean up any other refs that might refer to that commit. That could include other branches created from that commit, or tags, or various other less common things. It sounds like you're saying none of that would exist, but be aware that any ref would keep gc from removing the file. If you're not sure, then before deleting the branch you can do something like

git for-each-ref --contains <branch-name>

(You can also use the commit ID in place of <branch-name>.)
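To illustrate what that check reports, here is a small sketch in a throwaway repository. The names `feature`, `spinoff`, and `v-demo` are invented for the demo; the `--format` option is only there to print bare ref names.

```shell
#!/bin/sh
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"
git commit -q --allow-empty -m "base"
git checkout -q -b feature                 # hypothetical branch holding the unwanted commit
git commit -q --allow-empty -m "commit we want to get rid of"
git branch spinoff                         # hypothetical second branch at the same commit
git tag v-demo                             # hypothetical tag at the same commit

# Every ref listed here would keep the commit (and the file) alive.
refs=$(git for-each-ref --format='%(refname)' --contains feature)
echo "$refs"
```

In this demo the output names the branch itself plus the extra branch and the tag; the initial branch is not listed because it does not contain the commit.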

Once you think you've removed all refs that can reach the commit, you can try using git fsck to see if its ID pops up as "dangling", which is what you would expect. If it doesn't, then you need to figure out what you missed.

Then finally, you can use git gc - see https://git-scm.com/docs/git-gc - to free up the actual storage of the original commit (and with it, the large file).
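A minimal sketch of the check-then-collect sequence, again in a throwaway repository with invented names (`feature`, `notes.txt`, `big.bin`). It assumes the commit has already been amended and the reflogs expired, which the setup lines reproduce. `--prune=now` is used because, by default, `git gc` keeps recently created unreachable objects for a grace period.

```shell
#!/bin/sh
set -e

# Setup: reproduce the state after amending and expiring reflogs.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo User"
git commit -q --allow-empty -m "base"
git checkout -q -b feature
echo "useful change" > notes.txt
head -c 1048576 /dev/urandom > big.bin     # stand-in for the large file
git add notes.txt big.bin
git commit -q -m "useful changes plus accidental large file"
old_commit=$(git rev-parse HEAD)
big_blob=$(git rev-parse HEAD:big.bin)
git rm -q big.bin
git commit -q --amend --no-edit
git reflog expire --expire=now --all

# 1. The old commit should now be reported as dangling.
fsck_out=$(git fsck 2>/dev/null)
echo "$fsck_out"

# 2. Collect garbage immediately instead of waiting out the grace period.
size_before=$(du -sk .git | cut -f1)
git gc -q --prune=now
size_after=$(du -sk .git | cut -f1)
echo "before: ${size_before} KiB, after: ${size_after} KiB"

# 3. Confirm the old commit and the large blob are really gone.
git cat-file -e "$old_commit" 2>/dev/null && commit_left=yes || commit_left=no
git cat-file -e "$big_blob" 2>/dev/null && blob_left=yes || blob_left=no
echo "old commit still present: $commit_left, large blob still present: $blob_left"
```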

Now, on the remote you probably can't perform some or all of those steps. The most common problem is not having any control over how gc is handled on a remote. It may be that the best you can do is to remove any refs that contain the file and then hope that gc eventually gets to the file. Or, you might be able to recreate the remote if that's unacceptable.

Regardless, it's probably wise to update the branch sooner rather than later, so that other people don't fetch the file. AFAIK you can't actually guarantee that the remote won't send them the file anyway, but you can at least give it no reason to. Once you think this is done, you can test by creating a fresh clone of the remote and seeing if it delivers the offending file (perhaps by again looking for unreferenced objects using git fsck, or just by comparing repo sizes).
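The update-and-verify step might look like the sketch below. It uses a local bare repository as a stand-in for the real remote, so it says nothing about how a hosted service handles gc; all names (`remote.git`, `feature`, `big.bin`) are invented. The `file://` URL matters here: cloning a plain local path can copy the object directory wholesale, unreachable objects included, which would defeat the test.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git              # stand-in for the shared remote
git init -q local
cd local
git config user.email "demo@example.com"
git config user.name "Demo User"
git remote add origin "file://$work/remote.git"
git commit -q --allow-empty -m "base"
git checkout -q -b feature
echo "useful change" > notes.txt
head -c 1048576 /dev/urandom > big.bin     # stand-in for the large file
git add notes.txt big.bin
git commit -q -m "useful changes plus accidental large file"
old_commit=$(git rev-parse HEAD)
git push -q origin feature                 # the accidental push

# Rewrite the commit, then replace the remote branch. --force-with-lease only
# overwrites origin/feature if it still points where this clone last saw it.
git rm -q big.bin
git commit -q --amend --no-edit
git push -q --force-with-lease origin feature

# A fresh clone over the file:// transport should not receive the old commit.
cd "$work"
git clone -q --branch feature "file://$work/remote.git" fresh
fresh_files=$(git -C fresh ls-tree -r --name-only HEAD)
git -C fresh cat-file -e "$old_commit" 2>/dev/null && old_in_clone=yes || old_in_clone=no
git -C remote.git cat-file -e "$old_commit" 2>/dev/null && old_on_remote=yes || old_on_remote=no
echo "files in fresh clone: $fresh_files"
echo "old commit in fresh clone: $old_in_clone, still stored on remote: $old_on_remote"
```

In this demo the remote still stores the old objects after the forced push, since nothing has garbage-collected it yet, but the fresh clone no longer receives them.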

Mark Adelsberger
  • How would you suggest I get the changes I want to keep from that commit/branch before deleting it? If I'm understanding correctly, this basically boils down to just deleting the branch locally and pushing that deletion up, right? (I doubt in my case I'll be able to run gc manually on the remote) – Tyler Shellberg Nov 18 '20 at 15:42
  • "Since it's a single-commit branch, amending the commit should be enough to remove the branch's ability to reach the file." You check out the branch, remove the file, and `git commit --amend` – Mark Adelsberger Nov 18 '20 at 20:06