13

How can I permanently delete a commit from Git's history?

One of the developers on the team accidentally committed a 200 MB file and pushed it to our Git server. It was deleted a few days later, but it is still in the history. Our code files are only about 75 MB, and we have 4 branches. Because of the 200 MB file commit, the size of our project folder (specifically the hidden .git folder) has ballooned to close to 700 MB. How do I permanently delete the two check-ins (the commit of the huge file and the delete of the huge file) from Git as if they never happened? I'm using TortoiseGit, if that matters.

Brian Tompsett - 汤莱恩
tempid
  • 1
    Possible duplicate of [How to remove/delete a large file from commit history in Git repository?](https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository) – DavidRR Mar 18 '19 at 21:42

5 Answers

8

Delete the file from a checkout

GitHub has a useful page on how to permanently delete file(s) from a repository. In brief:

$ git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch 200MB-filename' \
  --prune-empty --tag-name-filter cat -- --all
$ git push --all -f

That removes the file from all branches. Note that --all pushes branches only, so any rewritten tags need a separate git push --tags -f. Then, to recover the space locally:

$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
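To check that the rewrite actually removed the file from every local ref, something like the following may help. It is a minimal sketch, not part of the GitHub instructions: the helper name file_in_history is made up for illustration, and 200MB-filename stands in for the real path as above.

```shell
# Hypothetical helper (name invented for this example).
# Exits 0 if any commit reachable from any ref still touches the path,
# exits 1 if no such commit is found.
file_in_history() {
  path="$1"
  # --all walks every ref; "--" separates revisions from paths.
  [ -n "$(git log --all --oneline -- "$path")" ]
}

# Example use inside the repository:
#   if file_in_history 200MB-filename; then echo "still present"; else echo "gone"; fi
#   git count-objects -vH   # "size-pack" reports the packed size after gc
```

If the helper still reports the file, a ref that was not rewritten (for example a leftover refs/original entry or a stash) is probably still pointing at the old history.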

Recovering space on the git server

Force pushing does not remove any commits/objects on the remote server. If you don't want to wait for Git to clean up by itself, you can run garbage collection explicitly on the server:

$ ssh git-server
$ cd /my/project/repo.git
$ git gc --prune=now

Compare the size of the repo before and after, and make sure it is the size you expect. If it ever reverts to the larger size, someone has pushed the deleted commits back into the repository, and all the steps need to be repeated.
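If the size is not what you expect, listing the largest blobs that are still reachable can show what is holding the space. This is a sketch of a commonly used pipeline rather than anything from the GitHub page; the function name largest_blobs and its default of 10 entries are assumptions made for this example.

```shell
# Hypothetical helper (name invented for this example).
# Prints the N largest blobs reachable from any ref, one per line, as:
#   blob <object-id> <size-in-bytes> <path>
largest_blobs() {
  count="${1:-10}"
  # rev-list emits "<object-id> <path>"; %(rest) carries the path through cat-file.
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3nr |
    head -n "$count"
}

# Example use inside the repository:
#   largest_blobs 5
```

Only reachable objects are listed, so a file that shows up here has not been fully removed from history yet.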

Teammates

If other developers use this repository, they will need to clean up their checkouts. Otherwise, when they pull from the repository and push their changes, they will add the deleted file back, because it's still in their local history. There are two ways to avoid that:

  1. Clone again
  2. Fetch and reset

The first is very simple, the second means one of two things:

User has no local commits

$ git fetch
$ git reset --hard origin/master

That makes the local checkout exactly match the remote, discarding any uncommitted changes.

User does have local commits

$ git fetch
$ git rebase -i origin/master

The user needs to make sure that none of their local commits reference the deleted file, or they'll add it back to the repository.
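One way to check this after the rebase is to list the local commits that are not yet on the remote branch and that touch the file. This is only a sketch: the helper name local_commits_touching is invented here, and origin/master is assumed as the upstream branch, as in the commands above.

```shell
# Hypothetical helper (name invented for this example).
# Lists commits that are on the current branch but not on the upstream
# branch, restricted to commits that touch the given path.
local_commits_touching() {
  path="$1"
  upstream="${2:-origin/master}"   # assumed upstream, as used above
  git log --oneline "$upstream..HEAD" -- "$path"
}

# Example use after "git rebase -i origin/master":
#   local_commits_touching 200MB-filename   # should print nothing
```

Empty output means nothing the user is about to push refers to the file.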

User cleanup

Then (optionally, because Git won't push unreferenced commits to the server) recover the space, so that everyone has a consistent, slimmer repository state:

$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --prune=now
AD7six
4

I'd suggest you try The BFG - it won't remove those two commits, but it will rewrite history to get rid of the bulky files from your history.

Carefully follow the BFG's usage instructions - the core part is just this:

$ java -jar bfg.jar  --strip-blobs-bigger-than 100M  my-repo.git

It's also substantially faster than git-filter-branch on big repositories - you might find this speed comparison video interesting - the BFG running on a Raspberry Pi, git-filter-branch running on a quad-core Mac OS X box... http://youtu.be/Ir4IHzPhJuI ...which will be faster!?

Note that after the cleanup you should run git gc to get Git to recognise that it doesn't need to store those big objects anymore and to free up disk space in that copy of the repository. git gc usually happens periodically on most hosted versions of Git, so when you push the cleaned history to your main Git server, that server will eventually free up its disk space too. Perhaps surprisingly, you don't have to wait for that git gc to run before users cloning fresh copies of your cleaned repo get just the cleaned history.
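The cleanup step can be sketched as below. The two Git commands are the ones the BFG usage instructions give for this step; wrapping them in a function, the name gc_after_rewrite, and the optional repository argument are assumptions made for this example.

```shell
# Hypothetical helper (name invented for this example).
# Expires all reflog entries, then prunes every object that is no
# longer reachable from any ref in the given repository.
gc_after_rewrite() {
  repo="${1:-.}"   # defaults to the current directory
  git -C "$repo" reflog expire --expire=now --all &&
    git -C "$repo" gc --prune=now --aggressive
}

# Example use after the BFG run shown above:
#   gc_after_rewrite my-repo.git
```

The reflog has to be expired first, because objects that are still referenced from a reflog entry count as reachable and would survive the gc.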

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Roberto Tyley
  • 1
    While BFG itself won't recover the space associated with the history entries that BFG removes, its [documentation](https://rtyley.github.io/bfg-repo-cleaner/) indicates that running `git gc` *after* BFG *will*: Excerpt: *"The BFG will update your commits and all branches and tags so they are clean, but it doesn't physically delete the unwanted stuff. Examine the repo to make sure your history has been updated, and then use the standard `git gc` command to strip out the unwanted dirty data, which Git will now recognise as surplus to requirements:"* (See BFG doc for actual command line.) – DavidRR Mar 18 '19 at 20:12
  • 1
    Wow, on examining your profile, I just discovered that you are the author of BFG. :-) Am I interpreting the BFG documentation correctly, that `git gc` actually *does* recover the space previously allocated to the files that BFG removes from the repo's history? – DavidRR Mar 18 '19 at 20:17
  • 1
    @DavidRR you're right about what the documentation of the BFG says - yes, you should run `git gc`! I've updated my question to give some of my standard disclaimers... it's amazing how many ways the process of rewriting Git history can go off-course, so long as people follow https://rtyley.github.io/bfg-repo-cleaner/#usage they should be ok... see https://stackoverflow.com/a/49471048/438886 for a slightly longer discussion! – Roberto Tyley Mar 18 '19 at 21:25
  • Thanks for the clarification and for creating such a useful tool. I have voted to close this question as a duplicate of [this one](https://stackoverflow.com/q/2100907/1497596) where you have a [highly voted and similar answer](https://stackoverflow.com/a/17890278/1497596). – DavidRR Mar 18 '19 at 21:56
2

You can use git filter-branch. Please note that this involves history rewrite, and all clones need to be recreated. You can find a good introduction to the topic in the Pro Git book.

Micha Wiedenmann
forvaidya
2

As forvaidya suggested, git filter-branch is the way to go. Specifically, in your case, you can execute the following command to remove that one file from the repo's history:

git filter-branch --tree-filter 'rm -f filename' HEAD

Substitute filename with the actual file name. Again, as forvaidya said, this rewrites the history of the repo, so anyone who pulls after you make this change will find that their local history has diverged from the remote. Note that passing HEAD rewrites only the current branch.

Edit: for performance reasons, it's actually better to use Git's rm command:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
mart1n
0

The simple way, if it was a recent commit, is:

# check how many MB your .git dir is before you start
du -m -d0 .git

# rebase to remove the commits with large files
git rebase -i HEAD~2 # or however many commits you need to go back

# force push to remote origin
git push -f origin HEAD
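As an alternative to deleting lines in the interactive rebase editor, the same removal can be sketched non-interactively with git rebase --onto. This assumes the commit that added the file and the commit that deleted it are known, and that every commit between them can be dropped as well; the function name drop_commit_range is invented for this example.

```shell
# Hypothetical helper (name invented for this example).
# Drops every commit from first_bad through last_bad (inclusive) from the
# current branch and replays the later commits on top of first_bad's parent.
drop_commit_range() {
  first_bad="$1"   # commit that added the large file
  last_bad="$2"    # commit that deleted it
  git rebase --onto "${first_bad}^" "$last_bad"
}

# Example use, with placeholder commit IDs:
#   drop_commit_range <add-commit> <delete-commit>
```

The force push is still needed afterwards, because the branch now points at rewritten commits.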

Now reclone the repo and check if the large file is gone. Do this in a new dir.

git clone <url> <new dir>

# check MB of .git dir (should be smaller by the size of the large file)
du -m -d0 .git

If successful, then the cleanest way for other developers to get back on track is to reclone to a new dir and manually apply their work in progress. If the .git size did not decrease, check if there are tags or anything referencing the offending commit. You will have to delete any tags referencing the commits from the origin too.
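To find what is still referencing the offending commit, something like the following may help. It is a minimal sketch: the helper name refs_containing is made up, and it relies on git for-each-ref --contains, which needs a reasonably recent Git (2.7 or later, as far as I know).

```shell
# Hypothetical helper (name invented for this example).
# Prints the full name of every ref (branch, remote branch, tag) whose
# history still contains the given commit.
refs_containing() {
  commit="$1"
  git for-each-ref --contains "$commit" --format='%(refname)'
}

# Example use, with a placeholder commit ID and tag name:
#   refs_containing <offending-commit>
#   git tag -d <tagname>                 # delete the tag locally
#   git push origin --delete <tagname>   # delete it on the remote
```

Any ref that is listed keeps the large file reachable, so it has to be deleted or moved before gc can reclaim the space.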

For more complicated situations, you can try the answer by AD7six, but this is just a simple and clean way to do it.

wisbucky