1062

I accidentally dropped a DVD-rip into a website project, then carelessly ran git commit -a -m ..., and, zap, the repo was bloated by 2.2 gigs. The next time, I made some edits, deleted the video file, and committed everything, but the compressed file is still there in the repository's history.

I know I can start branches from those commits and rebase one branch onto another. But what should I do to merge the two commits so that the big file doesn't show up in the history and gets cleaned up by garbage collection?

Vlad L.
culebrón
  • This article should help you http://help.github.com/removing-sensitive-data/ – MBO Jan 20 '10 at 11:23
  • Related: [Completely remove file from all Git repository commit history](http://stackoverflow.com/questions/307828/completely-remove-unwanted-file-from-git-repository-history). –  Apr 04 '14 at 00:34
  • Note that if your large file is in a subdir you'll need to specify the full relative path. – Johan Jul 23 '15 at 14:36
  • Also related https://help.github.com/en/articles/removing-files-from-a-repositorys-history – frederj May 27 '19 at 19:43
  • Many answers below tout BFG as easier than `git filter-branch`, but I found the opposite to be true. – 2540625 May 08 '20 at 23:45
  • Please have also a look at my answer which uses `git filter-repo`. You should not longer use `git filter-branch` as it is very slow and often difficult to use. `git filter-repo` is around 100 times faster. – Donat Jun 01 '20 at 19:50
  • The answers have lots of good info for complex situations. For the simple case where you added the file then removed it in the very next commit, you could just squash those two commits together. – piedar Mar 05 '21 at 17:19
  • Wild that so many answers are concerned about speed. How often are you screwing your repo history that you need to care about efficiency in this operation?? – naught101 Nov 06 '21 at 01:14
  • After my 10th time going through this the right answer is git should just refuse to checkin these files rather than create all this turmoil. – Todd Hoff Mar 21 '22 at 21:18
  • I came here after googling how to block such Git pushes in the first place. I asked about that [here](https://stackoverflow.com/questions/72588572/can-i-configure-github-to-block-large-files). – Albert Jun 11 '22 at 23:39
  • As suggested in other comments, the go to option nowadays is to use `git filter-repo`, as in [this answer](https://stackoverflow.com/a/61602985/86072) – LeGEC Nov 13 '22 at 08:08
  • [Here](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository)'s an updated link for the "[removing sensitive data](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository)" link originally posted by @MBO – KOGI Jan 20 '23 at 17:31

23 Answers

825

Use the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.

Carefully follow the usage instructions; the core part is just this:

$ java -jar bfg.jar --strip-blobs-bigger-than 100M my-repo.git

Any files over 100MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
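Why both commands? Old objects stay on disk as long as any branch, tag, index entry, or reflog entry still points at them. That means git gc on its own keeps history that only the reflog remembers. The self-contained sketch below does not involve the BFG at all. It amends a large file out of a commit and compares the on-disk size before and after expiring the reflog. The repo_kib helper and the 2 MiB random test file are illustrative assumptions.

```shell
# Sketch: show that reflog entries keep old blobs alive until they are expired.
# repo_kib is a hypothetical helper: loose plus packed object size, in KiB.
repo_kib() {
  git -C "$1" count-objects -v |
    awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email demo@example.com

echo '<h1>Index</h1>' > "$demo/index.html"
git -C "$demo" add index.html
git -C "$demo" commit -qm Index

# Commit a 2 MiB incompressible stand-in for the DVD-rip together with a real change...
head -c 2097152 /dev/urandom > "$demo/oops.iso"
echo '<p>other</p>' > "$demo/other.html"
git -C "$demo" add oops.iso other.html
git -C "$demo" commit -qm Careless

# ...then amend it away: the first "Careless" is now reachable only from the reflog.
git -C "$demo" rm -q --cached oops.iso
git -C "$demo" commit -q --amend -C HEAD

git -C "$demo" gc -q --prune=now          # the reflog still protects the blob
before=$(repo_kib "$demo")

git -C "$demo" reflog expire --expire=now --all
git -C "$demo" gc -q --prune=now --aggressive
after=$(repo_kib "$demo")

echo "before: ${before} KiB, after: ${after} KiB"
```

Expect "before" to be roughly the size of the test file and "after" to be a few KiB.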

After pruning, we can force push to the remote repo*

$ git push --force

*Note: GitHub does not allow force-pushing to a protected branch.

The BFG is typically at least 10-50x faster than running git-filter-branch, and generally easier to use.

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Corey Cole
Roberto Tyley
  • @Roberto: I followed the usage instructions on the site doing a clone --mirror. When it came time to push the repo, it failed stating that I needed to pull first. I'm pretty sure there have been no commits between the time I clone and push back. If I pull, git complains that it needs a working tree inside my-repo.git. Any suggestions? – Tony Feb 23 '14 at 22:22
  • @tony It's worth repeating the entire cloning & clearing procedure to see if the message asking you to pull re-occurs, but it's almost certainly because your remote server is configured to reject non-fast-forward updates (ie, it's configured to stop you from losing history - which is exactly what you want to do). You need to get that setting changed on the remote, or failing that, push the updated repo history to a brand new blank repo. – Roberto Tyley Feb 23 '14 at 23:09
  • @RobertoTyley Thanks. I have tried it 3 different times and all resulted with the same message. So I'm also thinking that you're right about the remote server being configured to reject the non-fast-forward updates. I'll consider just pushing the updated repo to a brand new repo. Thank you! – Tony Feb 23 '14 at 23:30
  • @RobertoTyley Perfect, you save my time, thanks very much. By the way, maybe should do `git push --force` after your steps, otherwise the remote repo still not changed. – Weiyi Jul 22 '15 at 16:16
  • +1 to adding `git push --force`. Also worth noting: force pushes may not be allowed by the remote (gitlab.com doesn't, by default. Had to "unprotect" the branch). – MatrixManAtYrService Sep 10 '15 at 15:51
  • Instead of `--strip-blobs-bigger-than 100M` you can also use `-b 100M` according to help. – kon psych Mar 01 '16 at 18:47
  • Not sure if BFG automatically deletes the reflog references... if not, you still need to run: `git reflog expire --expire-unreachable=all` as described by @Greg Bacon in his answer here. If there are still reflog entries, the data will not be removed by `git gc`, even with `aggressive` (apparently there are limits to its aggressiveness) – JoelFan May 07 '16 at 17:22
  • Tip: If you run bfg.jar with the file declared in .gitignore, it won't be removed. – Ernesto Fernandez May 11 '16 at 08:48
  • @Tony BFG must rewrite history to do what it does, essentially creating a whole new commit tree. This by definition mean that the commits get new sha1 hashes which is why the force push is needed, as the parent is no longer what the server expects. This is usually a _GOOD_ thing, but in this particular case we know better. – Thorbjørn Ravn Andersen Sep 05 '16 at 17:47
  • @RobertoTyley: I got a general question, is BFG equally functional on Windows as much as on Linux/Mac? – Syed Waqas Oct 12 '16 at 15:39
  • @WaqasShah : yes, it runs on any platform that has Java 7 or above installed. You can download Java for Windows here: https://www.java.com/en/download/ – Roberto Tyley Oct 12 '16 at 15:54
  • BFG worked an absolute charm for me. Brought a 517mb repo down to 38 Mb in just a few minutes. Nothing else worked for me prior to finding this answer. – MitchellK Aug 14 '17 at 13:37
  • Undocumented issue (mostly) when given a "is repo packed" error. Use `git gc` on the target repo, then re-execute whatever it was you were doing with BFG. Once that was sorted worked pretty well. Could use more explicit documentation, but then I'm not the quickest learner ;p – DaveRGP Sep 04 '17 at 13:40
  • How do you install that stuff? `brew install bfg` gives me `Warning: bfg 1.12.15 is already installed` ok: `$ java -jar bfg.jar --strip-blobs-bigger-than 1M myrepo.git` I get: `Error: Unable to access jarfile bfg.jar` – user189035 Nov 28 '17 at 00:13
  • @DaveRGP Thanks for tip) that issue: `does the repo need to be packed?` definitely must be documented. – Ivan Talalaev Dec 21 '17 at 08:23
  • +1 for BFG, I tried the "standard" method using filter-branch and it's FAR slower and in my case it didn't removed all the references to the big files... – gabry Feb 08 '18 at 14:58
  • what is myrepo.git? – thang Jul 30 '18 at 03:07
  • In your output, you state that we should run `git reflog expire --expire=now --all && git gc --prune=now --aggressive` – Blairg23 Nov 07 '18 at 00:45
  • Is there a way to remove the "Former-commit-id" from all commits? – Maroun May 12 '19 at 06:26
  • Master branch is protected from direct pushes. Will the procedure work as expected from private branch and PR to master? – Gregory Danenberg Jun 19 '19 at 11:03
  • Using a free tool that has three lines of output bothers you? Better avoid open-source projects! – tedder42 Aug 27 '19 at 19:21
  • @Roberto Tyley: How do I remove commits which are older than HEAD~5? – shim_mang Oct 14 '19 at 09:41
  • What happens to the commits that happens while we are doing `git gc --prune=now --aggressive` ? because it takes long time... ? – eugene Oct 31 '19 at 03:07
  • After the push you probably want to run: `git fetch && git reset origin/master --soft` on existing clones. – jan-glx Jan 13 '20 at 11:01
  • FYI BFG does not work as advertised, and filter-branch does not take that long. Running BFG 10-30 times with different branches and different configurations takes much longer. – Chris May 18 '20 at 16:04
  • I tried many other solutions, but BFS is the only one that resulted in a reduction of size "at the remote server side" and maintaining the history at the same time. – mustafabar Nov 30 '20 at 10:04
  • Works really well! My only question before pushing is what will happen with the closed pull requests? Are we going to loose them as the commit hashes will change ? – Andrei Boyanov Mar 01 '21 at 07:07
  • Confirmed April 2021, `git gc --prune=now --aggressive` still does the trick ! – davewoodhall Apr 16 '21 at 14:43
  • Does this command work when working with local repo? The problem I have is that I have some large file in my local history and I cannot push to remote, so what would be my repo.git? – MA19 May 11 '22 at 17:23
  • @MA19 Yep, wfm. – Lee Goddard Sep 06 '22 at 13:43
  • Got an exception right off the bat using this tool: "Cleaning commits: 35% ( 7200/20570)java.lang.reflect.InvocationTargetException". I couldn't find a way around it. – grayaii Nov 21 '22 at 01:34
  • Worked like a charm – saumilsdk Jul 13 '23 at 07:31
697

NB: Since this answer was written, the Git project has stopped recommending git filter-branch. Its man page now warns against using it and points to alternatives such as git filter-repo. See the man page for more information.


What you want to do is highly disruptive if you have published history to other developers. See “Recovering From Upstream Rebase” in the git rebase documentation for the necessary steps after repairing your history.

You have at least two options: git filter-branch and an interactive rebase, both explained below.

Using git filter-branch

I had a similar problem with bulky binary test data from a Subversion import and wrote about removing data from a git repository.

Say your git history is:

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

Note that git lola is a non-standard but highly useful alias. (See the addendum at the end of this answer for details.) The --name-status switch to git log shows tree modifications associated with each commit.

In the “Careless” commit (whose SHA1 object name is ce36c98) the file oops.iso is the DVD-rip added by accident and removed in the next commit, cb14efd. Using the technique described in the aforementioned blog post, the command to execute is:

git filter-branch --prune-empty -d /dev/shm/scratch \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all

Options:

  • --prune-empty removes commits that become empty (i.e., do not change the tree) as a result of the filter operation. In the typical case, this option produces a cleaner history.
  • -d names a temporary directory that does not yet exist to use for building the filtered history. If you are running on a modern Linux distribution, specifying a directory in /dev/shm (normally a RAM-backed tmpfs) will result in faster execution.
  • --index-filter is the main event and runs against the index at each step in the history. You want to remove oops.iso wherever it is found, but it isn’t present in all commits. The command git rm --cached -f --ignore-unmatch oops.iso deletes the DVD-rip when it is present and does not fail otherwise.
  • --tag-name-filter describes how to rewrite tag names. A filter of cat is the identity operation. Your repository, like the sample above, may not have any tags, but I included this option for full generality.
  • -- specifies the end of options to git filter-branch
  • --all following -- is shorthand for all refs. Your repository, like the sample above, may have only one ref (master), but I included this option for full generality.
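Before touching a real repository, it can be worth rehearsing on a throwaway one. The sketch below rebuilds a repository with the same five commits and runs the command above. The file contents are made up, and a 1 MiB random file stands in for the DVD-rip. It differs from the original in two assumed details. First, the scratch directory comes from mktemp, because /dev/shm does not exist on every system. Second, FILTER_BRANCH_SQUELCH_WARNING=1 suppresses the warning and pause that recent Git versions print before filter-branch runs.

```shell
# Sketch: rebuild the example history in a scratch repository and filter it.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1
git config user.name "Demo User"
git config user.email demo@example.com

echo index > index.html && git add index.html && git commit -qm Index
echo admin > admin.html && git add admin.html && git commit -qm "Admin page"
head -c 1048576 /dev/urandom > oops.iso          # stand-in for the DVD-rip
echo other > other.html && git add oops.iso other.html && git commit -qm Careless
git rm -q oops.iso && git commit -qm "Remove DVD-rip"
echo login > login.html && git add login.html && git commit -qm "Login page"

# -d must name a directory that does not exist yet.
scratch="$(mktemp -d)/scratch"
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty -d "$scratch" \
  --index-filter "git rm --cached -f --ignore-unmatch oops.iso" \
  --tag-name-filter cat -- --all >/dev/null

git log --format='%h %s' --name-status
```

The final log should show four commits with no oops.iso on the branch. The old history is still reachable through refs/original/.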

After some churning, the history is now:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
|
| * f772d66 (refs/original/refs/heads/master) Login page
| | A   login.html
| * cb14efd Remove DVD-rip
| | D   oops.iso
| * ce36c98 Careless
|/  A   oops.iso
|   A   other.html
|
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

Notice that the new “Careless” commit adds only other.html and that the “Remove DVD-rip” commit is no longer on the master branch. The branch labeled refs/original/refs/heads/master contains your original commits in case you made a mistake. To remove it, follow the steps in “Checklist for Shrinking a Repository.”

$ git update-ref -d refs/original/refs/heads/master
$ git reflog expire --expire=now --all
$ git gc --prune=now

For a simpler alternative, clone the repository to discard the unwanted bits.

$ cd ~/src
$ mv repo repo.old
$ git clone file:///home/user/src/repo.old repo

Using a file:///... clone URL makes Git copy only the reachable objects. Cloning from a plain local path would instead hardlink the existing object files, unwanted data included.
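One side effect, noted in the comments, is that origin in the fresh clone points at repo.old rather than at your real remote. The small sketch below copies the old URL across. It assumes the remote is called origin in both repositories, and restore_origin is a hypothetical helper name.

```shell
# Sketch: after cloning repo.old into repo, point origin back at the real remote.
# Assumes both repositories use a remote named "origin".
restore_origin() {
  old=$1   # the original repository, e.g. ~/src/repo.old
  new=$2   # the fresh clone, e.g. ~/src/repo
  url=$(git -C "$old" remote get-url origin) || return 1
  git -C "$new" remote set-url origin "$url"
}

# Example, with the paths used above:
# restore_origin ~/src/repo.old ~/src/repo
```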

Now your history is:

$ git lola --name-status
* 8e0a11c (HEAD, master) Login page
| A     login.html
* e45ac59 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

The SHA1 object names for the first two commits (“Index” and “Admin page”) stayed the same because the filter operation did not modify those commits. “Careless” lost oops.iso and “Login page” got a new parent, so their SHA1s did change.

Interactive rebase

With a history of:

$ git lola --name-status
* f772d66 (HEAD, master) Login page
| A     login.html
* cb14efd Remove DVD-rip
| D     oops.iso
* ce36c98 Careless
| A     oops.iso
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

you want to remove oops.iso from “Careless” as though you never added it, and then “Remove DVD-rip” is useless to you. Thus, our plan going into an interactive rebase is to keep “Admin page,” edit “Careless,” and discard “Remove DVD-rip.”

Running $ git rebase -i 5af4522 starts an editor with the following contents.

pick ce36c98 Careless
pick cb14efd Remove DVD-rip
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
#

Executing our plan, we modify it to

edit ce36c98 Careless
pick f772d66 Login page

# Rebase 5af4522..f772d66 onto 5af4522
# ...

That is, we delete the line with “Remove DVD-rip” and change the operation on “Careless” to be edit rather than pick.

Save-quitting the editor drops us at a command prompt with the following message.

Stopped at ce36c98... Careless
You can amend the commit now, with

        git commit --amend

Once you are satisfied with your changes, run

        git rebase --continue

As the message tells us, we are on the “Careless” commit we want to edit, so we remove the file, amend the commit, and let the rebase continue.

$ git rm --cached oops.iso
$ git commit --amend -C HEAD
$ git rebase --continue

The first removes the offending file from the index. The second modifies or amends “Careless” to be the updated index and -C HEAD instructs git to reuse the old commit message. Finally, git rebase --continue goes ahead with the rest of the rebase operation.

This gives a history of:

$ git lola --name-status
* 93174be (HEAD, master) Login page
| A     login.html
* a570198 Careless
| A     other.html
* 5af4522 Admin page
| A     admin.html
* e738b63 Index
  A     index.html

which is what you want.
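The same edit can also be scripted, which helps when repeating it on several clones. The sketch below is not part of the original procedure. It passes the todo list to sed through GIT_SEQUENCE_EDITOR, which turns pick into edit on “Careless” and deletes the “Remove DVD-rip” line, and then runs the three commands above. It matches commits by subject, so it assumes the example history. The option is spelled sed -i.bak because both GNU and BSD sed accept that form. drop_iso_by_rebase is a hypothetical name.

```shell
# Sketch: a non-interactive version of the rebase above.
# Assumes the example history, where "Careless" is HEAD~2 and its parent is HEAD~3.
drop_iso_by_rebase() {
  base=$(git rev-parse HEAD~3) || return 1   # "Admin page" (5af4522 in the example)
  # Turn "pick ... Careless" into "edit ... Careless" and drop "Remove DVD-rip".
  GIT_SEQUENCE_EDITOR="sed -i.bak -e 's/^pick\(.* Careless\)$/edit\1/' -e '/ Remove DVD-rip$/d'" \
    git rebase -q -i "$base" || return 1
  # The rebase is now stopped at "Careless".
  git rm -q --cached oops.iso || return 1
  git commit -q --amend -C HEAD || return 1
  GIT_EDITOR=true git rebase --continue
}
```

Run it from the top of the working tree. Afterwards, oops.iso remains on disk as an untracked file.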

Addendum: Enable git lola via ~/.gitconfig

Quoting Conrad Parker:

The best tip I learned at Scott Chacon’s talk at linux.conf.au 2010, Git Wrangling - Advanced Tips and Tricks was this alias:

lol = log --graph --decorate --pretty=oneline --abbrev-commit

This provides a really nice graph of your tree, showing the branch structure of merges etc. Of course there are really nice GUI tools for showing such graphs, but the advantage of git lol is that it works on a console or over ssh, so it is useful for remote development, or native development on an embedded board …

So, just copy the following into ~/.gitconfig for your full color git lola action:

[alias]
        lol = log --graph --decorate --pretty=oneline --abbrev-commit
        lola = log --graph --decorate --pretty=oneline --abbrev-commit --all
[color]
        branch = auto
        diff = auto
        interactive = auto
        status = auto
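If you prefer not to edit ~/.gitconfig by hand, git config can write the same settings. In the small sketch below, passing --global writes them to ~/.gitconfig. Calling it with no argument inside a repository keeps them local to that repository. The function name is just illustrative.

```shell
# Sketch: write the lol/lola aliases and the color settings with git config.
# Usage: add_lola_aliases --global   (or with no argument, inside a repository)
add_lola_aliases() {
  git config "$@" alias.lol 'log --graph --decorate --pretty=oneline --abbrev-commit'
  git config "$@" alias.lola 'log --graph --decorate --pretty=oneline --abbrev-commit --all'
  for area in branch diff interactive status; do
    git config "$@" "color.$area" auto
  done
}
```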
larsks
Greg Bacon
  • Why i can't push when using git filter-branch, failed to push some refs to 'git@bitbucket.org:product/myproject.git' To prevent you from losing history, non-fast-forward updates were rejected Merge the remote changes before pushing again. – Agung Prasetyo Feb 04 '13 at 10:49
  • Add the `-f` (or `--force`) option to your `git push` command: “Usually, the command refuses to update a remote ref that is not an ancestor of the local ref used to overwrite it. This flag disables the check. This can cause the remote repository to lose commits; use it with care.” – Greg Bacon Feb 04 '13 at 23:47
  • This is a wonderfully thorough answer explaining the use of git-filter-branch to remove unwanted large files from history, but it's worth noting that since Greg wrote his answer, The BFG Repo-Cleaner has been released, which is often faster and easier to use - see my answer for details. – Roberto Tyley Jan 15 '14 at 15:09
  • After I do either of the procedures above, the remote repository (on GitHub) does NOT delete the large file. Only the local does. I force push and nada. What am I missing? – 4Z4T4R May 13 '14 at 21:11
  • this also works on dirs. `... "git rm --cached -rf --ignore-unmatch path/to/dir"...` – rynop Aug 20 '14 at 16:08
  • I can't just delete "pick cb14efd Remove DVD-rip" line, cause in "Remove DVD-rip" commit I did some other stuff. (in Interactive Rebase solution) – Ehsan Dec 29 '14 at 09:28
  • @Ehsan In your case, mark both commits with `edit` and clean up by hand in the shell. – Greg Bacon Dec 29 '14 at 19:45
  • hiks, can anyone explain more simple step.. this is to confusing to me.. :( – Budi Mulyo Sep 01 '19 at 08:50
  • @AaA See the link to the [blog post by Conrad Parker with the definition of `git lola`](http://blog.kfish.org/2010/04/git-lola.html), also linked in the answer. – Greg Bacon Nov 08 '20 at 23:52
  • @GregBacon, I'm sorry, my comment is missing and I don't even remember what was the question – AaA Nov 14 '20 at 15:36
  • @AaA My recollection is you asked about the definitions of `git lol` and `git lola`. – Greg Bacon Nov 16 '20 at 18:10
  • I removed the `refs/original/refs/heads/master` branch created after the `filter-branch` using the backup+`git clone` steps and the remotes of the original repo were lost in the new repo, having the original repo itself as the new remote instead. I would perhaps indicate it for completeness – Alf Pascu May 19 '21 at 18:02
  • Your solution works when I apply two times, second one after the `git push --all --force` , is it normal? – alper Aug 31 '21 at 14:30
  • Thanks for the detailed answered. Solved the issue for me. – Gideon A. Apr 21 '22 at 21:07
  • The interactive rebase approach is the best one, i think. Self explaining and full control without third-party tools. Just `git rm --cached file.ext` did not working for me and asked for forcing. But in my case (just want to move the file to LFS) it also worked by just add an suitable `.gitattributes` file in the same commit. – Sukombu Jul 05 '22 at 09:57
315

NB: Since this answer was written, the Git project has stopped recommending git filter-branch. Its man page now warns against using it and points to alternatives such as git filter-repo. See the man page for more information.


Why not use this simple but powerful command?

git filter-branch --tree-filter 'rm -f DVD-rip' HEAD

The --tree-filter option runs the specified command after each checkout of the project and then recommits the results. In this case, you remove a file called DVD-rip from every snapshot, whether it exists or not.

If you know which commit introduced the huge file (say 35dsa2), you can replace HEAD with 35dsa2^..HEAD. That way only that commit and its descendants are rewritten, which is faster and leaves older commit IDs unchanged. Note the ^: 35dsa2..HEAD would exclude 35dsa2 itself, which would leave the file in the commit that added it. This tip, courtesy of @alpha_989, seems too important to leave out here.

See this link.
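git log --diff-filter=A lists the commits that added a path, so you do not have to look up the introducing commit by hand. The sketch below combines that lookup with the faster --index-filter suggested in the comments. It assumes the file was not added in the root commit, which has no parent for ^ to refer to. purge_from_first_add is a hypothetical name.

```shell
# Sketch: rewrite only the commits from the one that first added a file onward.
# Assumes the file was not added in the root commit.
purge_from_first_add() {
  path=$1
  # git log lists newest first, so the last line is the oldest commit adding $path.
  first=$(git log --all --diff-filter=A --format=%H -- "$path" | tail -n 1)
  [ -n "$first" ] || { echo "no commit adds $path" >&2; return 1; }
  # "$first^..HEAD" includes $first itself; "$first..HEAD" would skip it.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --index-filter \
    "git rm --cached --ignore-unmatch -- '$path'" "$first^..HEAD"
}

# Example: purge_from_first_add DVD-rip
```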

larsks
Gary Gauh
  • Much better than bfg. I was unable to clean file from a git with bfg, but this command helped – podarok Jul 01 '16 at 11:56
  • This is great. Just a note for others that you'll have to do this per branch if the large file is in multiple branches. – James Aug 19 '16 at 01:38
  • This worked for me on a local commit that I couldn't upload to GitHub. And it seemed simpler than the other solutions. – Richard G Feb 03 '17 at 16:32
  • All this does for me is create a huge `.git-rewrite` directory while keeping the removed files in the repo. – oarfish May 29 '17 at 19:42
  • If you know the `commit` where you put the file in (say `35dsa2`), you can replace `HEAD` with `35dsa2..HEAD`. `tree-filter` is much slower than `index-filter` that way it wont try to checkout all the commits and rewrite them. if you use HEAD, it will try to do that. – alpha_989 Jan 21 '18 at 20:10
  • I tried this and now have "Your branch and 'origin/master' have diverged, and have 49 and 44 different commits each, respectively." – stevec Jun 09 '18 at 05:15
  • After running the above command, you then have to run `git push --all --force` to get remote's history to match the amended version you have now created locally (@stevec) – Noel Evans Jun 16 '20 at 19:05
  • does this keep the latest version of the file? – João Pimentel Ferreira Sep 11 '22 at 23:13
  • This worked great for me! @JoãoPimentelFerreira - for me, it kept the latest version of the file, plus allowed me to reconcile the local commits that hadn't been pushed to the remote yet. I backed everything up just in case and would recommend doing the same. – mmoore Oct 17 '22 at 16:50
145

(The best answer I've seen to this problem is: https://stackoverflow.com/a/42544963/714112 , copied here since this thread appears high in Google search rankings but that other one doesn't)

A blazingly fast shell one-liner

This shell script displays all blob objects in the repository, sorted from smallest to largest.

For my sample repo, it ran about 100 times faster than the other ones found here.
On my trusty Athlon II X4 system, it handles the Linux Kernel repository with its 5,622,155 objects in just over a minute.

The Base Script

git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| awk '/^blob/ {print substr($0,6)}' \
| sort --numeric-sort --key=2 \
| cut --complement --characters=13-40 \
| numfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

When you run the above code, you will get nice human-readable output like this:

...
0d99bb931299  530KiB path/to/some-image.jpg
2ba44098e28f   12MiB path/to/hires-image.png
bd1741ddce0d   63MiB path/to/some-video-1080p.mp4
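The script relies on cut --complement and numfmt from GNU coreutils, which is why the comments below mention gnumfmt on macOS. The sketch below is a variant that assumes only git, awk, and sort. It prints the same three columns but formats sizes in awk with one decimal place, so the numbers will not match numfmt's rounding exactly. list_blobs_by_size is a hypothetical name.

```shell
# Sketch: list every blob in history, smallest first, without GNU-only tools.
# Columns: abbreviated object name, size, path (as in the script above).
list_blobs_by_size() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" {
      path = $0
      sub(/^blob [^ ]+ [^ ]+ ?/, "", path)      # keep everything after the size
      print $3, substr($2, 1, 12), path
    }' |
    sort -n -k 1,1 |
    awk '{
      size = $1; unit = "B"
      if (size >= 1073741824) { size /= 1073741824; unit = "GiB" }
      else if (size >= 1048576) { size /= 1048576; unit = "MiB" }
      else if (size >= 1024) { size /= 1024; unit = "KiB" }
      path = $0
      sub(/^[^ ]+ [^ ]+ ?/, "", path)           # drop the byte count and object name
      printf "%s %7.1f%s %s\n", $2, size, unit, path
    }'
}
```

Run it from inside the repository. The largest blobs appear at the bottom.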

Fast File Removal

If you then want to remove the files a and b from every commit reachable from HEAD, you can use this command:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' HEAD
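To confirm afterwards that the files are really gone from the rewritten branches, you can ask git log for any commit on a branch or tag that still touches those paths. This is a sketch, not part of the original answer. It deliberately leaves out refs/original/, where filter-branch keeps the pre-rewrite history. commits_touching is a hypothetical name.

```shell
# Sketch: list commits on branches and tags that still add, modify, or delete any given path.
# Empty output suggests the paths are gone from that history.
# filter-branch's backup refs under refs/original/ are deliberately not searched.
commits_touching() {
  git log --full-history --branches --tags --format='%h %s' -- "$@"
}

# Example: commits_touching a b
```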
Sridhar Sarnobat
  • If your repo has any tags, you likely also want to add the flag `--tag-name-filter cat` to re-tag the new corresponding commits as they are rewritten, i.e., `git filter-branch --index-filter 'git rm --cached --ignore-unmatch a b' --tag-name-filter cat HEAD` (see [this related answer](https://stackoverflow.com/a/5574694/345236)) – naitsirhc Feb 08 '18 at 03:25
  • Mac instructions and some other info appear in the original linked post – nruth Mar 05 '18 at 18:55
  • `git filter-branch --index-filter 'git rm --cached --ignore-unmatch ' HEAD` workorder right of the bat – eleijonmarck Apr 05 '18 at 06:00
  • my favourite answer. a slight tweak to use on mac os (using gnu commands) `git rev-list --objects --all \ | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \ | awk '/^blob/ {print substr($0,6)}' \ | sort --numeric-sort --key=2 \ | gnumfmt --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest` – Florian Oswald Apr 16 '19 at 14:08
  • cool script with the rev-list but it didn't work for me as an alias, any idea how to do that? – Robin Manoli Oct 09 '19 at 11:02
  • Thank you, however for Mac OSX + zsh it did not work, and I modified it to a simpler version : git rev-list --objects --all git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' awk '/^blob/ {print substr($0,6)}' sort --numeric-sort --key=2 – Vasili Pascal Oct 26 '19 at 09:21
  • Your assumption is that file with known name need to be removed, in my case all files are called index.html (different folders) and only one of them need to be removed which I happen to know its hash – AaA Sep 02 '20 at 04:28
  • How does your answer differs from @Greg Bacon 's answer? – alper Aug 31 '21 at 14:19
114

100 times faster than git filter-branch and easier to use

There are very good answers in this thread, but many of them are now outdated. Using git-filter-branch is no longer recommended, because it is difficult to use and awfully slow on big repositories with many commits.

git-filter-repo is much faster and easier to use.

git-filter-repo is a Python script, available on GitHub: https://github.com/newren/git-filter-repo . Once installed, it looks like a regular git command and can be invoked as git filter-repo.

You need only one file: the Python 3 script git-filter-repo. Copy it to a directory that is included in your PATH. On Windows you may have to change the first line of the script (refer to INSTALL.md). You need Python 3 installed on your system, but this is not a big deal.

First you can run

git filter-repo --analyze

This helps you to determine what to do next.

You can delete your DVD-rip file everywhere:

git filter-repo --invert-paths --path-match DVD-rip
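--path-match expects the path relative to the repository root, as a comment on the question also points out. If the file lived in a subdirectory, it helps to first list every path in history whose last component is that name. The sketch below uses only plain Git, and find_paths_in_history is a hypothetical name.

```shell
# Sketch: print each distinct path in history whose last component is $1,
# e.g. to find the full path to pass to --path-match.
find_paths_in_history() {
  name=$1
  git log --all --format= --name-only |
    awk -v name="$name" 'NF {
      n = split($0, parts, "/")
      if (parts[n] == name) print
    }' |
    sort -u
}

# Example: find_paths_in_history DVD-rip
```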
 

Filter-repo is really fast. A task that took filter-branch around 9 hours on my computer was completed by filter-repo in 4 minutes. You can do many more nice things with filter-repo; refer to the documentation for that.

Warning: Do this on a copy of your repository. Many actions of filter-repo cannot be undone. filter-repo will change the commit hashes of all modified commits (of course) and all their descendants down to the last commits!

Donat
  • How do I submit the applied changes (on my local repository) to a remote repository? Or this is not possible, and I should clone the amended repo to a new one? – diman82 Feb 01 '21 at 15:15
  • @diman82: Best would be to make a new empty repository, set the remote repository from your cloned repo to that and push. This is common to all these answers here: You will get many new commit hashes. This is unavoidable because the commit hashes guarantee for the content and the history of a repo. The alternative way is dangerous, you could make a force push and then run gc to get rid of the files. But do not do this unless you have tested very well and you are aware of all the consequences ! – Donat Feb 01 '21 at 19:17
  • I've already pushed (with --force option), worked well (to a cloned repository, as a precaution). – diman82 Feb 03 '21 at 12:19
  • `git filter-repo --strip-blobs-bigger-than 10M` worked much better on my end – Lucas Jul 13 '21 at 06:19
  • This worked well for me. filter-repo has good documentation for more advanced cases but in mine, I just needed to get rid of big file I accidentally committed. In my case, it worked fine to duplicate the project dir, run the command in the new version, re-add the remote and push (no strictly fresh clone). – Alex L May 24 '22 at 22:25
  • this should be the accepted answer now. Worked amazingly well. – james-see May 31 '22 at 20:24
110

After trying virtually every answer in SO, I finally found this gem that quickly removed and deleted the large files in my repository and allowed me to sync again: http://www.zyxware.com/articles/4027/how-to-delete-files-permanently-from-your-local-and-remote-git-repositories

CD to your local working folder and run the following command:

git filter-branch -f --index-filter "git rm -rf --cached --ignore-unmatch FOLDERNAME" -- --all

Replace FOLDERNAME with the file or folder you wish to remove from the repository.

Once this is done run the following commands to clean up the local repository:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
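One caveat with rm -rf .git/refs/original/ is that it only removes loose ref files. If those refs have already been packed into .git/packed-refs (git gc and git pack-refs do that), they survive the rm, and so does the history they point to. The sketch below deletes them through git update-ref instead, which works in both cases. drop_original_refs is a hypothetical name.

```shell
# Sketch: delete all of filter-branch's backup refs, whether loose or packed.
drop_original_refs() {
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
}
```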

Now push all the changes to the remote repository:

git push --all --force

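This rewrites the branches on the remote repository. When the now-unreferenced objects are actually deleted from the server is up to the hosting service.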

Justin
  • Worked like a charm for me. – Ramon Vasconcelos Apr 16 '18 at 07:17
  • This worked for me as well. Gets rid of a specific folder (in my case, one that contained files too large or a Github repo) on the repository, but keeps it on the local file system in case it exists. – skizzo Jul 08 '18 at 12:13
  • Worked for me! no history is left which is potentially confusing (if someone where to clone right now), make sure you have a plan to update any broken links, dependencies, etc – ruoho ruotsi Jun 19 '19 at 05:11
  • I tried the `filter-branch` methods described in the other answers, but they didn't work. After filtering, I still got file size too big error when pushing to GitHub. This solution worked, most likely because it removed the big file from ALL occurrences in ALL branches. – Fanchen Bao Jul 30 '20 at 22:43
  • May also need `git push origin --tags --force` to remove large files from the remote in tagged releases. – Kostas Stamos May 12 '21 at 18:25
  • This is great. Want to add, for the last command, if you have a very large repo with many commits, instead of doing `--all`, read this answer (https://stackoverflow.com/a/51468389) to split up pushes (note, replace *git push* with *git push -f*). I had to do this because the pack size exceeded 2 GB trying to push everything at once. And comment two - *back up everything*! – Stardust Aug 21 '21 at 03:25
  • I guess it's only me that didn't realize this command will also nuke the file from the project itself, not just the git repo. Certainly worked though! – Karl Nov 16 '21 at 02:38
  • Worked for me! One question though, why is `git reflog expire` required? – Ng Ju Ping Mar 15 '22 at 17:39
  • why is git reflog expire required? – gyuunyuu Apr 28 '22 at 09:27
44

These commands worked in my case:

git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

It is a little different from the versions above.

For those who need to push this to GitHub/Bitbucket (I only tested this with Bitbucket):

# WARNING!!!
# This will completely rewrite your Bitbucket refs and
# will delete all remote branches that you don't have locally.

git push --all --prune --force

# Once you have pushed, all your teammates need to clone the repository again;
# git pull will not work.
Kostanos
  • How is it different from above, why is it better? – Andy Hayden Jun 14 '13 at 09:08
  • For some reason mkljun's version did not reduce the Git space in my case; I had already removed the files from the index using `git rm --cached files`. Greg Bacon's proposal is more complete and almost the same as mine, but he left out the --force option for cases where you run filter-branch multiple times, and he wrote so much info that my version is like a summary of it. – Kostanos Jun 14 '13 at 14:09
  • This really helped, but I needed to use the `-f` option, not just `-r`, here: `git rm --cached -rf --ignore-unmatch oops.iso` instead of `git rm --cached -r --ignore-unmatch oops.iso`, as per @lfender6445 below – drstevok Oct 21 '16 at 06:18
19

According to the GitHub documentation, just follow these steps:

  1. Get rid of the large file

Option 1: You don't want to keep the large file:

rm path/to/your/large/file        # delete the large file

Option 2: You want to keep the large file in an untracked directory

mkdir large_files                       # create directory large_files
touch .gitignore                        # create .gitignore file if needed
echo '/large_files/' >> .gitignore      # ignore directory large_files
mv path/to/your/large/file large_files/ # move the large file into the untracked directory
  2. Save your changes
git add path/to/your/large/file   # add the deletion to the index
git commit -m 'delete large file' # commit the deletion
  3. Remove the large file from all commits
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch path/to/your/large/file" \
  --prune-empty --tag-name-filter cat -- --all
git push --force <remote> <branch>
Kevin R.
  • can you elaborate on how the "remove the large file from all commits" step worked, that was amazing! – clayg Dec 02 '20 at 21:51
  • Thanks @clayg. I don't deeply understand the `git filter-branch` command; as I wrote, I just followed the GitHub documentation. What I know is that this command goes through your `.git` folder, finds all traces of the given file and removes them from the history. – Kevin R. Dec 28 '20 at 10:10
  • @KevinR. you have to force push, don't you? – Exploring Apr 29 '22 at 01:54
  • That is correct @Exploring – Kevin R. Nov 24 '22 at 09:10
  • Thank you, damn python programmers and checking in binary files. – Owl Apr 05 '23 at 13:04
13

I ran into this with a bitbucket account, where I had accidentally stored ginormous *.jpa backups of my site.

git filter-branch --prune-empty --index-filter 'git rm -rf --cached --ignore-unmatch MY-BIG-DIRECTORY-OR-FILE' --tag-name-filter cat -- --all

Replace MY-BIG-DIRECTORY-OR-FILE with the folder or file in question to completely rewrite your history (including tags).

source: https://web.archive.org/web/20170727144429/http://naleid.com:80/blog/2012/01/17/finding-and-purging-big-files-from-git-history/

2540625
random-forest-cat
  • This response helped me, except the script in the answer has a slight issue and it doesn't search in all branches for me. But the command in the link did it perfectly. – Ali B Sep 05 '15 at 20:20
  • Add `-f` after `git filter-branch`, if need to overwrite previous backup – Sheldon Jun 01 '22 at 09:32
10

Just note that these commands can be very destructive. If more people are working on the repo, they'll all have to pull the new tree. The three middle commands are only necessary if your goal is to reduce the size, because filter-branch keeps a backup of the old history (under .git/refs/original/) that can stay there for a long time.

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git push origin master --force
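To see why the middle commands matter when the goal is to reduce the size, here is a minimal sketch in a throwaway repository. YOURFILENAME is the answer's own placeholder, used here as a literal file name. Right after filter-branch, a plain git gc keeps the blob, because the backup refs under refs/original/ and the reflogs still reach the old commit. It also shows a catch, assuming the default files ref backend: gc packs refs into .git/packed-refs, so once gc has run, rm -rf .git/refs/original/ no longer removes the backups. Deleting them with git update-ref -d works either way.

```shell
#!/bin/sh
# Sketch: right after filter-branch, `git gc` keeps the removed blob, because the
# backup refs under refs/original/ and the reflogs still reach the old commit.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

has_object() {
  if git cat-file -e "$1" 2>/dev/null; then echo yes; else echo no; fi
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo "code" > main.c
head -c 200000 /dev/urandom > YOURFILENAME
git add . && git commit -q -m "initial"
blob=$(git rev-parse HEAD:YOURFILENAME)

git filter-branch --index-filter \
  'git rm -rf --cached --ignore-unmatch YOURFILENAME' HEAD >/dev/null

# 1) gc without touching the backups: the blob survives.
git gc -q --prune=now
after_plain_gc=$(has_object "$blob")

# 2) Drop the backups and reflogs, then gc again. gc has already moved
#    refs/original/* into .git/packed-refs, so delete them via update-ref
#    rather than with `rm -rf .git/refs/original/`.
git for-each-ref --format='%(refname)' refs/original/ |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now
after_cleanup=$(has_object "$blob")

echo "blob after plain gc: $after_plain_gc; after removing backups: $after_cleanup"
```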
om-nom-nom
mkljun
  • Do NOT run these commands unless you want to create immense pain for yourself. It deleted a lot of my original source code files. I assumed it would purge some large files from my commit history in GIT (as per the original question), however, I think this command is designed to permanently purge files from your original source code tree (big difference!). My system: Windows, VS2012, Git Source Control Provider. – Contango Oct 22 '12 at 11:16
  • I used this command: `git filter-branch --force --index-filter 'git rm --cached -r --ignore-unmatch oops.iso' --prune-empty --tag-name-filter cat -- --all` instead of the first one from your code – Kostanos Jun 14 '13 at 02:31
  • @mkljun, please at least remove "git push origin master --force"! First of all, it is not related to the original question: the author didn't ask how to edit commits and push changes to some repository. And second, this is dangerous: you really can delete a lot of files, and pushing changes to a remote repository without first checking what was deleted is not a good idea. – Ezh Aug 21 '21 at 10:27
9

git filter-branch --tree-filter 'rm -f path/to/file' HEAD worked pretty well for me, although I ran into the same problem as described here, which I solved by following this suggestion.

The Pro Git book has an entire chapter on rewriting history; have a look at the filter-branch / "Removing a File from Every Commit" section.

Community
Thorsten Lorenz
8

If you know your commit was recent, then instead of going through the entire history, rewrite only the most recent commits (here, the last 10):

git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~10..HEAD
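Here is a minimal sketch of the range form in a throwaway repository. The page and file names are hypothetical, and the range is shortened to the last two commits. A tree filter that exits non-zero makes filter-branch abort, so rm -f is used in case some commit in the range does not contain the file. Commits before the range keep their IDs. The start of the range (HEAD~10 above) must exist, so the range has to fit within the branch's history.

```shell
#!/bin/sh
# Sketch: rewrite only the last two commits by passing a revision range.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

for i in 1 2; do
  echo "page $i" > "page$i.html"
  git add "page$i.html" && git commit -q -m "page $i"
done
old_base=$(git rev-parse HEAD)                # last commit before the mistake

head -c 50000 /dev/urandom > LARGE_FILE.zip
git add LARGE_FILE.zip && git commit -q -m "oops"
echo "more" >> page1.html && git commit -q -am "more work"

# Only HEAD~2..HEAD is rewritten. A tree filter that exits non-zero aborts
# filter-branch, so `rm -f` is used in case a commit lacks the file.
git filter-branch --tree-filter 'rm -f LARGE_FILE.zip' HEAD~2..HEAD >/dev/null

base_after=$(git rev-parse HEAD~2)            # should still equal old_base
hits=$(git log --format=%H -- LARGE_FILE.zip) # should be empty
echo "base unchanged: $([ "$base_after" = "$old_base" ] && echo yes || echo no)"
```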

Soheil
8

NEW ANSWER THAT WORKS IN 2022.

DO NOT USE:

git filter-branch

This command might not change the remote repo after pushing: if you clone after using it, you may see that nothing has changed and the repo is still large. The command is also outdated; Git's own documentation for git filter-branch now warns against using it and recommends alternatives such as git filter-repo. For example, if you use the steps in https://github.com/18F/C2/issues/439, this won't work.

The Solution

This solution is based on using:

git filter-repo

Steps:

(1) Find the largest files in .git (change 10 to however many files you want to display):

git rev-list --objects --all | grep -f <(git verify-pack -v  .git/objects/pack/*.idx| sort -k 3 -n | cut -f 1 -d " " | tail -10)

(2) Start filtering these large files by passing the path and name of the file you would like to remove:

 git filter-repo --path-glob '../../src/../..' --invert-paths --force

or use the extension of the file e.g. to filter all zip files:

 git filter-repo --path-glob '*.zip' --invert-paths --force

or e.g. to filter all .a lib files:

 git filter-repo --path-glob '*.a' --invert-paths --force

or whatever you find in step 1.

(3) Re-add the remote, because git filter-repo removes the origin remote after rewriting history:

 git remote add origin git@github.com:.../...git

(4) Force-push all branches and tags:

git push --all --force

git push --tags --force

DONE!!!
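The step (1) pipeline reads only .git/objects/pack/*.idx. Objects that are still loose do not show up, for example right after committing and before any git gc. Below is a minimal sketch of an alternative that asks git cat-file --batch-check for the uncompressed size of every blob reachable from any ref. Objects reachable only from reflogs are not listed. The helper name largest_blobs and the demo file names are hypothetical.

```shell
#!/bin/sh
# Sketch: list the largest blobs reachable from any ref, whether packed or loose.
set -eu

# Usage: largest_blobs [N]
# Prints "blob <object-id> <size-in-bytes> <path>", biggest first.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3nr |
    head -n "${1:-10}"
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo "small" > small.txt
head -c 20000 /dev/urandom > medium.dat
head -c 200000 /dev/urandom > big.bin
git add . && git commit -q -m "initial"
git rm -q big.bin && git commit -q -m "drop big.bin"   # gone from HEAD, still in history

top=$(largest_blobs 1)
printf '%s\n' "$top"
largest_blobs 3
```

Because %(rest) echoes the path that git rev-list printed next to each object, the last column shows where each blob was seen. That gives you the path or glob to pass to step (2).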

Moe
  • What does "Strat" mean in item 2)? What are you doing in that step? Please explain what 3 is doing, especially ".../...git". I already have a repo with a remote. What is all of the .../ about? – pauljohn32 Nov 30 '22 at 00:36
  • I like this solution. Poster should've mentioned "filter-repo" isn't a native git command, you have to install a python script: https://github.com/newren/git-filter-repo – inorganik Feb 01 '23 at 17:12
  • Is this a message from the future? Please tell me what life is like in 20222. I can't believe you are still using git. – Frank Hileman May 11 '23 at 00:53
6

This will remove it from your history:

git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch bigfile.txt' --prune-empty --tag-name-filter cat -- --all
sparkle
5

I basically did what was on this answer: https://stackoverflow.com/a/11032521/1286423

(for history, I'll copy-paste it here)

$ git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch YOURFILENAME" HEAD
$ rm -rf .git/refs/original/ 
$ git reflog expire --all 
$ git gc --aggressive --prune
$ git push origin master --force

It didn't work, because I like to rename and move things a lot. So some big files were in folders that had been renamed, and I think gc couldn't delete those files because tree objects still referenced them. My ultimate solution to really kill them was:

# First, apply what's in the answer linked in the front
# and before doing the gc --prune --aggressive, do:

# Go back to the origin of the repository
git checkout -b newinit <sha1 of first commit>
# Create a parallel initial commit
git commit --amend
# go back on the master branch that has big file
# still referenced in history, even though 
# we thought we removed them.
git checkout master
# rebase on the newinit created earlier. By reapplying patches,
# it will really forget about the references to hidden big files.
git rebase newinit

# Do the previous part (checkout + rebase) for each branch
# still connected to the original initial commit, 
# so we remove all the references.

# Remove the .git/logs folder, also containing references
# to commits that could make git gc not remove them.
rm -rf .git/logs/

# Then you can do a garbage collection,
# and the hidden files really will get gc'ed
git gc --prune --aggressive

My repo (the .git) shrank from 32MB to 388KB, which filter-branch alone couldn't achieve.

Dolanor
4

Use Git Extensions, a UI tool. It has a plugin named "Find large files" which finds large files in repositories and allows removing them permanently.

Don't use 'git filter-branch' before using this tool, since it won't be able to find files removed by 'filter-branch' (although 'filter-branch' does not remove files completely from the repository pack files).

Nir
  • This method is waaay too slow for large repositories. It took over an hour to list the large files. Then when I go to delete files, after an hour it is only 1/3 of the way through processing the first file I want to delete. – kristianp Oct 04 '17 at 04:19
  • Yes, it's slow, but it does the work... Do you know anything quicker? – Nir Oct 06 '17 at 21:03
  • Haven't used it, but BFG Repo-Cleaner, as per another answer on this page. – kristianp Oct 09 '17 at 04:42
  • Git Extension is nice and simple. However it uses git filter-branch internally, so deletion is very slow. – Alex from Jitbit Nov 10 '22 at 19:19
3

git filter-branch is a powerful command that you can use to delete a huge file from the commit history. The old file data stays in the repository until nothing references it any more (filter-branch's backup refs and the reflog still do) and Git garbage-collects it. Below is the full process for deleting files from the commit history. For safety, the process runs the commands on a new branch first. If the result is what you need, then reset the branch you actually want to change to it.

# Do it in a new testing branch
$ git checkout -b test

# Remove file-name from every commit on the new branch
# --index-filter, rewrite the index without checking out each commit
# --cached, remove the file from the index only, not from the working tree
# --ignore-unmatch, don't fail if the file is absent in a commit
# HEAD, rewrite every commit reachable from HEAD
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD

# If the output is OK, point the original branch master at the rewritten history
# (a mixed reset also resets the index, so file-name is not staged again)
$ git checkout master
$ git reset test

# Remove test branch
$ git branch -d test

# Push it with force
$ git push --force origin master
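The process above says to check that the output is OK before moving master. One way to check is sketched below in a throwaway repository, with hypothetical file names. It compares the two branch tips and confirms that no commit on the rewritten branch touches the file.

```shell
#!/bin/sh
# Sketch: rewrite a throwaway "test" branch, then compare it with master before adopting it.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

echo "v1" > app.txt
git add app.txt && git commit -q -m "first"
head -c 50000 /dev/urandom > file-name       # the unwanted large file
git add file-name && git commit -q -m "add large file"
echo "v2" > app.txt && git commit -q -am "more work"

git checkout -q -b test
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file-name' HEAD >/dev/null

# The tips should differ only by the removed file ...
tip_diff=$(git diff --name-status master test)
# ... and no commit on the rewritten branch should touch it.
test_hits=$(git log --format=%H test -- file-name)
printf '%s\n' "$tip_diff"
```

If the diff lists anything besides the file you meant to remove, the filter matched more than intended, and master has not been touched yet.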
zhangyu12
1

When you run into this problem, git rm will not suffice, as Git remembers that the file existed once in your history, and thus will keep a reference to it.

To make things worse, rebasing is not easy either, because any references to the blob will prevent git garbage collector from cleaning up the space. This includes remote references and reflog references.

I put together git forget-blob, a little script that tries removing all these references, and then uses git filter-branch to rewrite every commit in the branch.

Once your blob is completely unreferenced, git gc will get rid of it.

Usage is simple: git forget-blob file-to-forget. You can get more info here:

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

I put this together thanks to the answers from Stack Overflow and some blog entries. Credits to them!
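The kinds of references mentioned above can be checked directly. They include branches, tags, remote-tracking refs and reflogs, plus the refs/original/ backups that git filter-branch leaves behind. Below is a minimal sketch; it is not the git forget-blob script itself. The helper name refs_reaching, the file name and the simulated origin/main ref are all hypothetical.

```shell
#!/bin/sh
# Sketch: which refs and reflogs can still reach a path (and so keep its blobs alive)?
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

# Usage: refs_reaching <path>
refs_reaching() {
  # Branches, tags, remote-tracking refs, refs/original backups, refs/stash, ...
  git for-each-ref --format='%(refname)' | while read -r ref; do
    hit=$(git rev-list -n 1 "$ref" -- "$1" 2>/dev/null || true)
    if [ -n "$hit" ]; then echo "ref: $ref"; fi
  done
  # Commits recorded in reflogs (HEAD and branch reflogs).
  hit=$(git rev-list -n 1 --reflog -- "$1" 2>/dev/null || true)
  if [ -n "$hit" ]; then echo "reflog"; fi
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

echo "site" > index.html
head -c 50000 /dev/urandom > file-to-forget
git add . && git commit -q -m "initial"
git rm -q file-to-forget && git commit -q -m "remove it"
git update-ref refs/remotes/origin/main HEAD   # pretend origin/main was fetched earlier

# Rewrite only the local branch.
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch file-to-forget' HEAD >/dev/null

report=$(refs_reaching file-to-forget)
printf '%s\n' "$report"
```

In this demo the rewritten refs/heads/main no longer reaches the file. The remote-tracking ref, the refs/original backup and the reflog still do, so git gc would keep the blob until all three are dealt with.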

nachoparker
1

You can do this using the filter-branch command:

git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD

John Foley
0

Other than git filter-branch (slow but pure git solution) and BFG (easier and very performant), there is also another tool to filter with good performance:

https://github.com/xoofx/git-rocket-filter

From its description:

The purpose of git-rocket-filter is similar to the command git-filter-branch while providing the following unique features:

  • Fast rewriting of commits and trees (by an order of x10 to x100).
  • Built-in support for both white-listing with --keep (keeps files or directories) and black-listing with --remove options.
  • Use of .gitignore like pattern for tree-filtering
  • Fast and easy C# Scripting for both commit filtering and tree filtering
  • Support for scripting in tree-filtering per file/directory pattern
  • Automatically prune empty/unchanged commit, including merge commits
Philippe
-1

Save a backup of your current code in case anything goes wrong during this process.

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch path/to/large_file' --prune-empty --tag-name-filter cat -- --all

Replace path/to/large_file with the actual path to the large file that you want to remove. This command will rewrite the Git history and remove the large file from all commits.

After running the git filter-branch command, you may see a message that says "Ref 'refs/heads/master' is unchanged" or similar. This means the filter did not rewrite that branch, usually because the path did not match any file in its history, so check the path before going further. Once the branches have been rewritten, push them with:

git push origin --force --all
Chukwuemeka Maduekwe
-4

This works perfectly for me in Git Extensions:

right-click the selected commit,

choose "Reset current branch to here",

then choose "Hard reset".

It's surprising nobody else is able to give this simple answer.

Winston L
  • Worked for me, but be mindful that this deletes everything after that point – Jossy Jul 19 '20 at 16:22
  • No-one gave this answer because it does not answer the question. He wants a specific file removed from the history. Your answer nukes everything in the repo after a certain point. – Jason Kelley Apr 16 '21 at 22:51
-5
git reset --soft HEAD~1

It will keep the changes but remove the commit; then you can re-commit those changes.
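The simple case is a big file that was added only in the most recent commit, where that commit has not been pushed. A minimal sketch of this approach in a throwaway repository follows, with hypothetical file names. The old commit remains in the reflog, so the disk space is only reclaimed once that entry expires and git gc runs.

```shell
#!/bin/sh
# Sketch: the big file is only in the latest, unpushed commit.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo "home" > index.html
git add index.html && git commit -q -m "site"
echo "about" > about.html
head -c 50000 /dev/urandom > dvd-rip.iso
git add -A && git commit -q -m "about page (plus an accidental DVD rip)"

# Undo the last commit but keep all of its changes staged.
git reset -q --soft HEAD~1
# Unstage only the big file; it stays on disk as an untracked file.
git rm -q --cached dvd-rip.iso
# Ignore it so it is not re-added by accident.
echo "dvd-rip.iso" >> .gitignore
git add .gitignore
git commit -q -m "about page"
```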