9

Is there a way or command to delete a blob from Git using its ID?

I ran this command:

$ git rev-list --objects --all | git cat-file --batch-check='%(objectname) %(objecttype) %(rest)' | grep '^[^ ]* blob' | cut -d" " -f1,3-

This gave me a list of the blobs across all revisions, for example:

62f7e0df0b80bce8d0a4cb388be8988df1bec5ef NodeApplication/NodeApplication/public/javascripts/homescript.js
b1d69387fbd4d4e84bbe9eb2c7f59053c0355e11 NodeApplication/NodeApplication/iisnode/index.html
624642d6f2a86844dc145803260537be0fe40090 NodeApplication/NodeApplication/.ntvs_analysis.dat

Now I want to delete this blob:

NodeApplication/NodeApplication/.ntvs_analysis.dat

How can I do that?

keerthee
  • You will need `git filter-branch`, see https://help.github.com/articles/remove-sensitive-data/ – David Duponchel Aug 06 '15 at 13:54
  • Actually, I ran git filter-branch and git gc, which reduced my repo size, and pushed to the repo in TFS. TFS does not allow deleting files or running gc, so only the commits were rewritten. Now when I clone from TFS, the clone is still the old size, although the commits are rewritten (so if I run filter-branch again, those files don't exist). I even tried running gc on everything. – keerthee Aug 06 '15 at 14:11

3 Answers

3

I used the BFG Repo-Cleaner to remove the unwanted large files, then ran:

git reflog expire --expire=now --all
git gc --aggressive --prune=now
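
Since the question asks about deleting by ID, here is a minimal sketch of the BFG route the comments mention, using its `--strip-blobs-with-ids` option, which reads object IDs from a file (one per line). The helper `write_blob_ids` and the file name `blob-ids.txt` are hypothetical names chosen for illustration, and the usage lines assume a `bfg` command is available on your PATH.

```shell
#!/bin/sh
# Sketch only: write_blob_ids is a hypothetical helper, not part of BFG or git.
# It checks that each argument is a full 40-digit hex object ID and writes
# the IDs to the given file, one per line.
write_blob_ids() {
    out=$1; shift
    : > "$out"
    for id in "$@"; do
        case $id in
            *[!0-9a-f]*) echo "not a hex object ID: $id" >&2; return 1 ;;
        esac
        [ "${#id}" -eq 40 ] || { echo "expected 40 hex digits: $id" >&2; return 1; }
        printf '%s\n' "$id" >> "$out"
    done
}

# Usage (assumption: `bfg` is on PATH and you are in the repository to clean):
#   write_blob_ids blob-ids.txt 624642d6f2a86844dc145803260537be0fe40090
#   bfg --strip-blobs-with-ids blob-ids.txt
#   git reflog expire --expire=now --all
#   git gc --aggressive --prune=now
```

Note that BFG, by default, leaves the contents of the current HEAD commit untouched, so a blob that is still present there would likely need to be removed with an ordinary commit first.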
qwr
keerthee
  • 3
    The OP asked how to delete a blob by ID. If not answering the question directly please consider explaining how to use the BFG `--strip-blobs-with-ids` CLI flag. – vhs Jun 03 '17 at 10:05
  • 1
    Kudos for referring to BFG, but more explanation is needed. For OSX: 1. `brew install bfg` 2. `bfg --strip-blobs-with-ids ` 3. `git reflog expire --expire=now --all && git gc --prune=now --aggressive` – Julian K Jul 09 '18 at 20:59
  • 1
    Why two calls to git gc? – toolforger Mar 16 '20 at 16:39
  • https://github.com/rtyley/bfg-repo-cleaner – qwr Sep 20 '21 at 18:10
0

The "proper" way to do this is with git's garbage collector.

First find all trees that reference the blob. Then find all commits that reference one of those trees.

Delete those commits entirely (from all heads' history, all tags, and the reflog), and the garbage collector will clean up the blob.

Deleting the blob without first removing the objects that reference it will corrupt your repository.

One easy way to automate this whole process is to use git filter-branch, which provides you the ability to produce an alternate history in which that particular file was never checked in.
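
As a rough illustration of the "find the commits that reference the blob" step, `git log` can search history for a specific object. This is only a sketch: `commits_touching_blob` is a hypothetical helper name, and it assumes Git 2.16 or later, where `--find-object` is available. It reports commits that add or remove the blob, not every commit whose tree merely contains it.

```shell
#!/bin/sh
# Sketch: list every commit, reachable from any ref, in which the number of
# occurrences of the given blob changes (i.e. it is added or removed).
# commits_touching_blob is a hypothetical name; --find-object needs Git 2.16+.
commits_touching_blob() {
    git log --all --format='%H %s' --find-object="$1"
}

# Usage inside a repository:
#   commits_touching_blob 624642d6f2a86844dc145803260537be0fe40090
```

The commits it prints mark where the blob enters and leaves history, which is the range a history rewrite has to cover.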

Borealid
  • I have run git filter-branch and the commits are now rewritten, but the blobs still exist in the git repo. – keerthee Aug 06 '15 at 12:26
  • @keerthee Look at the man page for `filter-branch` - see the section labeled "CHECKLIST FOR SHRINKING A REPOSITORY". If you properly removed the references, cleared the reflog, and forced a gc, the garbage would be gone. – Borealid Aug 06 '15 at 12:27
  • Actually, I did the above, which reduced my repo size, and pushed to the repo in TFS. TFS does not allow deleting files or running gc, so only the commits were rewritten. Now when I clone from TFS, the clone is still the old size, although the commits are rewritten (so if I run filter-branch again, those files don't exist). I even tried running gc on everything. – keerthee Aug 06 '15 at 13:02
  • @keerthee Then your problem is with TFS and not with git. – Borealid Aug 06 '15 at 21:08
  • I understand that, but is there a way to clean the repo that is cloned locally? – keerthee Aug 07 '15 at 05:31
  • @keerthee A fresh clone will get exactly what is stored in TFS, no more and no less. – Borealid Aug 07 '15 at 05:43
  • You're mentioning the garbage collector, but don't provide any specifics. There're cases where `git gc` won't help (e.g. removing a remote). `--prune` might, but it took a while for me to figure that out. What I tried is `--aggressive`, but it didn't delete the objects. So I thought that `git gc` is not applicable in my case. It should be noted that `--prune` by default doesn't delete objects that are less than 2 weeks old. You can use `--prune=now`, but if another process is writing to the repository concurrently, you risk corrupting it (the repository). – x-yuri Jul 16 '22 at 16:16
0

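If you already have the blob ID, you can confirm that it is stored in a pack, and see its size, with git verify-pack:

git verify-pack -v .git/objects/pack/*.idx | grep <blob_id>

verify-pack lists object IDs, types, and sizes but not paths, so to find the filename from the ID (or vice versa), search the object listing instead:

git rev-list --objects --all | grep <blob_id or filename>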
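
To map between a blob ID and its path, one option is to filter the "ID path" listing that `git rev-list --objects --all` prints (the same listing the question starts from). The sketch below does an exact match rather than a substring grep; `lookup_blob` is a hypothetical helper name, not a git command.

```shell
#!/bin/sh
# Sketch: lookup_blob is a hypothetical helper. It reads "ID path" lines on
# stdin (the format printed by `git rev-list --objects --all`) and prints the
# lines whose ID or whose path is exactly equal to the query.
lookup_blob() {
    awk -v q="$1" '{
        id = $1
        path = substr($0, length($1) + 2)   # rest of the line, so paths with spaces survive
        if (id == q || path == q) print
    }'
}

# Usage inside a repository:
#   git rev-list --objects --all | lookup_blob 624642d6f2a86844dc145803260537be0fe40090
#   git rev-list --objects --all | lookup_blob NodeApplication/NodeApplication/.ntvs_analysis.dat
```

Matching exactly avoids false hits when one path is a prefix of another, at the cost of requiring the full ID or full path.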

Once you have the filename, you should

  • remove ALL references that keep the blob reachable (tags, remote-tracking refs, backup refs, and the reflog), and
  • rewrite history with git filter-branch to remove the file from every commit in the branch.

Once nothing references the blob any more, the garbage collector (git gc) can prune it and free the space.

Have a look at the git forget-blob script, which does all of this in one step:

git forget-blob file-to-forget

https://ownyourbits.com/2017/01/18/completely-remove-a-file-from-a-git-repository-with-git-forget-blob/

Basically, the script deletes all tags and remote references, rewrites history, and then prunes the unreachable objects, like so:

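# WARNING: this deletes every local tag and all remote-tracking refs
git tag | xargs git tag -d
# rewrite the current branch so that $FILE never appears in any commit
git filter-branch --index-filter "git rm --cached --ignore-unmatch $FILE"
# drop backup refs, remote-tracking refs, and reflogs that still point at the old history
rm -rf .git/refs/original/ .git/refs/remotes/ .git/*_HEAD .git/logs/
# also remove backup refs stored in packed-refs (--no-run-if-empty is a GNU xargs option)
git for-each-ref --format="%(refname)" refs/original/ | \
  xargs -n1 --no-run-if-empty git update-ref -d
git reflog expire --expire-unreachable=now --all
# repack without the unreachable objects, then delete the loose unreachable ones
git repack -A -d
git prune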
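
After a cleanup like this, one quick check is whether the local object database still contains the blob. The sketch below uses `git cat-file -e`, which exits with status zero only if the object exists; `blob_exists` is a hypothetical helper name. It only inspects the local repository, so it says nothing about what a server such as TFS still stores.

```shell
#!/bin/sh
# Sketch: blob_exists is a hypothetical helper. It succeeds (exit status 0)
# if the local repository still has the object, reachable or not, and fails
# otherwise.
blob_exists() {
    git cat-file -e "$1" 2>/dev/null
}

# Usage inside the cleaned repository:
#   if blob_exists 624642d6f2a86844dc145803260537be0fe40090; then
#       echo "blob is still stored locally"
#   else
#       echo "blob is gone"
#   fi
```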
qwr
nachoparker