
I am using bfg to clean my git repo. To get the list of big files to delete, I use this script. However, for some files I only want to delete specific versions from the repo.

bfg has the option to "strip blobs with the specified Git object ids". When I run the above script, I am given a hash for each object in the list. Given that hash, how can I find out the git object id of that specific object so that I can delete it with bfg?

Chin

1 Answer


That script appears to list the git object id already.

If there is a particular commit you want to clean, you can use the command from "Which commit has this blob?" to check whether a given object id is part of that commit.

git log --all --pretty=format:%H -- <path> | \
 xargs -n1 -I% sh -c "git ls-tree % <path> | \
 grep -q <hash> && echo %"

For instance, in my repo seec:

a255b5c1d469591037e4eacd0d7f4599febf2574 12884 seec.go
a7320d8c0c3c38d1a40c63a873765e31504947ff 12928 seec.go

I want to clean the a7320d8 version of seec.go.

As seen in BFG commit 12d1b00:

People can get a list of blob-ids using "git rev-list --all --objects", then grep to list all files in directories they want to nuke, and pass that to the BFG.
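The approach quoted above can be sketched as a small shell function. This is only an illustration: the name `list_blob_ids` and the path-prefix argument are hypothetical, and it assumes paths without spaces. It filters the `git rev-list --all --objects` output by path, then keeps only the objects that `git cat-file` reports as blobs, since the listing also contains trees.

```shell
# List the ids of every blob whose path starts with the given prefix.
# list_blob_ids is a hypothetical helper name, not part of git or BFG.
list_blob_ids() {
  prefix=$1
  git rev-list --all --objects |
    # Commits have no path column; keep "<id> <path>" lines matching the prefix.
    awk -v p="$prefix" 'NF > 1 && index($2, p) == 1 { print $1 }' |
    # Ask git for the type of each id so that trees can be dropped.
    git cat-file --batch-check='%(objecttype) %(objectname)' |
    awk '$1 == "blob" { print $2 }' |
    sort -u
}
```

Run from inside the repository, for example `list_blob_ids some/dir/ > blob-ids.txt`, where `some/dir/` stands for whichever directory you want to purge.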

Note: the test for the -bi option reads:

val blobIdsFile = Path.createTempFile()
blobIdsFile.writeStrings(badBlobs.map(_.name()),"\n")
run(s"--strip-blobs-with-ids ${blobIdsFile.path}")

Meaning the parameter to -bi is a file, with the blob id(s) in it.


I can also check that what I just got is indeed the blob id by looking for its commit:

vonc@bvonc MINGW64 ~/data/git/seec (master)
$ git log --all --pretty=format:%H -- seec.go | xargs -n1 -I% sh -c "git ls-tree % seec.go|\
grep -q a7320d8 && echo %"

I get: commit c084402.

Let's see if that commit does actually include the seec.go revision blob id a7320d8 (using "Git - finding the SHA1 of an individual file in the index").
I can find the blob id of a file from a GitHub commit:

vonc@bvonc MINGW64 ~/data/git/seec (master)
$ (echo -ne "blob $(curl -s https://raw.githubusercontent.com/VonC/seec/c084402/seec.go --stderr -|wc -c)\0"; \
   curl -s https://raw.githubusercontent.com/VonC/seec/c084402/seec.go --stderr -) | \
  sha1sum | awk '{ print $1 }'
a7320d8c0c3c38d1a40c63a873765e31504947ff

Bingo.
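The check works because a git blob id is the SHA-1 of a `blob <size>` header, a NUL byte, and then the file content. For a file on disk, the same computation can be sketched as follows (the function name `blob_id` is made up for this example, and `sha1sum` is assumed to be available, as in the command above):

```shell
# Compute the git blob id of a local file without calling git.
# blob_id is a hypothetical helper name used only in this sketch.
blob_id() {
  # Byte count of the file; tr strips the padding some wc builds add.
  size=$(wc -c < "$1" | tr -d ' ')
  # Hash the "blob <size>\0" header followed by the raw content.
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | awk '{ print $1 }'
}
```

Inside a repository, `git hash-object <file>` should print the same id, which is a quick way to cross-check the result.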

Should I want to strip out seec.go blob id a7320d8, I know I can pass to bfg that blob id (in a "blob ids" file).
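As a minimal sketch of that last step, the blob id can be written to a file, one id per line (the BFG test above joins the ids with "\n"). The helper name `write_blob_ids_file` and the file name `blob-ids.txt` are arbitrary choices for this example:

```shell
# Write one blob id per line into a file that can be handed to BFG.
# Both the helper name and blob-ids.txt are assumptions of this sketch.
write_blob_ids_file() {
  out=$1
  shift
  printf '%s\n' "$@" > "$out"
}

write_blob_ids_file blob-ids.txt a7320d8c0c3c38d1a40c63a873765e31504947ff
```

The file path, rather than the hash itself, is then what goes after the option, along the lines of `java -jar bfg-1.12.15.jar --strip-blobs-with-ids blob-ids.txt` followed by the usual repository argument.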

Ken Y-N
VonC
  • The hash that the script I linked returns seems to be the SHA-1 of the file content, not the object id, so when I use it with bfg's `--strip-blobs-with-ids` option it cannot find the file. – Chin Jul 02 '17 at 04:17
  • The script you link is about the blob id, since I managed to get the blob id from the commit which had the id returned by the script. – VonC Jul 02 '17 at 04:25
  • That's strange. You're right that the hash I got in the script is indeed the object id itself (I did `git rev-list --all --objects | grep myFile` and saw the hash in the list). Yet, when I pass that hash to bfg using `java -jar bfg-1.12.15.jar -bi 12345` it says `Error: Option --strip-blobs-with-ids failed when given '12345'. 12345(The system cannot find the file specified).` (12345 is the example hash) – Chin Jul 02 '17 at 04:39
  • @Chin you can see that option used in the test: https://github.com/rtyley/bfg-repo-cleaner/blob/12d1b00bff6afdeb474a5194be4d0b19b2cc481b/src/test/scala/com/madgag/git/bfg/cli/MainSpec.scala#L64-L81. `--strip-blobs-with-ids ${blobIdsFile.path}`: it must expect a file with the blob id in it. – VonC Jul 02 '17 at 04:50