According to that GitHub page, any commit may be referenced via SHA1, even if no ref points to it, so you must delete the repository and recreate it. I can verify that a commit is still visible at least two weeks after it has been dereferenced. In general, once you have removed the sensitive data, so that it is no longer accessible via any ref, the simplest way to prune Git’s object store is to clone the repository and destroy the old one. This is especially true if you do not have direct access to the repository, such as on GitHub.
(In other words: if the garbage SHA1 is known, then GitHub will happily serve it over the web. The Git protocol will normally refuse to give you unnamed commits, but this can be enabled with the daemon.uploadarch config.)
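As a minimal sketch of the clone-and-destroy approach on a local machine: the script below builds a throwaway repository, drops a commit from its branch, and clones it. The file names, the `hunter2` placeholder, and the temporary directory are made up for illustration. It also assumes the clone goes through the normal Git transport (a `file://` URL here), because a plain local-path clone may copy or hardlink the object directory wholesale, garbage included.

```shell
#!/bin/sh
# Sketch: a transport-based clone leaves unreachable commits behind.
set -e
# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/old"
cd "$work/old"
echo public > readme.txt
git add readme.txt
git commit -q -m "initial"
echo hunter2 > secret.txt          # stand-in for the sensitive data
git add secret.txt
git commit -q -m "oops"
bad=$(git rev-parse HEAD)

# Remove the commit from the branch; it stays in the object store.
git reset -q --hard HEAD~1

# file:// forces the regular transport, which sends only reachable objects.
git clone -q "file://$work/old" "$work/new"

if git -C "$work/old" cat-file -e "$bad" 2>/dev/null; then
  old_has_bad=yes
else
  old_has_bad=no
fi
if git -C "$work/new" cat-file -e "$bad" 2>/dev/null; then
  new_has_bad=yes
else
  new_has_bad=no
fi
echo "old repo still has commit: $old_has_bad"
echo "new clone has commit: $new_has_bad"
```

Once the clone has been checked, the old repository is the one to destroy.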
The way to turn referenced objects into garbage objects is with judicious application of rebase, filter-branch, reflog, update-ref and the like. The way to purge garbage objects is with judicious application of gc, fsck, prune, and repack.
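As a rough illustration of the two phases in a repository you control directly, the sketch below first makes a commit unreachable (by deleting the only branch that points to it and expiring the reflogs) and then purges it with gc. The branch name `leak`, the file names, and the `hunter2` placeholder are hypothetical. A real cleanup would more likely involve filter-branch or rebase to rewrite history in the first phase.

```shell
#!/bin/sh
# Sketch: turn a commit into garbage, then purge it from the object store.
set -e
# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo public > readme.txt
git add readme.txt
git commit -q -m "initial"
main_branch=$(git symbolic-ref --short HEAD)

# Put the sensitive commit on a side branch.
git checkout -q -b leak
echo hunter2 > secret.txt          # stand-in for the sensitive data
git add secret.txt
git commit -q -m "oops"
bad=$(git rev-parse HEAD)

# Phase 1: make the commit unreachable from every ref and reflog.
git checkout -q "$main_branch"
git branch -q -D leak
git reflog expire --expire=now --all

# Phase 2: purge unreachable objects immediately.
git gc -q --prune=now

if git cat-file -e "$bad" 2>/dev/null; then
  purged=no
else
  purged=yes
fi
echo "commit purged: $purged"
```

Without the `reflog expire` step, the HEAD reflog would still reference the commit and gc would keep it.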
Example queries:
List dangling commits (candidates for garbage collection) and grep each one for sensitive data:
git fsck --no-reflogs | awk '/dangling commit/ {print $3}' | while read -r sha1;
do git grep foo "$sha1"; done
List every single object reachable from a ref (add --walk-reflogs for reflogs instead):
git rev-list --objects --all | while read -r sha path;
do git show "$sha" | grep baz; done
Another way is to use fast-export to export the entire repository into a text-based file, which you can pick through and manipulate with any tool you want, then fast-import it into a fresh repo. This is good because the export doesn’t carry any garbage, and you can grep the whole archive very easily.
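The export-and-reimport round trip might look like the following sketch. The dump file name `repo.fi`, the directory names, and the `hunter2` placeholder are assumptions for illustration. The unreachable blob is created with `git hash-object -w` purely to have some garbage to check for.

```shell
#!/bin/sh
# Sketch: fast-export a repository and fast-import it into a fresh one.
set -e
# Hypothetical identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/old"
cd "$work/old"
echo public > readme.txt
git add readme.txt
git commit -q -m "initial"

# Write a blob that no commit references, i.e. garbage in the object store.
echo hunter2 > secret.txt
garbage=$(git hash-object -w secret.txt)
rm secret.txt

# Export everything reachable from any ref into a text-based stream.
git fast-export --all > "$work/repo.fi"

# The stream can be searched (or edited) with ordinary tools.
if grep -q hunter2 "$work/repo.fi"; then
  dump_has_secret=yes
else
  dump_has_secret=no
fi

# Import the stream into a brand-new repository.
git init -q "$work/fresh"
git -C "$work/fresh" fast-import --quiet < "$work/repo.fi"

if git -C "$work/fresh" cat-file -e "$garbage" 2>/dev/null; then
  fresh_has_garbage=yes
else
  fresh_has_garbage=no
fi
commits=$(git -C "$work/fresh" rev-list --all --count)
echo "dump has secret: $dump_has_secret"
echo "fresh repo has garbage: $fresh_has_garbage"
echo "commits imported: $commits"
```

The fresh repository has no checked-out files until you run something like `git checkout` or `git reset --hard` in it.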
The answer does not change if you do not have a work tree, but commands like filter-branch may want a work tree for some use cases.