The short answer - create a fresh repo
For such a complete cleanup, you need to create a brand new repo and delete the old one.
I could give you answers about running garbage collection, and clearing the reflog, and finding the other places where Git stores deleted commits for a while, but with the question as you ask it, I strongly recommend a fresh repo.
Especially if you are using GitHub or some other hosted Git server, fully cleaning out old commits may be a hopeless task.
Giving it a real try - sandbox side
OK, so my answer, "give up, it cannot be done", is not very satisfactory. Here are some commands that might purge old commits from a sandbox:
Step 1: purge the reflog
The reflog keeps pointers to where HEAD has pointed for some time back. git gc will not delete any commit still referenced by the reflog, because such commits still count as reachable rather than as unreferenced objects.
This worked for me:
git reflog expire --expire=all --all
Validation: run git reflog and make sure it's empty.
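To see why this step matters, here is a minimal sketch run in a throwaway repository (the temporary directory, the placeholder identity, and the variable names are my own assumptions, not part of the original question). It rewrites a commit, shows that garbage collection alone leaves the old commit in place while the reflog remembers it, and then expires the reflog:

```shell
#!/bin/sh
# Sketch only: runs in a throwaway repository, not in your real sandbox.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "sketch@example.com"   # placeholder identity
git config user.name "Sketch"

git commit -q --allow-empty -m "old commit"
old=$(git rev-parse HEAD)
# Rewrite the commit; the old one is now referenced only by the reflog.
git commit -q --amend --allow-empty -m "rewritten commit"

# Garbage collection alone does not remove it while the reflog remembers it.
git gc -q --prune=now
if git cat-file -e "$old" 2>/dev/null; then survived=yes; else survived=no; fi

# Count reflog entries before and after expiring everything.
entries_before=$(git reflog | wc -l | tr -d ' ')
git reflog expire --expire=all --all
entries_after=$(git reflog | wc -l | tr -d ' ')

echo "old commit survived gc: $survived"
echo "reflog entries before/after expire: $entries_before/$entries_after"
```

In a real sandbox you would only run the git reflog expire line; the rest is scaffolding for the demonstration.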
Step 2: delete or update any tags or branches that point into the old history
Any tags or branches still pointing into the old history will make sure that history cannot be deleted.
git tag -d <tagname>
git branch -D <oldbranchname>
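If you are not sure which tags and branches still point into the old history, git for-each-ref --contains can list them. A small sketch in a throwaway repository, where the names old-branch, v-old and fresh are made up for the illustration:

```shell
#!/bin/sh
# Sketch only: throwaway repository with hypothetical branch and tag names.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "sketch@example.com"   # placeholder identity
git config user.name "Sketch"

# Name the initial branch explicitly so the output does not depend on config.
git symbolic-ref HEAD refs/heads/old-branch
git commit -q --allow-empty -m "old history"
old=$(git rev-parse HEAD)
git tag v-old

# Start an unrelated new history on another branch.
git checkout -q --orphan fresh
git commit -q --allow-empty -m "new history"

# List every ref whose history still contains the old commit.
refs_before=$(git for-each-ref --contains "$old" --format='%(refname)')
echo "refs keeping the old commit alive:"
echo "$refs_before"

# Delete them, then list again; nothing should be left.
git tag -d v-old >/dev/null
git branch -D old-branch >/dev/null
refs_after=$(git for-each-ref --contains "$old" --format='%(refname)')
echo "refs left after cleanup: [$refs_after]"
```

Note that this only covers refs (branches, tags, remote-tracking refs); the reflog from step 1 is separate.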
Step 3: disconnect or clean up any remote references
If you still have origin/master pointing to a commit, that commit cannot be garbage collected. So either remove the old remote, or delete all the old tags and branches on the remote too and then prune the corresponding remote-tracking references in your sandbox:
git fetch --prune
or even
git remote remove origin
Validation: run git log --all and make sure the old commits are not listed.
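As an illustration of that validation, this sketch uses a local bare repository as a stand-in for the remote (the directory names and the main branch name are assumptions for the demo). It shows the old commit still appearing in git log --all through origin/main, and disappearing once the remote is removed:

```shell
#!/bin/sh
# Sketch only: a local bare repository plays the role of the remote.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git init -q sandbox
cd sandbox
git config user.email "sketch@example.com"   # placeholder identity
git config user.name "Sketch"
git symbolic-ref HEAD refs/heads/main

git commit -q --allow-empty -m "old history"
old=$(git rev-parse HEAD)
git remote add origin "file://$work/origin.git"
git push -q origin main                      # origin/main now points at $old

# Rewrite history locally; origin/main still references the old commit.
git commit -q --amend --allow-empty -m "new history"
if git log --all --format=%H | grep -q "$old"; then before=listed; else before=absent; fi

git remote remove origin                     # drops refs/remotes/origin/*
if git log --all --format=%H | grep -q "$old"; then after=listed; else after=absent; fi

echo "old commit in 'git log --all' before: $before, after: $after"
```

git log --all does not walk the reflog, so a commit can be absent here and still survive until step 1 and step 4 have both been done.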
Step 4: garbage collection
Now, you can run garbage collection, with options to make it as thorough as possible.
git gc --prune=now --aggressive
At this stage, finally, the old master commit is no longer shown by git show <old-sha1-of-master> in my test repo.
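Putting the four steps together, here is a sketch of the whole sequence on a throwaway repository. The file secret.txt, its contents, and the branch name old-master are invented for the demo, and there is no remote, so step 3 is a no-op here. It is how I would check the procedure on a scratch copy, not a guarantee for every Git version or configuration:

```shell
#!/bin/sh
# Sketch only: end-to-end run of steps 1-4 in a throwaway repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "sketch@example.com"   # placeholder identity
git config user.name "Sketch"

# Old history containing something sensitive (hypothetical file).
git symbolic-ref HEAD refs/heads/old-master
echo "hunter2" > secret.txt
git add secret.txt
git commit -q -m "old history with a secret"
old=$(git rev-parse HEAD)

# Fresh, unrelated history; also clear the secret out of the index,
# because objects referenced by the index are kept by gc.
git checkout -q --orphan master
git rm -q -rf .
echo "clean" > README
git add README
git commit -q -m "new history"

git reflog expire --expire=all --all         # step 1: purge the reflog
git branch -D old-master >/dev/null          # step 2: delete old branches/tags
                                             # step 3: no remote in this demo
git gc -q --prune=now --aggressive           # step 4: garbage collection

# cat-file -e exits non-zero when the object no longer exists.
if git cat-file -e "$old" 2>/dev/null; then state=present; else state=gone; fi
echo "old commit is $state"
```

If the last line reports present in your own repository, something from steps 1 to 3 is still holding a reference to the old history.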
Thinking about giving it a try - server side
This is where things get harder, because you have to know what types of reflog-like things and backups your server uses. But...
- Push the new history you wanted.
- Make sure you delete all the old history: delete or update any tags or branches pointing to the old commits.
- Pull Requests: as far as I know, GitHub permanently saves the HEAD of any pull request, even ones that were closed without merging, even if the branch got deleted. I don't know how to purge commits kept alive by old PRs.
- Issues: I think issues can make references to commits by their sha1, and I bet those would block garbage collection too. So, thoroughly scan your old issues, and if you find a problematic one, I'm not sure what you should do.
- Backups: here I don't know, but I'm sure there are backups, and they're not going to be your friend for this task. But maybe they have a limited retention and will be flushed after some time?
- reflog and garbage collection: we're back at step 1, because I have no idea how to force GitHub to trigger reflog cleanup and garbage collection. If you have access to your own GitLab instance, your Git sysadmin might be able to do it.
Once you've checked all of the above, try a fresh git clone and git clone --mirror. Also try loading this in your browser: https://<server>/<user>/<repo>/commit/<sha1>. If none of these show the commits you wanted removed, then I guess you're done?
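To show why both kinds of clone are worth trying, here is a sketch with a local bare repository standing in for the server. The ref refs/pull/1/head imitates the kind of ref GitHub keeps for pull requests; the repository layout and names are assumptions for the demo, and a real server may keep other references that are not visible this way at all:

```shell
#!/bin/sh
# Sketch only: a local bare repository imitates the server.
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
git --git-dir=server.git symbolic-ref HEAD refs/heads/main

git init -q sandbox
cd sandbox
git config user.email "sketch@example.com"   # placeholder identity
git config user.name "Sketch"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "history you want to keep"
git push -q ../server.git main

# An old commit kept alive only by a pull-request style ref on the server.
git checkout -q --orphan old
git commit -q --allow-empty -m "old commit with a secret"
old=$(git rev-parse HEAD)
git push -q ../server.git "$old:refs/pull/1/head"
cd ..

# Everything the server advertises, including refs outside heads/ and tags/.
advertised=$(git ls-remote "file://$work/server.git")
echo "$advertised"

# A plain clone fetches only branches and tags; a mirror clone fetches all refs.
git clone -q "file://$work/server.git" plain
git clone -q --mirror "file://$work/server.git" mirror.git
if git -C plain cat-file -e "$old" 2>/dev/null; then in_plain=present; else in_plain=absent; fi
if git -C mirror.git cat-file -e "$old" 2>/dev/null; then in_mirror=present; else in_mirror=absent; fi
echo "old commit in plain clone: $in_plain, in mirror clone: $in_mirror"
```

So a clean plain clone proves little on its own: git ls-remote and a mirror clone are the stricter checks, and even they only see what the server chooses to advertise.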
Realistically, I don't think the above test will say you're done.
Server side, if you really want to remove the old history with any secrets it might contain, I'm back to my unsatisfying initial answer: delete the repo from your Git server completely (accept all the warnings that say "this is not reversible" - that's what you want, after all!), and create a new repo with an empty history, an empty list of PRs, empty backups, and push to it just the history you want.
Update: this answer to a related question, "Remove sensitive files and their commits from Git history", says you can contact GitHub customer support to get a dangling commit with sensitive information actually deleted from your repo.
Thinking about other traces
Once your sandbox and server are fixed, don't forget that:
- any forks of your repo will still have references to the old commits
- anyone else who cloned the repo (or a fork) on their machine will still have references to the old commits