
I need to permanently and completely remove a Git repository's commit history and continue with the current version of the files. Older versions/commits must not be accessible in any way.

I have tried creating a new master branch, removing all other branches, and using git gc, but old commits keep showing up when I try 'git show xxxx' in Git Bash.

WantsToKnow
  • Possible duplicate of [Make the current commit the only (initial) commit in a Git repository?](https://stackoverflow.com/questions/9683279/make-the-current-commit-the-only-initial-commit-in-a-git-repository) – phd Jul 03 '19 at 19:24
  • History is nothing but commits; commits are the history; so a new empty repo, with no commits yet, to which you make one commit consisting of all of the files, is the answer. – torek Jul 03 '19 at 20:15

3 Answers


The short answer - create a fresh repo

For such a complete clean up, you need to create a brand new repo and delete the old one.

I could give you answers about running garbage collection, clearing the reflog, and finding the other places where Git keeps deleted commits for a while, but given the question as you've asked it, I strongly recommend a fresh repo.

Especially if you are using GitHub or another hosted Git server, fully cleaning out old commits may be a hopeless task.

Giving it a real try - sandbox side

OK, so my answer, "give up, it cannot be done", is not very satisfactory. Here are some commands that might purge old commits from a sandbox:

Step 1: purge the reflog

The reflog records where HEAD (and each branch) has pointed over time. git gc will not delete any commit still referenced by the reflog: reflog entries count as references, so a commit they point to is not considered unreachable and is not pruned.

This worked for me:

git reflog expire --expire=all --all

Validation: run git reflog and make sure it's empty.
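Note that plain git reflog only shows HEAD's reflog; each branch and remote-tracking ref can have its own. Here is a minimal sketch of a stricter check, assuming the default "files" ref backend (reflogs stored as files under .git/logs/). reflog_entries_left is just an illustrative name, not a Git command:

```shell
# Hypothetical helper: count the reflog entries left in the current repo.
# Assumes the default "files" ref backend, which keeps one log file per
# ref under $GIT_DIR/logs/ (HEAD, refs/heads/*, refs/remotes/*, ...).
reflog_entries_left() {
    logs_dir="$(git rev-parse --git-dir)/logs"
    if [ ! -d "$logs_dir" ]; then
        echo 0
        return
    fi
    # Every non-empty line in a log file is one reflog entry.
    find "$logs_dir" -type f -exec cat {} + | grep -c .
}
```

Run it right after git reflog expire --expire=all --all; it should print 0.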

Step 2: delete or update any tags or branches that point into the old history

Any tags or branches still pointing into the old history keep that history reachable, so it cannot be deleted.

git tag -d <tagname>
git branch -D <oldbranchname>
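For the question as asked (keep only the current branch), Step 2 means deleting every tag and every other local branch. A minimal sketch, where delete_other_refs is a hypothetical helper name; it only uses git for-each-ref, git tag -d and git branch -D:

```shell
# Hypothetical helper: delete every local tag and every local branch
# except the one currently checked out. Destructive: try it on a copy.
delete_other_refs() {
    # Fails (and stops here) on a detached HEAD.
    current=$(git symbolic-ref --short HEAD) || return 1
    # strip=2 turns refs/tags/v1 into v1, refs/heads/a/b into a/b.
    git for-each-ref --format='%(refname:strip=2)' refs/tags |
    while read -r tag; do
        git tag -d "$tag" >/dev/null
    done
    git for-each-ref --format='%(refname:strip=2)' refs/heads |
    while read -r branch; do
        [ "$branch" = "$current" ] || git branch -D "$branch" >/dev/null
    done
}
```

Run it from the branch you want to keep; remote branches and tags still have to be deleted separately (Step 3).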

Step 3: disconnect or clean up any remote references

If you still have origin/master pointing to an old commit, that commit cannot be garbage collected. So either remove the old remote, or delete all the old tags and branches on the remote too and prune the corresponding references in your sandbox:

git fetch --prune

or even

git remote remove origin

Validation: run git log --all and make sure the old commits are not listed.

Step 4: garbage collection

Now, you can run garbage collection, with options to make it as thorough as possible.

git gc --prune=now --aggressive

At this stage, finally, the old master commit is no longer shown by git show <old-sha1-of-master> in my test repo.
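Here is a sketch that chains Steps 1, 3 and 4 and tries them on a throwaway repo. The helper name purge_old_history, the temporary directory and the demo identity are made up for illustration; git cat-file -e only tests whether an object still exists on disk:

```shell
# Hypothetical helper bundling Steps 1, 3 and 4 for the repo in the
# current directory. Assumes Step 2 (old tags/branches) is already done
# and that dropping a remote named "origin" is acceptable.
purge_old_history() {
    # Step 3: removing the remote also deletes refs/remotes/origin/*.
    if git remote | grep -qx origin; then
        git remote remove origin
    fi
    # Step 1: empty the reflogs of HEAD and of every ref.
    git reflog expire --expire=all --all
    # Step 4: repack and delete every object that is now unreachable.
    git gc --prune=now --aggressive --quiet
}

# Demo in a throwaway repo (directory and identity are placeholders).
demo=$(mktemp -d)
g() { git -C "$demo" -c user.name=demo -c user.email=demo@example.com \
          -c commit.gpgsign=false "$@"; }
g init -q
g symbolic-ref HEAD refs/heads/master
echo v1 > "$demo/file.txt" && g add file.txt && g commit -q -m v1
echo v2 > "$demo/file.txt" && g commit -q -am v2
old_sha=$(g rev-parse HEAD~1)

# Replace the history with one parentless commit (orphan-branch recipe).
g checkout -q --orphan temp_branch
g add -A && g commit -q -m "Initial commit"
g branch -q -D master && g branch -m master

if g cat-file -e "$old_sha" 2>/dev/null; then before=present; else before=absent; fi
(cd "$demo" && purge_old_history)
if g cat-file -e "$old_sha" 2>/dev/null; then after=present; else after=absent; fi
echo "old commit before purge: $before, after purge: $after"
```

In a real sandbox you would only run purge_old_history (after Step 2); remove the demo with rm -rf "$demo" when done.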

Thinking about giving it a try - server side

This is where things get harder, because you have to know what kinds of reflog-like mechanisms and backups your server uses. But...

  • Push the new history you wanted.
  • Make sure you delete all the old history: delete or update any tags or branches pointing to the old commits.
  • Pull Requests: as far as I know, GitHub permanently saves the HEAD of any pull request, even ones that were closed without merging, even if the branch got deleted. I don't know how to purge commits kept alive by old PRs.
  • Issues: I think issues can make references to commits by their sha1, and I bet those would block garbage collection too. So, thoroughly scan your old issues, and if you find a problematic one, I'm not sure what you should do.
  • Backups: here I don't know, but I'm sure there are backups, and they're not going to be your friend for this task. But maybe they have a limited retention and will be flushed after some time?
  • reflog and garbage collection: we're back at step 1, because I have no idea how to force GitHub to trigger reflog cleanup and garbage collection. If you have access to your own GitLab instance, your Git sysadmin might be able to do it.

Once you've checked all of the above, try a fresh git clone and git clone --mirror. Also try loading this in your browser: https://<server>/<user>/<repo>/commit/<sha1>. If none of these show the commits you wanted removed, then I guess you're done?
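The git clone --mirror check can be scripted. A mirror clone fetches every ref the server advertises (as far as I know, on GitHub that includes the refs/pull/* refs), so "present" means some advertised ref still reaches the commit; it says nothing about server-side reflogs, backups or the web URL. sha_in_mirror is a hypothetical name. If you test against a server repo on local disk, pass a file:// URL: with a plain path, git clone copies the object store directly, unreachable objects included.

```shell
# Hypothetical helper: mirror-clone <url> into a temporary directory and
# print "present" or "absent" depending on whether <sha> came along.
# Returns 2 if the clone itself fails.
sha_in_mirror() {
    url=$1
    sha=$2
    tmp=$(mktemp -d) || return 2
    if ! git clone -q --mirror "$url" "$tmp/repo.git"; then
        rm -rf "$tmp"
        return 2
    fi
    if git -C "$tmp/repo.git" cat-file -e "$sha" 2>/dev/null; then
        echo present
    else
        echo absent
    fi
    rm -rf "$tmp"
}
```

For example: sha_in_mirror https://<server>/<user>/<repo>.git <sha1>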

Realistically, I don't think the above test will say you're done. Server side, if you really want to remove the old history with any secrets it might contain, I'm back to my unsatisfying initial answer: delete the repo from your Git server completely (accept all the warnings that say "this is not reversible" - that's what you want, after all!), and create a new repo with an empty history, an empty list of PRs, empty backups, and push to it just the history you want.

Update: this answer to a related question, "Remove sensitive files and their commits from Git history", says you can contact GitHub customer support to get a dangling commit with sensitive information actually deleted from your repo.

Thinking about other traces

Once your sandbox and server are fixed, don't forget that:

  • any forks of your repo will still have references to the old commits
  • anyone else who cloned the repo (or a fork) on their machine will still have references to the old commits
joanis
  • I guess I, for one, am still interested in how to permanently remove the undesirable commit(s) in the current repository. I understand it's "hard", but I would still like to know how, especially for the more general case of just removing a subset of commits instead of the OP use case of *all* commits. In that more general case, reconstructing the commit history minus the undesirable commit(s) such that it is exactly the same as the old repo, including commit meta data (as much as possible - SHAs will change in some cases, of course), is also hard. – Juan Apr 25 '21 at 20:23
  • @Juan, yours is an interesting question, but a sufficiently different use case that I would ask it separately. Things like rebasing and filter-repo will help create the new repo, and, really, that question has been asked many times and answered thoroughly: search "removing sensitive files from a git repository" to find many results. Given all that's been written on this topic, I'd now qualify it as a hard problem that was solved and is now fairly easy. – joanis Apr 26 '21 at 13:15
  • @Juan you just convinced me I should try to give a real answer. Have a look and let me know if it's helpful. – joanis Apr 26 '21 at 14:02

As @torek pointed out:

History is nothing but commits; commits are the history; so a new empty repo, with no commits yet, to which you make one commit consisting of all of the files, is the answer.

As a general warning, this is a destructive action. Delete your history with:

rm -rf .git/

Create a new Git repo, add all of your files, and commit:

git init
git add .
git commit -m 'Initial commit'

And push or force-push to wherever you're saving the repo.

A caveat to the stated goal of "older commits must not be accessible in any way": Git is a distributed version control system. Any user who has a copy of the codebase with its current history will still have it after these changes are made.
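A sketch of the steps above as one shell function. fresh_start is a hypothetical name, and it deliberately differs from a plain git add . in two ways: it re-adds exactly the files the old repo tracked (using git add -f, so tracked files that match .gitignore rules are not silently dropped), and it keeps the current branch name. It assumes no submodules and no tracked files that were deleted from disk:

```shell
# Hypothetical helper for the fresh-repo approach. Destructive: it
# deletes .git, so back up the repository first.
fresh_start() (
    set -e
    top=$(git rev-parse --show-toplevel)
    cd "$top"
    # Aborts on a detached HEAD, before anything is deleted.
    branch=$(git symbolic-ref --short HEAD)
    list=$(mktemp)
    # NUL-separated list of every tracked path, taken while .git exists.
    git ls-files -z > "$list"
    rm -rf .git
    git init -q
    git symbolic-ref HEAD "refs/heads/$branch"
    # -f re-adds tracked files even if they match .gitignore.
    xargs -0 git add -f -- < "$list"
    rm -f "$list"
    git commit -q -m 'Initial commit'
)
```

Afterwards, git rev-list --count HEAD should print 1; then push or force-push as described.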

Jake Worth
  • Warning: The resulting repository might not have the same content indexed if the original repository had files that were part of the repo and match .gitignore rules. – lol Jul 25 '21 at 00:43

You can try this approach. This will permanently delete your commit history.

Create a new orphan branch

git checkout --orphan temp_branch

Add files to new branch

git add -A
git commit -m "Initial commit"

Delete original master branch

git branch -D master

Rename orphan branch

git branch -m master

Push changes (if necessary)

git push -f origin master
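To see why git show <old-sha> can still work after these steps, this sketch reports what keeps a given commit alive: refs whose history contains it (git for-each-ref --contains) and reflog files that mention its SHA. who_keeps_commit is a hypothetical name; the reflog part assumes the default "files" ref backend and only finds direct mentions, not entries for descendant commits:

```shell
# Hypothetical helper: list what still references a commit locally.
who_keeps_commit() {
    sha=$(git rev-parse --verify --quiet "$1^{commit}") || {
        echo "no such commit: $1" >&2
        return 1
    }
    # Branches, tags and remote-tracking refs that can reach the commit.
    git for-each-ref --contains "$sha" --format='ref: %(refname)'
    # Reflog files with an entry pointing directly at the commit.
    logs_dir="$(git rev-parse --git-dir)/logs"
    if [ -d "$logs_dir" ]; then
        (cd "$logs_dir" && grep -rl -- "$sha" .) | sed 's|^\./|reflog: |'
    fi
}
```

After the orphan-branch steps, who_keeps_commit <old-sha> will typically show only the HEAD reflog, which is what git reflog expire plus git gc --prune=now has to clear.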
brandonwang
  • This is what I have tried but 'git show xxxx' (where xxxx is an older commit) still comes up with older versions. – WantsToKnow Jul 03 '19 at 19:07
  • This is not as permanent as you think: git keeps deleted branches for a while, you'd need to clean the reflog and run garbage collection and flush backups, etc. – joanis Jul 03 '19 at 19:07
  • The old commit is still available if you try to check it out (by SHA, for example). This is true even after `git gc --prune=now`. I have not found a way yet to truly remove a commit in the current repo. Even cloning preserves the old 'master'. – Juan Apr 25 '21 at 20:08
  • @Juan, I fully agree with you, hence my brutal scrap the repo, scrap the sandbox, and start fresh with a new repo/sandbox containing only the rewritten history you want to keep. – joanis Apr 26 '21 at 13:18
  • Ah, yes, the approach outlined per [Randall Munroe](https://xkcd.com/1597/). Unfortunately, with git, it is widely treated as an acceptable UX rather than a shortcoming of the tool's design ("What do you mean there should be a better way?"). – Juan May 08 '21 at 12:37