
I want to remove one or more commits from a repository permanently, leaving the repository in a state as if the undesirable commits were never there. There are many references that point to git reset for this job. In my testing, however, git reset leaves the old commits available, even after garbage collection.

Here's an example that shows this (using git 2.25.1, tested with a POSIX shell script).

% cat dotest
tmpdir=$(mktemp -d "$(pwd)/tmpdir.XXXXX")

cd "$tmpdir" || exit 1

echo 'set up a repo with two commits...'
(set -x; git init)
echo 0 > x
(set -x
git add x
git commit -m 0 x)
sha0=$(git rev-parse HEAD)
echo 1 > x
(set -x; git commit -m 1 x)
sha1=$(git rev-parse HEAD)
(set -x; git log --graph --all --decorate --oneline)

echo; echo ===========================
echo 'remove second commit and switch working directory to first commit...'
(set -x
git reset --hard $sha0
git log --graph --all --decorate --oneline)

echo; echo ===========================
echo 'try to checkout the commit that was removed...'
(set -x
git checkout $sha1
git log --graph --all --decorate --oneline)

echo; echo ===========================
echo 'try to checkout the commit that was removed after forcing a garbage collect...'
(set -x
git reset --hard $sha0
git gc --prune=now
git checkout $sha1
git log --graph --all --decorate --oneline)

And the results:

% sh dotest
set up a repo with two commits...
+ git init
Initialized empty Git repository in /tmp/tmpdir.e4vrh/.git/
+ git add x
+ git commit -m 0 x
[master (root-commit) 1aba911] 0
 1 file changed, 1 insertion(+)
 create mode 100644 x
+ git commit -m 1 x
[master 6b9e4c1] 1
 1 file changed, 1 insertion(+), 1 deletion(-)
+ git log --graph --all --decorate --oneline
* 6b9e4c1 (HEAD -> master) 1
* 1aba911 0

===========================
remove second commit and switch working directory to first commit...
+ git reset --hard 1aba911898c5d9b18fbc314c9df59f485318ea23
HEAD is now at 1aba911 0
+ git log --graph --all --decorate --oneline
* 1aba911 (HEAD -> master) 0

===========================
try to checkout the commit that was removed...
+ git checkout 6b9e4c115a92b9baeec8c41e63402c239ae46cc0
Note: switching to '6b9e4c115a92b9baeec8c41e63402c239ae46cc0'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 6b9e4c1 1
+ git log --graph --all --decorate --oneline
* 6b9e4c1 (HEAD) 1
* 1aba911 (master) 0

===========================
try to checkout the commit that was removed after forcing a garbage collect...
+ git reset --hard 1aba911898c5d9b18fbc314c9df59f485318ea23
HEAD is now at 1aba911 0
+ git gc '--prune=now'
+ git checkout 6b9e4c115a92b9baeec8c41e63402c239ae46cc0
Previous HEAD position was 1aba911 0
HEAD is now at 6b9e4c1 1
+ git log --graph --all --decorate --oneline
* 6b9e4c1 (HEAD) 1
* 1aba911 (master) 0

This shows that the commit I tried to delete with git reset is still available even after garbage collection has run. The "garbage" is also preserved in a new clone made from this repo.

Note: This is similar to the "Permanently remove git commit history" question, but more general, so I didn't hijack that question. The basic issue is the same, though. The "Create a new orphan branch" answer to that question has the same problem: you can still get at the old "deleted" commits, even after GC.
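A possible explanation for the clone behaviour, offered as an assumption since the clone command is not shown above: when the source is a local path, git clone by default copies or hardlinks the whole objects directory, so unreachable objects come along. The sketch below compares that with --no-local, which uses the regular transport and transfers only objects reachable from the refs. The directory names (src, clone-local, clone-transport) and the demo identity passed with -c are placeholders.

```shell
# Sketch: compare a default local clone with a --no-local clone.
# Directory names and the user.name/user.email values are placeholders.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Source repo with two commits, then reset back to the first one.
git init -q src
cd src || exit 1
echo 0 > x
git add x
git -c user.name=demo -c user.email=demo@example.com commit -q -m 0
sha0=$(git rev-parse HEAD)
echo 1 > x
git -c user.name=demo -c user.email=demo@example.com commit -q -m 1 x
sha1=$(git rev-parse HEAD)
git reset -q --hard "$sha0"
cd ..

# Default clone from a local path: the objects directory is copied or hardlinked.
git clone -q src clone-local
# --no-local forces the regular transport, which sends only reachable objects.
git clone -q --no-local src clone-transport

# Report whether the reset-away commit exists in each clone.
for repo in clone-local clone-transport; do
    if git -C "$repo" cat-file -e "$sha1" 2>/dev/null; then
        echo "$repo: $sha1 is present"
    else
        echo "$repo: $sha1 is absent"
    fi
done
```

Cloning through a file:// URL also goes through the regular transport rather than the local copy.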

Juan
  • The commit must not be reachable. If it can be seen by any branch, tag, etc, then it won't be deleted – evolutionxbox Apr 25 '21 at 20:56
  • Also check references on the commit in the reflogs before you gc. – Romain Valeri Apr 25 '21 at 21:01
  • You need very intensive garbage collection: `git reflog expire --expire-unreachable=now --all && git gc --aggressive --prune=now` – phd Apr 25 '21 at 21:40
  • @phd That's a winner. Don't need `--aggressive` (and `--aggressive` by itself is not sufficient). Thanks. 'hg strip' (or 'hg histedit') beats this for sure. – Juan Apr 25 '21 at 22:11
  • @Juan: Mercurial dumps the stripped-out commits into `.hg/strip-*` for recovery, because its database requires removing the commits. Git is simply lazy, because its database encourages working that way. In fact, because you create unreferenced objects in a Git repository while you build up a new commit, Git's database *must* allow these to sit around idly, and `git gc` gives processes a 14-day grace period, by default, to get their work done. – torek Apr 26 '21 at 10:26
  • Of course, in both Git and Hg, if you've allowed the commits to propagate to some *other* repository, you're going to have a lot more trouble. – torek Apr 26 '21 at 10:28
  • @torek. Continuing with the side comparison with hg strip... When you strip with hg, it explicitly tells you that it sticks the stripped commits into a bundle file and where it puts that file. `rm` is easier to both understand and execute than expiring reflogs and initiating garbage collection in just the right way. And an hg clone won't copy stripped changesets. With git, you get the old "stripped" commits (not even really sure of the right term for this under git) in the new clone. So you are forced to do the convoluted dance, like lots of other operations, in git. – Juan May 08 '21 at 00:35
  • I'm not saying that Git's approach is any *better*, I'm just explaining the internal differences that result in the oddity in Git. Mercurial's user interface is clearly better for many users, but for whatever reasons, Git seems to have won the market: Mercurial still hasn't even been translated into Python3. – torek May 08 '21 at 01:04
  • @torek. Mercurial got beta python3 support in 2019 with Mercurial 5.0, officially in [5.2](https://www.mercurial-scm.org/wiki/Python3) in Nov 2019. I have used hg with python3 in production for about six months now (fall 2020) starting with 5.5. This is off topic for this Q of course (yes, I know, I touched on hg first in a comment above). I am just responding to avoid a little misinformation. – Juan May 08 '21 at 13:12
  • Aha. I had been trying to get updates from https://www.mercurial-scm.org/ (main page) for a while and it's still stuck on Python 2.7. – torek May 08 '21 at 13:16
  • It will work with either py2 or py3 at this time. The python3 wiki page (in my previous comment) today says py2 will be dropped in 2020. That didn't happen, so that's a little out of date. If I were the maintainers I would keep py2 supported through the 5.x series. I think all you need to do is run setup.py with the pythonX.Y you want to use (see 'supportedpy' in setup.py). If you use the 'Makefile' to build, it has always defaulted PYTHON to 'python' (so whatever your system's default python is, or override with `make PYTHON=pythonX.Y`). In 5.7 it changed the default PYTHON to 'python3'. – Juan May 08 '21 at 13:56
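
The reflog-expiry approach suggested in the comments can be sketched as follows. This is a minimal sketch rather than a definitive recipe: it rebuilds the two-commit repository from the question in a throwaway directory, expires the reflog entries that still reference the reset-away commit, and then prunes. The demo identity passed with -c is a placeholder, and --aggressive is left out, as noted in the comments.

```shell
# Sketch: drop a commit for good by expiring the reflog before pruning.
# The user.name/user.email values are placeholders for the demo commits.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

git init -q
echo 0 > x
git add x
git -c user.name=demo -c user.email=demo@example.com commit -q -m 0
sha0=$(git rev-parse HEAD)
echo 1 > x
git -c user.name=demo -c user.email=demo@example.com commit -q -m 1 x
sha1=$(git rev-parse HEAD)

# Move the branch back; the second commit is now only referenced by the reflog.
git reset -q --hard "$sha0"
git reflog

# Expire reflog entries that point at unreachable commits, then prune.
git reflog expire --expire-unreachable=now --all
git gc -q --prune=now

# Check whether the object for the second commit still exists.
if git cat-file -e "$sha1" 2>/dev/null; then
    echo "$sha1 is still present"
else
    echo "$sha1 is gone"
fi
```

This only affects the local repository. As the comments point out, copies of the commit that have already propagated to other repositories are untouched.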
