Example

So let's say I have a local git repo with 10 commits, with SHA digests 0-9, so my git log looks like this:

9 (HEAD -> master)
8
7
6
5
4
3
2
1
0 <- initial commit

and I decide that commits 5-9 are garbage and I would like to permanently delete all record of them from the repository, along with the disk space they introduced. Basically, I want the state of my repo to be the same as it was when commit 4 was made, and have it be like 5-9 never even happened.

I know that git reset --hard 4 will make my repo appear to have been rewound to commit 4, but from what I understand, that merely changes the commit master points to from 9 to 4 but does not actually delete anything. All the data is still there, and is recoverable if you know the SHA of commit 9.

I am also aware of git filter-branch but that only removes files from the history, not commits.

I've tried doing:

git reset --hard 4
git gc --prune=now

but after doing this, the disk space usage of my .git directory is the same or bigger, and I can still recover the history with git checkout 9. Why does git gc --prune=now not prune commits 5-9? Do I need to expire my reflog?

More Generally:

If I have a complex repo with many branches, tags, commits, merges and divergent history, how can I permanently and automatically remove all commits that occurred after a certain time, along with the changes they introduce and the disk space they consume? Effectively, I want to rewind the entire repo to that time and permanently destroy all activity that occurred after that date.

RBF06

2 Answers

git reset does not delete content. It simply moves your current branch (and HEAD along with it) to point to the SHA-1 you asked for.

How can I delete content?

I am also aware of git filter-branch but that only removes files from the history, not commits.

Let me correct you.

When you run git filter-branch, it rewrites the content and creates new commits.

So where is the old commit?

The old commit is still in your repository. It becomes a dangling object, which means that it is content that is not reachable from any branch.

First of all, read this answer to understand what HEAD is.

Now you have to use git filter-branch or BFG, and only then run git gc.
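As a rough illustration of what "dangling" means, the sketch below builds a throwaway repository, rewinds its branch, and asks git fsck what is left over. The temporary directory, the demo identity and the empty commits are assumptions made only for this example. Note that git fsck counts reflog entries as references by default, so --no-reflogs is needed to see the commit reported.

```shell
#!/bin/sh
# Throwaway repository; directory, identity and commits are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 0 1 2 3; do
    git commit -q --allow-empty -m "commit $i"
done

tip=$(git rev-parse HEAD)        # the commit we are about to abandon
git reset -q --hard HEAD~2       # move the branch back two commits

# Without --no-reflogs the reflog would still count as a reference to $tip.
dangling=$(git fsck --no-reflogs 2>/dev/null | grep "dangling commit")
echo "$dangling"
```

Only the abandoned tip is reported as dangling; its parent is also unreachable, but it is still referenced by the tip, so fsck lists it only with --unreachable.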


... I decide that commits 5-9 are garbage and I would like to permanently delete all record of them from the repository and the disk space they introduced

You have several options to achieve this. Here is a simple one:

# Go back to the desired commit (this detaches HEAD)
git checkout <commit> # in your case 4

# Now delete the old branch that holds commits 5-9
git branch -D <branch name>

# Now create a new branch from commit 4
git checkout -b <branch>

# Now you have to clean up the leftovers.
# The HEAD reflog still references commits 5-9, so expire it first
git reflog expire --expire=now --all

# Let's look at the leftovers (not required, just to prove that we delete them)
git fsck --full --unreachable

# This gives you a list of all the "removed" commits.
# Let's clean the repo right now.
git gc --aggressive --prune=now

Why does git gc --prune=now not prune commits 5-9?

It does not remove the commits because reset only moves the branch pointer; it does not change the content of the repository, and the old commits are still referenced from the reflog.

CodeWizard

Let's take this in parts...

I know that git reset --hard 4 will make my repo appear to have been rewound to commit 4, but from what I understand, that merely changes the commit master points to from 9 to 4 but does not actually delete anything. All the data is still there, and is recoverable if you know the SHA of commit 9.

This is correct. Moreover, there are two reflogs that may retain pointers to commits 5, 6, 7, 8, and 9: one for HEAD, which remembers when HEAD pointed to those commits (if it ever did), and one for master, which remembers when master pointed to them (if it ever did). We know for sure that master pointed to 9, since that is where it was before the reset, but we don't know whether it pointed to each of the earlier ones individually, or whether you brought them in all at once, e.g., from another branch.
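To see the two reflogs in action, here is a small sketch that rewinds a branch in a throwaway repository and then counts how many entries in each reflog still name the old tip. The temporary directory, the demo identity and the three empty commits are assumptions for illustration; git log -g is the reflog walk that git reflog show is built on.

```shell
#!/bin/sh
# Throwaway repository; all names here are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 0 1 2; do
    git commit -q --allow-empty -m "commit $i"
done

tip=$(git rev-parse HEAD)                 # plays the role of commit 9
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on config
git reset -q --hard HEAD~2                # rewind the current branch

# Walk each reflog, print full hashes, and count entries naming the old tip.
head_hits=$(git log -g --format=%H HEAD | grep -c "$tip")
branch_hits=$(git log -g --format=%H "$branch" | grep -c "$tip")
echo "HEAD reflog entries naming the old tip: $head_hits"
echo "$branch reflog entries naming the old tip: $branch_hits"
```

Both counts are non-zero after the reset, which is why the old commits remain reachable as far as git gc is concerned.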

There may or may not be additional branches and/or reflogs pointing to those commits.

I am also aware of git filter-branch but that only removes files from the history, not commits.

This is not correct, although as Wolfgang Pauli said about something else, "This isn't right. This isn't even wrong!" In particular, this phrasing implies that git filter-branch removes things. It doesn't: it adds new commits.

Git is fundamentally built around the idea of adding new stuff, and never, ever, deleting anything. This includes git commit --amend, git rebase, and git filter-branch: they add new commits. The only Git commands that really remove expired stuff are the gc-related ones (git prune, git reflog expire, git repack, git prune-packed, and so on, and of course git gc itself).
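A quick way to convince yourself of this is to amend a commit in a scratch repository and check that the original commit object is still present afterwards. The sketch below assumes a temporary directory, a demo identity and empty commits, none of which come from the question.

```shell
#!/bin/sh
# Throwaway repository; all names here are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

old=$(git rev-parse HEAD)
git commit -q --amend --allow-empty -m "second, rewritten"
new=$(git rev-parse HEAD)

# --amend made a new commit; the old object was not deleted.
old_type=$(git cat-file -t "$old")
# No branch contains the old commit any more.
containing=$(git branch --contains "$old")
echo "old=$old ($old_type) new=$new branches containing old: '$containing'"
```

The old hash still resolves to a commit object even though no branch contains it; only the gc-related commands would eventually remove it.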

I've tried doing:

git reset --hard 4
git gc --prune=now

but after doing this, the disk space usage of my .git directory is the same or bigger, and I can still recover the history with git checkout 9. Why does git gc --prune=now not prune commits 5-9? Do I need to expire my reflog?

Yes.

To get old objects to go away, you must:

  • hunt down and destroy all references, including those in reflogs
  • prune loose objects regardless of their age (the --prune=now part above)
  • repack any packed versions of those objects.

git gc --prune=now handles the last two steps, but not the first one. Using git reflog expire --expire=now --expire-unreachable=now --all wipes out all the reflogs (which is overkill: --expire-unreachable is probably all you need). If you have other stray references (other branches, tags, a loose stash or two, maybe even things like ORIG_HEAD and CHERRY_PICK_HEAD), you will have to clean those up manually. Note also that git filter-branch leaves the original set of references in .git/refs/original/, and those hold on to all the original (pre-filtered-copy) objects.
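Put together, the three steps might look like the following sketch, run against a throwaway repository so nothing of value is at risk. The temporary directory, the demo identity and the six empty commits are assumptions for illustration, and the ORIG_HEAD deletion is included only as an example of cleaning up a stray reference by hand.

```shell
#!/bin/sh
# Throwaway repository; all names here are illustrative only.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 0 1 2 3 4 5; do
    git commit -q --allow-empty -m "commit $i"
done

tip=$(git rev-parse HEAD)      # the newest commit, which we want gone
git reset -q --hard HEAD~3     # rewind the branch by three commits

# Step 1: remove every remaining reference to the unwanted commits.
git update-ref -d ORIG_HEAD    # stray reference left behind by the reset
git reflog expire --expire=now --expire-unreachable=now --all

# Steps 2 and 3: prune loose objects regardless of age, and repack.
git gc --quiet --prune=now

# The old tip should no longer exist in the object database.
if git cat-file -e "$tip" 2>/dev/null; then
    status=present
else
    status=gone
fi
echo "old tip is $status"
```

In a real repository you would also have to delete any other branches, tags or stashes that still reach the unwanted commits before the expire and gc steps can remove them.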

torek