3

I use git for various projects (personal repositories only), and I want to do some housekeeping.

I have a downloaded git project tree that has a large history of commits. After downloading I made a few more myself. However, I do not need anything apart from the latest commit at the time I downloaded it, and the subsequent commits that I made. All the prior commits take up a lot of space, and I'd like to get rid of them.

What I should have done is delete the .git folder after download and create a new personal repository going forward - but I didn't.

So my question is this: can I clean up the repository so that everything prior to commit X is removed, as if it had never existed, but so that subsequent commits are maintained? If so, how? Also if possible, if there were multiple branches at that time, can I remove other branches also?

(Not sure if this is possible as I think one of git's claims is how hard it is to lose old data by mistake).

Antonio Petricca
nmw01223
  • Actually many ways to do this including `bfg` (https://rtyley.github.io/bfg-repo-cleaner/) and `git-filter-repo`, but the easiest might be to set HEAD to the commit you want, copy the files to a new directory, `git init` the new directory, set HEAD back to your latest version, and then copy your current lists of files to your new git directory. Of course, backup everything before doing any of this. – Barry Carter Aug 14 '22 at 12:49
  • You should ask one question at a time, but yes, it can be done. Is your very first commit an empty commit, or does it have content? (That helps determine how to proceed.) – OznOg Aug 14 '22 at 12:50
  • Does this answer your question? [How do I remove the old history from a git repository?](https://stackoverflow.com/questions/4515580/how-do-i-remove-the-old-history-from-a-git-repository) – mkrieger1 Aug 14 '22 at 14:09
  • Thanks, that led me to a simple answer. – nmw01223 Aug 14 '22 at 19:22

4 Answers

1

I have a downloaded git project tree that has a large history of commits. After downloading I made a few more myself.

Since you've only made "a few more" that you wish to keep, I'm going to assume your "new" history is linear. If that's the case, then this is extremely easy to do. For this example we'll assume the branch you want to keep is called `main`:

# make sure your status is clean
git status # verify it's "nothing to commit, working tree clean"

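# Figure out the repo's root (very first) commit ID
# (git log --reverse -n 1 does not work here: -n is applied before --reverse, so it prints the newest commit)
git rev-list --max-parents=0 HEAD # let's call the result <repo-root-commit-id>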

# Figure out the commit you started from (parent of your first new commit)
git log # let's call the starting commit X, as stated in the question

# Make a new temp branch from the commit you started from (commit X)
git switch -c temp-branch X

# soft reset to the repo root commit
git reset --soft <repo-root-commit-id>

# Now every change made after the root commit, up through X, is staged
# Commit all of it as one squashed commit on top of the root commit
git commit -m "Squash repo history into a single commit"

# Now rebase all of your new commits onto the temp branch
git rebase --onto temp-branch X main

# Now your rewritten main branch is as desired, delete the temp branch
git branch -d temp-branch
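
Before running the sequence above on a real repository, it may help to try it on a throwaway one. The following is only a sketch under stated assumptions: the temporary repository, the file names, and the commit messages are all made up for the demo, `git checkout -b` stands in for `git switch -c` so that it also runs on older git versions, and the root commit is looked up with `git rev-list --max-parents=0`.

```shell
#!/bin/sh
# Throwaway demo; the repository, file names and messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # call the branch main on any git version

# Five "downloaded" commits, then two of our own.
for i in 1 2 3 4 5; do
  echo "upstream $i" > "upstream$i.txt"
  git add . && git commit -q -m "upstream $i"
done
X=$(git rev-parse HEAD)                  # commit X from the question
for i in 1 2; do
  echo "mine $i" > "mine$i.txt"
  git add . && git commit -q -m "mine $i"
done
tree_before=$(git rev-parse 'HEAD^{tree}')

# The squash-and-rebase sequence itself.
root=$(git rev-list --max-parents=0 HEAD)
git checkout -q -b temp-branch "$X"
git reset -q --soft "$root"
git commit -q -m "Squash repo history into a single commit"
git rebase -q --onto temp-branch "$X" main
git branch -q -d temp-branch

commit_count=$(git rev-list --count main)
tree_after=$(git rev-parse 'main^{tree}')
echo "commits on main: $commit_count"    # root + squash + 2 own commits = 4
git log --oneline main
```

The final tree is identical to the one before the rewrite (`tree_before` equals `tree_after`); only the commits leading up to X have been collapsed.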

Since your goal is to recover the space used by the old history, you can remove your remote, delete all local branches except main, and then either garbage collect now or re-clone your new repo to another place. Those steps are summarized here:

# Remove the remote:
git remote remove origin

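# Delete all local branches except main (run this while main is checked out)
# -x matches the whole line, so branches that merely contain "main" in their name are deleted too
git branch --format='%(refname:short)' | grep -vx main | xargs git branch -D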

# Garbage Collect everything now
git reflog expire --expire=now --all
git gc --aggressive --prune=now
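
One way to check that the cleanup really discards old commits is to remember the ID of a commit that should disappear and probe for it with `git cat-file -e` afterwards. This is a minimal sketch on a throwaway repository; the branch name `old-branch` and the file names are assumptions for the demo. Note that any branch or tag still pointing into the old history would keep it alive.

```shell
#!/bin/sh
# Throwaway demo; branch and file names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo keep > keep.txt
git add . && git commit -q -m "history to keep"

# A side branch standing in for the history we want to lose.
git checkout -q -b old-branch
echo old > old.txt
git add . && git commit -q -m "old history"
old_commit=$(git rev-parse HEAD)
git checkout -q main
git branch -q -D old-branch

# Deleting the branch alone does not remove the objects.
if git cat-file -e "$old_commit" 2>/dev/null; then before=present; else before=gone; fi

git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

if git cat-file -e "$old_commit" 2>/dev/null; then after=present; else after=gone; fi
echo "before gc: $before, after gc: $after"   # before gc: present, after gc: gone
```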
TTT
0

I suggest you squash your local commits:

git log --oneline

# Write down the hash of the commit just before your first commit

git rebase -i <commit-hash>

# Now a text editor will open, so change **pick** into **squash** for the second and all following commits, then save and exit the editor...

Now all of your new commits will be combined into a single commit.

You are ready to push it.

Here is a short tutorial.

Antonio Petricca
0

This is what I tested:

  • Make a backup of your repo first.
  • Find the oldest commit (e.g. with `git log --reverse`).
  • Run `git rebase -i <oldest-commit>`, and mark all commits except those you want to keep as `drop`.
  • Remove all remotes (e.g. `git remote remove origin`).
  • Run `git reflog expire --all --expire=now`.
  • Run `git gc --aggressive`.

If you run `git fsck` before and after these steps, you should see that the number of objects is significantly reduced.
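
As an alternative way to compare object counts, `git count-objects -v` reports loose (`count`) and packed (`in-pack`) objects directly. The sketch below is an illustration only: the helper name `object_total`, the branch name `doomed`, and the throwaway repository are assumptions, and it uses `--prune=now` so that freshly created unreachable objects are dropped immediately.

```shell
#!/bin/sh
# Throwaway demo; object_total and the branch name "doomed" are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Sum the loose ("count") and packed ("in-pack") object counts.
object_total() {
  git count-objects -v |
    awk '$1 == "count:" || $1 == "in-pack:" { n += $2 } END { print n + 0 }'
}

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
echo keep > keep.txt
git add . && git commit -q -m "history to keep"

# Commits that will become unreachable once their branch is deleted.
git checkout -q -b doomed
for i in 1 2 3; do
  echo "old $i" > "old$i.txt"
  git add . && git commit -q -m "old $i"
done
git checkout -q main
git branch -q -D doomed

before=$(object_total)
git reflog expire --expire=now --all
git gc -q --aggressive --prune=now
after=$(object_total)
echo "objects before: $before, after: $after"
```

Adding `-H` (`git count-objects -vH`) prints the sizes in human-readable units, which is handy for judging how much disk space was recovered.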

Roland Smith
-2

Thanks for all the comments, particularly mkrieger1.

That led me to a post about `git clone SRC DEST --depth=nn`. That did it, and saved about 90% of the space.

Since it is a local clone, it is necessary to prefix SRC with `file://`; otherwise the depth is ignored.
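
To illustrate the `file://` point, here is a small sketch that clones a throwaway five-commit repository twice with `--depth=2`, once by plain path and once via `file://`. The directory names (`src`, `dest-path`, `dest-url`) and the commit contents are made up for the demo.

```shell
#!/bin/sh
# Throwaway demo; directory names and commit contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
src="$work/src"
mkdir "$src"
cd "$src"
git init -q
for i in 1 2 3 4 5; do
  echo "$i" > f.txt
  git add . && git commit -q -m "commit $i"
done
cd "$work"

# Plain path: git uses its local-clone shortcut and ignores --depth.
git clone -q --depth=2 "$src" "$work/dest-path" 2>/dev/null
full_count=$(git -C "$work/dest-path" rev-list --count HEAD)

# file:// URL: the normal transport is used, so --depth is honored.
git clone -q --depth=2 "file://$src" "$work/dest-url"
shallow_count=$(git -C "$work/dest-url" rev-list --count HEAD)

echo "plain path: $full_count commits, file:// URL: $shallow_count commits"
# plain path: 5 commits, file:// URL: 2 commits
```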

I also noticed that it has a `.github` folder, as opposed to `.git`. Not sure why, but all relevant history seems present.

nmw01223
  • Note a shallow clone has a graft in it, so I suspect you're going to run into [this problem](https://stackoverflow.com/q/50992188/184546). Coincidentally, it looks like the solutions to that problem are fairly similar to [my answer](https://stackoverflow.com/a/73354467/184546) to your question. – TTT Aug 14 '22 at 21:26
  • Yes, thanks for your answer. It will never be pushed to a remote repository. – nmw01223 Aug 15 '22 at 13:46
  • BTW, I think the downvote here is because if that's what you want, then this question is a dup. (And it leaves you with a graft that you probably still need to rewrite.) Perhaps if you follow-up with rewriting after the shallow clone it would be a simple working solution. – TTT Aug 15 '22 at 14:03
  • You may be right about duplicate. Most things are answered somewhere if only one can find them, and I didn't in this case. However I must be missing the point somewhere. I thought grafts were basically about joining histories, I want the opposite. I want to lose, permanently and irretrievably, all the old history back past a certain commit, and the cloned repository will only ever be local - it's a one way trip. So, not clear what future problems can occur from 'git clone SRC DEST --depth=nn'? – nmw01223 Aug 16 '22 at 05:22