
I work with a code base shared by a number of people and have a particular branch that is for my personal use only. I made a huge mistake: about 10 commits ago I removed a particular file type from the .gitignore and then pushed a number of files (which added up to a lot of memory) to the remote of my personal branch. Now our repo is taking up a lot of memory and I need to somehow remove these files from the past commits. I have already added the file type back to the .gitignore and removed the files for future commits, so now it is only the past commits from which I need to remove the files.

I have done some research and know I have options such as git filter-branch and git filter-repo, while other solutions seem to only pertain to modifying one commit back. I prefer not to use git filter-branch as I have seen many warnings about it. I have tried git filter-repo but I think I am misunderstanding how to push my changes only to the branch I want to change (without having to merge them into or affect the master branch, which I am not able to push to). If there is a simple solution which involves editing a single commit, that would also work. Also note that it is not one large file but many (10-20) files that I need to remove (existing in a few different paths).

I am hoping the solution will look something like this: Checkout the branch I want to change the history of > run some sort of git command to remove the files from that branch or to change particular commits (would be easy if I could just edit the git ignore of those commits) > push those changes to the remote version of that specific branch

The result being that I have successfully removed files from old commits on the remote version of this one branch thus reducing the overall size of our repository.

This was a huge screw up on my part, any and all advice is appreciated!

Leea Stott
  • It sounds like the commit from 10 commits ago has the problem, and the current commit (or a recent one) has the fix for it? In that case you can do an interactive rebase on the last 10 (or more) commits, and then squash the commit with the fix into the bad commit. So, e.g., with your branch checked out, `git rebase -i HEAD~10` (make sure you pick a high enough number that you see the bad commit in the list). Then find the fix commit near the bottom, move it up to just after the bad commit, and change the fix commit from "pick" to "s", for squash. Save and exit and your branch will be re-written. Then force push. – TTT May 17 '22 at 21:47
  • Note after this is done, I believe it should reduce the size of the repo for new clones, but won't immediately clear it up for existing clones, unless garbage collection is forced. Depending on what tool you're using to host the repo, the size on that server may never shrink, since many cloud supported tools retain history forever. Depending on how big the overall repo is (are you running into size limits?), if necessary after the fix you could delete the remote repo and re-upload it without the big bad commits. – TTT May 17 '22 at 21:59
  • Just to clarify, the 10 or so commits are still *only* on your branch? (They have not been merged into `master` yet?) I ask because if the bad commit was already merged into `master`, my suggestion of rewriting just your own branch with interactive rebase won't work. – TTT May 17 '22 at 22:12
  • Thank you so much for the reply! Yes, the problem started about 10 commits ago, but the files exist in all the following commits up until the one in which I fixed the problem. We are using Bitbucket; I'll have to check if it would retain the memory regardless. Yes, all only on my own branch, never merged into master! – Leea Stott May 19 '22 at 15:20
  • OK, since it's still only on your branch you should be able to fix this yourself, with the possible exception of the storage space utilization on Bitbucket. I've added an answer explaining some options. Side note- I think the term "storage space" probably works better in this context than "memory". – TTT May 19 '22 at 17:02
  • Does this answer your question? [Remove file from git repository history](https://stackoverflow.com/questions/59727771/remove-file-from-git-repository-history) – Orace May 19 '22 at 17:09
  • @Orace that would work for removing the files (and it's what I would recommend if the commits were already merged into `master`), but since it's only one unmerged branch that is getting re-written, interactive rebase has the added advantage that you can fixup the .gitignore issue too. – TTT May 19 '22 at 17:27

1 Answer


Given that the problem is only in commits on your own branch and hasn't been merged anywhere else yet, I believe re-writing your own branch is the simplest course of action. Suppose your `git log --oneline` looks similar to this:

db759dd Refactor ...
ed7e3cc Update ...
fff1234 Fix .gitignore and delete unwanted files # The FIX
314fc46 Update ...
fea0230 Refactor ...
d9fdf5d Update ...
74249a2 Increase ...
c985a1c Fix ...
d94122f Add ...
bad9999 Update .gitignore and add big files # The ISSUE
3a284fb Increase ...
abc1234 Merge PR 1234: Increase number of threads # BRANCH START
...

In the above example commit ID abc1234 represents the commit you branched off of when creating your branch. Commit bad9999 represents the problem commit ID, and fff1234 represents the commit ID with the fix. With your branch checked out, you want to do an interactive rebase by specifying the commit you branched off of (or any later commit that occurred before the problem commit), like this:

git rebase -i abc1234
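
If you are not sure which commit you branched from, `git merge-base` can usually find it for you. Here is a minimal sketch that builds a throwaway repo to show the idea; the branch name `my-branch` and the empty demo commits are placeholders I made up, not anything from your repo:

```shell
#!/bin/sh
set -eu

# Throwaway repo so nothing real is touched (all names here are hypothetical).
repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Merge PR 1234: Increase number of threads"  # BRANCH START
base_branch=$(git symbolic-ref --short HEAD)   # whatever the shared branch is called

git checkout -q -b my-branch
git commit -q --allow-empty -m "Increase ..."
git commit -q --allow-empty -m "Update .gitignore and add big files"

# merge-base prints the best common ancestor of the shared branch and your branch.
start=$(git merge-base "$base_branch" HEAD)

# These are the commits an interactive rebase onto $start would list.
git log --oneline "$start"..HEAD
```

In your real repo the equivalent would be something like `git merge-base master HEAD`, and the ID it prints is what you would pass to `git rebase -i`.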

Now you will be presented with a TODO list, which will display all (non-merge) commits after the one you specified, oldest first (the reverse of the `git log` order above), like this:

pick 3a284fb Increase ...
pick bad9999 Update .gitignore and add big files
pick d94122f Add ...
pick c985a1c Fix ...
pick 74249a2 Increase ...
pick d9fdf5d Update ...
pick fea0230 Refactor ...
pick 314fc46 Update ...
pick fff1234 Fix .gitignore and delete unwanted files
pick ed7e3cc Update ...
pick db759dd Refactor ...

Tip: while doing an interactive rebase, if you change your mind and decide to cancel it, you must first delete all the lines in the file (or at least all the ones with instructions such as "pick"), and then save, and exit. If you simply save and exit without first deleting the lines, you will still proceed with the rebase (which might end up having no effect if your branch was linear, but it's not worth hoping for this if you really intend to cancel it).

Now you have a decision to make. Can you simply delete the ISSUE commit, and if so would it make sense to also delete the FIX commit? If you can do this, change the word "pick" to "d" (for "drop", or you could simply delete the line completely!) on the ISSUE and perhaps also the FIX lines, like this:

pick 3a284fb Increase ...
d bad9999 Update .gitignore and add big files
pick d94122f Add ...
pick c985a1c Fix ...
pick 74249a2 Increase ...
pick d9fdf5d Update ...
pick fea0230 Refactor ...
pick 314fc46 Update ...
d fff1234 Fix .gitignore and delete unwanted files
pick ed7e3cc Update ...
pick db759dd Refactor ...
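
If you want to try the "drop" approach on something disposable first, the sketch below recreates a tiny version of the situation in a temp directory and performs the same edit non-interactively. It assumes `*.bin` as the problem file type, and uses the `GIT_SEQUENCE_EDITOR` environment variable with a small helper script to stand in for you editing the TODO list by hand; the file names and the helper script are made up for the demo:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo '*.bin' > .gitignore
git add .gitignore
git commit -q -m "Merge PR 1234: Increase number of threads"   # BRANCH START
base=$(git rev-parse HEAD)
git checkout -q -b my-branch

: > .gitignore                                # stop ignoring *.bin ...
echo "pretend this is huge" > big.bin         # ... and commit a "big" file
git add .gitignore big.bin
git commit -q -m "Update .gitignore and add big files"          # The ISSUE

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git rm -q big.bin
echo '*.bin' > .gitignore
git add .gitignore
git commit -q -m "Fix .gitignore and delete unwanted files"     # The FIX

# Helper that edits the TODO list: delete the ISSUE and FIX lines (same as "d").
editor="$repo/.git/drop-editor.sh"
cat > "$editor" <<'EOF'
grep -v -e 'add big files' -e 'delete unwanted files' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF
GIT_SEQUENCE_EDITOR="sh '$editor'" git rebase -q -i "$base"

git log --oneline "$base"..HEAD               # remaining commits on the branch
git log --oneline "$base"..HEAD -- '*.bin'    # commits touching *.bin, if any
```

The last command is a handy check on a real branch too: if it lists any commits, those commits still add or remove files of that type.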

If the ISSUE commit also had other changes in it that you want to keep, then instead of completely dropping it, the change you can probably make (see note 1) to fix the bad commit is to simply move up the FIX commit and squash it into the ISSUE commit, like this:

pick 3a284fb Increase ...
pick bad9999 Update .gitignore and add big files
s fff1234 Fix .gitignore and delete unwanted files
pick d94122f Add ...
pick c985a1c Fix ...
pick 74249a2 Increase ...
pick d9fdf5d Update ...
pick fea0230 Refactor ...
pick 314fc46 Update ...
pick ed7e3cc Update ...
pick db759dd Refactor ...

Note "s" is the same as "squash", and you always squash "up" to the previous picked commit, so this will squash those 2 commits together. If the FIX commit reverses some of the changes made by the ISSUE commit, those changes will cancel out and won't be present in the newly created commit (see note 2). Now save and exit, and your rebase will begin. As the rebase progresses, it will pause after the squash and prompt you to modify the commit message of the new squashed commit. After you write the new commit message and save and exit, the rebase will continue.
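
The squash variant can be rehearsed the same way. This is only a sketch in a throwaway repo: it assumes `*.bin` as the problem file type, an extra `notes.txt` change in the ISSUE commit that is worth keeping, and a made-up helper script that moves the FIX line up and marks it "squash". `GIT_EDITOR=true` just accepts the default combined commit message instead of opening an editor:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo '*.bin' > .gitignore
git add .gitignore
git commit -q -m "Merge PR 1234: Increase number of threads"   # BRANCH START
base=$(git rev-parse HEAD)
git checkout -q -b my-branch

: > .gitignore
echo "pretend this is huge" > big.bin
echo "worth keeping" > notes.txt              # the part of ISSUE we want to keep
git add .gitignore big.bin notes.txt
git commit -q -m "Update .gitignore and add big files"          # The ISSUE

echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git rm -q big.bin
echo '*.bin' > .gitignore
git add .gitignore
git commit -q -m "Fix .gitignore and delete unwanted files"     # The FIX

# Helper that edits the TODO list: move FIX directly below ISSUE as a squash.
editor="$repo/.git/squash-editor.sh"
cat > "$editor" <<'EOF'
fix=$(grep 'delete unwanted files' "$1" | sed 's/^[a-z]*/squash/')
awk -v fix="$fix" '
  /delete unwanted files/ { next }      # remove FIX from its old position
  { print }
  /add big files/ { print fix }         # re-insert it right after ISSUE
' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF
GIT_SEQUENCE_EDITOR="sh '$editor'" GIT_EDITOR=true git rebase -q -i "$base"

git log --oneline "$base"..HEAD               # squashed commit + "Add feature"
git log --oneline "$base"..HEAD -- '*.bin'    # commits touching *.bin, if any
```

Here the squashed commit keeps `notes.txt`, while the big file and the .gitignore change cancel out.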

Once the rebase is finished, look over the commits to make sure you are happy with them, and then force push out your branch:

git push --force-with-lease
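
If you prefer to be explicit about what gets pushed, you can name the remote and the branch. The sketch below uses a local bare repo as a stand-in for Bitbucket (the paths, the `my-branch` name and the commits are all made up for the demo) and shows that only that one branch is updated on the remote:

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
git -c init.defaultBranch=main init -q --bare "$work/remote.git"   # stand-in for the server
git -c init.defaultBranch=main init -q "$work/clone"
cd "$work/clone"
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$work/remote.git"

git commit -q --allow-empty -m "BRANCH START"
git checkout -q -b my-branch
echo "pretend this is huge" > big.bin
git add big.bin
git commit -q -m "Add big files"
git push -q -u origin my-branch               # the original (bad) push

# Stand-in for the rebase: replace the bad commit with a clean one.
git reset -q --hard HEAD~1
git commit -q --allow-empty -m "Clean replacement commit"

# Overwrite only my-branch on the remote; the lease makes the push fail
# if someone else has updated that branch since your last fetch.
git push -q --force-with-lease origin my-branch

git rev-parse HEAD
git ls-remote --heads origin                  # remote branches and their commit IDs
```

Nothing in this push touches `master` (or any other branch) on the remote, so it works even if you are not allowed to push there.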

Now your big bad commits should be gone from the branch's history, and new clones will not contain them. They may still remain indefinitely on the server though, perhaps even referenceable via the Web UI if you know the commit ID. If that is a problem for you, two typical options are to ask the repo host's support team whether they can purge the orphaned commits on the server for you, or to delete and re-upload the repo, which may have minimal impact since only about 10 commit IDs are being purged. Before deleting and re-pushing the repo, I would confirm with support that you won't have to redo all of your security settings.
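
Your own existing clone will also keep the old commits around for a while, because the reflog still points at them. If you want to reclaim that space locally, expiring the reflog and running garbage collection should do it. The following is a sketch in a throwaway repo (names are made up, and `git commit --amend` stands in for the rebase); note that this only affects the local clone, not what the server stores:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "BRANCH START"
echo "pretend this is huge" > big.bin
echo "notes" > notes.txt
git add big.bin notes.txt
git commit -q -m "Add notes and big files"
old=$(git rev-parse HEAD)                     # the commit we are about to rewrite

# Rewrite history so the big file is no longer part of the branch.
git rm -q big.bin
git commit -q --amend -m "Add notes"

git cat-file -t "$old"                        # old commit is still stored locally

# Drop the reflog entries that keep the old commit alive, then prune.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$old" 2>/dev/null; then
  echo "old commit still stored"
else
  echo "old commit pruned"
fi
git count-objects -vH                         # local object store size
```

Be aware that expiring the reflog removes your local safety net for undoing the rewrite, so only do this once you are happy with the rebased branch.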

Notes:

1 If the FIX commit contained more than just the fix in it, and if that other stuff needs commits that came between ISSUE and FIX, then you're going to have conflicts here. Conflicts are a fact of life and you'll just have to resolve them. (Fortunately they are oftentimes straight-forward to resolve, especially in your case where you wrote both sides of the conflicting code.)

2 If the FIX commit is the exact opposite of the ISSUE commit (for example if you created FIX by reverting ISSUE), then a new squashed commit won't even be created; it will just be skipped, and your new branch history will have 2 fewer commits, as if neither existed. If this applies, you could also have simply dropped both commits in the rebase TODO list.

TTT
  • I followed the procedure you outlined above: dropped the issue and fix commits. I then saved and closed, and the terminal stated the rebase was complete. I then did the forced push, but now when I do git log --oneline I still see those commits :s – Leea Stott May 19 '22 at 21:55
  • @LeeaStott As soon as the rebase is done, if you do `git log` you still see the 2 commits you dropped? If yes do they have the same commit ID (hash) as they did before the rebase? Seems like either the rebase didn't work or your git log command is looking at a different branch. By the way, you can run `git log` on your branch immediately after the rebase to see if it worked. (You don't have to push your branch to look at it. Pushing it just updates Bitbucket with a copy of the new commits on your branch.) Consider waiting to push until you're happy with your rebased branch. – TTT May 20 '22 at 14:33
  • The commits are still there with the same commit ID. I think there is another command before the push that may need to be completed. I created a test repo and followed the procedure you outlined above. I saved the file and got the following message "Successfully rebased and updated refs/heads/issue_branch.". I then did a git log and the same commits were there with the same commit IDs. I then tried to do the rebase again (entering git rebase -i ######) and then received the following message: – Leea Stott May 30 '22 at 20:48
  • fatal: It seems that there is already a rebase-merge directory, and I wonder if you are in the middle of another rebase. If that is the case, please try git rebase (--continue | --abort | --skip) rm -fr ".git/rebase-merge" valuable there. – Leea Stott May 30 '22 at 20:48
  • Okay so it seems git rebase --quit allowed me to exit that rebase, but not sure why the rebase isn't working? When I tried the suggested --continue, etc I got the following msg: warning: could not read '.git/rebase-merge/head-name': No such file or directory – Leea Stott May 30 '22 at 20:59
  • It is working now!!!! The problem was that I needed to use the command git config --global core.editor "code --wait" – Leea Stott May 31 '22 at 17:27
  • @LeeaStott Awesome. Now it makes sense. If your editor (looks like VSCode) wasn't waiting for you to save and exit, the rebase command would have had no effect on a linear branch, so that explains why nothing was changing before. Glad it's working! – TTT May 31 '22 at 18:38
  • Thank you so much for your assistance! One more question: We are still getting warnings that our repo is too large even after I performed the rebase. I'm not sure if it has to do with the specifications in this link https://support.atlassian.com/bitbucket-cloud/docs/maintain-a-git-repository/ which state I also need to change my reflog and do garbage collection. I'm not sure whether the reflog and gc would affect the remote branch, or if this is necessary. – Leea Stott May 31 '22 at 18:45
  • @LeeaStott I assume you force pushed your rebased branch when you finished? If yes, try a fresh clone of the repo to see how big it is. If the fresh clone is now small, but the repo in BitBucket is still too big, re-read the last paragraph of the answer before the Notes section. BitBucket support should be able to help with purging unreachable commits on the remote repo. [This article seems to confirm that.](https://support.atlassian.com/bitbucket-cloud/docs/reduce-repository-size/) – TTT May 31 '22 at 21:35