8

I forgot to add certain files to my .gitignore when I began my project and consequently committed files I shouldn't have. I ignored the files afterwards as described here. The damage is already done, though: the files still exist in the history of my repository, and the repository is now 10 GB in size!

I have not pushed the files for the aforementioned reason, so rewriting history should be okay. In short, what I need to do is to rewrite history so that afterwards none of the files in the current .gitignore exist in any commit in the repository.

Edit: There are lots of small files contributing to the large size, so the suggested duplicate about how to remove all files above a certain size threshold does not solve this problem.

Steve
  • 1
    You could look into using filter-branch to remove all traces of the files. – Tim Biegeleisen Aug 17 '17 at 10:20
  • Possible duplicate of [How to remove/delete a large file from commit history in Git repository?](https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository) – phd Aug 17 '17 at 14:24

4 Answers

11

To reset your local branch to the state of the remote branch, you can use git reset --hard origin/branchname.

To ignore files and remove them from history, follow the two steps below:

1. Ignore files that are already committed to Git

  • Create a .gitignore file (if you don't have one) with touch .gitignore.
  • Add the files and folders you want to ignore to .gitignore. Wildcards are allowed. Then commit the changes.
  • Stop tracking the files (they stay on disk) and commit the change:

git rm -r --cached filename
git commit

2. Remove the file from the commit history entirely

git filter-branch --index-filter 'git rm --ignore-unmatch -r --cached filename' --prune-empty -f -- --all
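
The command above takes a single filename. When the goal is to drop everything matched by the current .gitignore, one possible variation is to let git ls-files list the matching paths in each commit. The following is a minimal sketch, not a tested recipe for every repository: purge_ignored_from_history and IGNORE_SNAPSHOT are hypothetical names, it assumes bash, that you run it from the top level of the repository with a clean work tree, and a Git recent enough to honour FILTER_BRANCH_SQUELCH_WARNING.

```shell
# Hypothetical helper: remove every path matched by the *current* .gitignore
# from all commits on all branches.
purge_ignored_from_history() {
  # Snapshot the current ignore rules outside the work tree so that every
  # rewritten commit is checked against the same patterns.
  IGNORE_SNAPSHOT="$(mktemp)" || return 1
  cp .gitignore "$IGNORE_SNAPSHOT" || return 1
  export IGNORE_SNAPSHOT

  # For each commit: list indexed paths that match the patterns (-c -i),
  # then drop them from the index. NUL separators keep odd filenames intact.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --prune-empty \
    --index-filter 'git ls-files -z -c -i --exclude-from="$IGNORE_SNAPSHOT" |
                    git update-index --force-remove -z --stdin' \
    -- --all
  local rc=$?

  rm -f "$IGNORE_SNAPSHOT"
  return "$rc"
}

# Usage (from the repository root):
#   purge_ignored_from_history
```

Piping into git update-index --stdin rather than git rm avoids an error on commits where nothing matches, since an empty list is simply a no-op.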

To tidy your local repository (you can also skip this step):

rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc
git prune --expire now
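
As an alternative to deleting directories under .git by hand, the same cleanup can be sketched with Git's own commands, followed by a size report so you can see whether the rewrite actually helped. shrink_repo is a hypothetical helper name, and the sketch assumes the backup refs live under refs/original, which is where filter-branch puts them.

```shell
# Hypothetical helper: drop filter-branch backups and reflogs, then repack.
shrink_repo() {
  # Delete the refs/original/* backups that filter-branch leaves behind.
  git for-each-ref --format='delete %(refname)' refs/original |
    git update-ref --stdin || return 1
  # Expire all reflog entries so the old commits become unreachable.
  git reflog expire --expire=now --all || return 1
  # Repack and prune unreachable objects immediately.
  git gc --quiet --prune=now || return 1
  # Print object counts and on-disk size in human-readable units.
  git count-objects -vH
}

# Usage (after the history rewrite):
#   shrink_repo
```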

To push the rewritten history to the remote:

git push -f --all
VasiliNovikov
Marina Liu
  • 1
    This only works if you know the `filename`. Is there an option if you for example need to remove all `__pycache__` files? – Roelant Mar 17 '20 at 09:06
0

If you have a lot of commits after these files were pushed, I don't think you'll be able to find a very nice solution; however, they can be removed.
Let's say your latest work is currently in the dev branch.
You can go to the commit where these files were initially tracked and, for convenience, create a new branch there, let's say fix_dev.
First, I would remove the files mistakenly tracked in this commit, following the SO post you quoted in your question. After that, I would amend that commit with the command:

git commit --amend -m "Original message with additions"

This will create a copy of the original commit that diverges from the original git history. Next, I would do an interactive rebase of the commits of the dev branch onto the fix_dev branch, leaving the original unamended commit out. This way you should end up with a version of your repository without the mistakenly added files of that particular commit in branch fix_dev. You can repeat the process if you have a few more commits where you added unwanted files.
Once you're happy with the version of your new branch, you could move the dev branch there:

git checkout fix_dev
git branch -f dev
git checkout dev

Lastly, you'll need to force-push dev to your remote repository to rewrite the history there:

git push -f remote dev
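
The amend-and-rebase steps described above can be condensed into a rough sketch. It swaps the interactive rebase for git rebase --onto, which replays everything after the bad commit onto the amended one and moves the branch itself, so the separate git branch -f step is not needed. rebuild_without is a hypothetical helper, fix_dev is the branch name used above, and the sketch assumes a clean work tree and that later commits do not modify the removed files (otherwise the rebase stops with conflicts). Note that it deletes the files from the work tree as well.

```shell
# Hypothetical helper.
#   $1 = commit that first added the unwanted files
#   $2 = branch to rebuild (e.g. dev)
#   remaining arguments = paths to remove
rebuild_without() {
  local bad="$1" branch="$2"
  shift 2
  # Start a helper branch at the offending commit.
  git checkout -q -b fix_dev "$bad" || return 1
  # Remove the files and rewrite that commit without them.
  git rm -r -q -- "$@" || return 1
  git commit -q --amend --no-edit || return 1
  # Replay the commits after the original bad commit onto the amended one.
  git rebase -q --onto fix_dev "$bad" "$branch"
}

# Usage:
#   rebuild_without <bad-commit> dev path/to/unwanted
```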
Juan
0

You can do this using interactive rebase:

  1. Pick the newest commit you want to remove files for.
  2. Remove them as normal, add them to .gitignore, then commit.
  3. Do an interactive rebase to move the removal commit to immediately after the commit you're fixing.
  4. Do an interactive rebase to squash the removal and its preceding commit.
  5. Repeat as necessary for any remaining commits.

I'd have thought that you could do 3 and 4 in one interactive rebase but I found it resulted in conflicts.
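
One way steps 2 to 4 might be combined is Git's fixup mechanism: record the removal with git commit --fixup and let git rebase --autosquash place and squash it. This is only a sketch under a few assumptions: drop_from_commit is a hypothetical helper, the target commit is not the root commit (the rebase starts at its parent), the work tree is clean, and no later commit modifies the removed files. It deletes the files from the work tree too, so copy anything you still need elsewhere first.

```shell
# Hypothetical helper.
#   $1 = commit that added the unwanted files
#   remaining arguments = paths to remove
drop_from_commit() {
  local target="$1"
  shift
  # Record the removal as a "fixup!" commit aimed at the target commit.
  git rm -r -q -- "$@" || return 1
  git commit -q --fixup="$target" || return 1
  # --autosquash reorders the fixup right after its target and squashes it;
  # GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
  GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target"^
}

# Usage:
#   drop_from_commit <commit> path/to/unwanted
```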

Mumrah
-2

If repository size is your concern, I would suggest the steps below:

  1. Remove files from .gitignore. Now you will have those files back.
  2. Make a delete commit of all those files
  3. Update .gitignore again so that those files don't get checked in again by anyone
Pranalee
  • 3
    But in that case the repository size would be the same (bigger in fact, as it will track the deletion of those files). You would still have issues when trying to clone it, for instance. The git history has to be rewritten if you want the size to be reduced. – Juan Aug 17 '17 at 12:43